does predict make sense on same fastq files as input?

thomas-keller commented 5 years ago

Hi,

Thanks a lot for this tool. It's really handy and works a lot better than my own bespoke attempts at a small RNA pipeline.

At a basic level, does predict make sense for predicting novel RNAs within the input list, or should it be run on an external, different list of samples? If I'm understanding the section in your BMC paper correctly, you predicted on an outside set of 12 rat samples.

However, in my case I have 150 samples from 3 different treatments and I just want to predict the novel types. Sorry, that was a bit long-winded; I was just wondering whether the following command is going to produce garbage or not. Each of the phenoA.txt-style files is a list of 50 fastq files for one of the 3 treatments. It's running on the cluster now and hasn't broken yet, so I'm definitely at the advice stage right now rather than bug-fixing:

miRge2.0 predict -s exosome_mirge_phenoA.txt -o ../mirge_pred_exosomeA -d miRGeneDB \
    -pb /apps/bowtie/1.1.2/bin/ -lib ~/mirgelib/ -sp human \
    -pr ~/miniconda3/envs/mirge/bin/ -ps ~/miniconda3/envs/mirge/bin/ -maxl 27 \
    -ad AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -cpu 10 -trf -ws exosome_mirge_phenoA.txt

Maybe a better setup would be to use treatmentAB.txt as the input and pass treatmentC to -ws? I really don't know the "right"/"regular" way to do it.

mhalushka commented 5 years ago

Thanks for the commentary. I'm not sure I 100% follow, but I'd start with one file and work up. Predict is a lot slower than annotate, so you might want to get a sense of how long a run takes before you commit to a big one. I think you can do a batch afterwards. I honestly can't remember the particulars of whether predict draws any strength from having multiple samples in the same run. To find out, I'd recommend running two samples separately and then the same two samples together. Let me know what you find.
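One way to set up that separate-versus-together comparison is sketched below in Python. It is only an illustration under stated assumptions: the sample list is assumed to hold one FASTQ path per line, the output file names (`trial_1.txt` and so on) and both helper functions are hypothetical, and the predicted novel sequences are assumed to have been pulled out of each run's output by hand, since the output layout of predict is not described in this thread.

```python
from pathlib import Path


def write_trial_lists(sample_list, out_dir):
    """Write three small sample lists: sample 1 alone, sample 2 alone, both together.

    Assumes `sample_list` contains one FASTQ path per line (an assumption,
    not something confirmed in this thread). File names are hypothetical.
    """
    lines = Path(sample_list).read_text().splitlines()
    paths = [ln.strip() for ln in lines if ln.strip()]
    if len(paths) < 2:
        raise ValueError("need at least two FASTQ paths in the sample list")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    trials = {
        "trial_1.txt": paths[:1],
        "trial_2.txt": paths[1:2],
        "trial_1and2.txt": paths[:2],
    }
    for name, members in trials.items():
        (out / name).write_text("\n".join(members) + "\n")
    return [out / name for name in trials]


def compare_predictions(separate_runs, combined_run):
    """Compare predicted novel sequences from separate runs with a combined run.

    `separate_runs` is a list of sets of sequences (one set per single-sample
    run); `combined_run` is the set from the run with both samples. How those
    sets are extracted from the tool's output is left to the reader.
    """
    union_separate = set().union(*separate_runs)
    combined = set(combined_run)
    return {
        "only_combined": combined - union_separate,
        "only_separate": union_separate - combined,
        "shared": combined & union_separate,
    }
```

Each of the three list files could then be passed to `-s` in its own predict run, with the wall-clock time noted for each. If `only_combined` comes back non-empty, that would hint that predict does gain something from seeing multiple samples in one run.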

thomas-keller commented 5 years ago

OK, I tried to ask way too many questions. Apologies. Thanks for the tip about the running time. It's now clear from your answer that I can just run it with one sample, as you say, and get predictions from that same sample.

mhalushka / miRge

does predict make sense on same fastq files as input? #19