Closed Rahel14350 closed 9 years ago
Dear Raheleh,
In the trimming step, each read is split by default into overlapping 50-nt sub-reads, sliding in 25-nt steps. For example, a 100-nt read produces 3 overlapping sub-reads (positions 1-50, 26-75 and 51-100). In addition, both mates from paired-end sequencing are pooled, if available. Then, if multiple sub-reads map to junctions, only one, chosen at random, is counted (to avoid double counting). For that reason, reads need a special header, which is assigned during the trimming process. This has two implications: 1) 50-nt reads are also "trimmed" (simply, the header is converted); 2) you can only use the --pretimmed option if the reads have been trimmed by vast-tools. That explains why you may be getting very different results: reads trimmed by other means are basically not recognized properly.
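To make the splitting concrete, here is a minimal Python sketch of the windowing described above. It is not the vast-tools code: the function name `split_read` and its parameters are hypothetical, and it assumes that only full-length windows are emitted (how vast-tools handles a leftover tail or reads shorter than 50 nt is not stated here). It does not reproduce the special read header either.

```python
def split_read(seq, length=50, step=25):
    """Split a read into overlapping sub-reads.

    Returns (first, last, subsequence) tuples, where first/last are
    1-based inclusive positions in the original read.
    Assumption: only full-length windows are kept.
    """
    subreads = []
    # Slide the window start along the read in increments of `step`.
    for start in range(0, len(seq) - length + 1, step):
        subreads.append((start + 1, start + length, seq[start:start + length]))
    return subreads


if __name__ == "__main__":
    read = "ACGT" * 25  # a made-up 100-nt read
    for first, last, sub in split_read(read):
        print(first, last, len(sub))
```

For the 100-nt example read this gives three sub-reads covering positions 1-50, 26-75 and 51-100, matching the example in the explanation.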
I apologize that this wasn't properly explained in the README. I'll add a description now. In the meantime, if in doubt, please have a look at the Supplementary Information of the references cited in vast-tools, which contain additional details.
Best, Manu
Dear Tim, I have a question about the trimming step. My RNA-seq data are stranded, paired-end, 100-bp reads. In the trimming step, are they cut to 50 bp? Or how is the length option applied here? I ran the alignment twice: first with the raw sequences, and then with reads that I had trimmed and filtered myself, using the --pretimmed and --useFastq options. The IR output is totally different between the two alignments. Could you please explain in more detail what happens in the trimming step? Many thanks in advance. Kind regards, Raheleh