vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

Question about the trimming step in vast-tools align #45

Closed Rahel14350 closed 9 years ago

Rahel14350 commented 9 years ago

Dear Tim, I have a question about the trimming step. My RNA-seq data are stranded, paired-end, with 100-bp reads. In the trimming step, are they cut to 50 bp? Or how is the length option applied here? I ran the alignment twice: the first time with the raw sequences, and the second time with reads that I had trimmed and filtered myself, using the --pretrimmed and --useFastq options. The IR output is totally different between these two alignments. Would you please explain in more detail what happens in the trimming step? Many thanks in advance. Kind regards, Raheleh

mirimia commented 9 years ago

Dear Raheleh,

In the trimming step, each read is split by default into overlapping 50-nt sub-reads using a 25-nt sliding step. For example, a 100-nt read produces 3 overlapping sub-reads (positions 1-50, 26-75 and 51-100). In addition, both mates from paired-end sequencing are pooled, if available. Then, if multiple sub-reads from the same original read map to junctions, only one, chosen at random, is counted (to avoid double counting). For this to work, reads need a special header, which is assigned during the trimming process. This has two implications: 1) 50-nt reads are also "trimmed" (the header is simply converted); 2) you can only use the --pretrimmed option if the reads have been trimmed by vast-tools. That explains why you may be getting very different results: your own trimmed reads lack that header, so vast-tools would not be recognizing them properly.
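To make the splitting arithmetic above concrete, here is a minimal Python sketch. It is only an illustration of the 50-nt / 25-nt scheme described in this comment, not the actual vast-tools trimming code: the function name `split_read` and its parameters are hypothetical, and it does not reproduce the special read headers that vast-tools assigns. How vast-tools handles read lengths that do not fit the step exactly is not covered here; this sketch simply drops any leftover tail.

```python
def split_read(seq, length=50, step=25):
    """Split a read into overlapping sub-reads (illustrative only).

    Returns a list of (start, end, subsequence) tuples, where start and
    end are 1-based inclusive positions on the original read.
    Assumption: reads shorter than `length` yield no sub-reads, and any
    tail that does not fill a whole window is discarded.
    """
    if len(seq) < length:
        return []
    return [
        (i + 1, i + length, seq[i:i + length])
        for i in range(0, len(seq) - length + 1, step)
    ]


# A dummy 100-nt read, just to show the coordinates of the sub-reads.
read = "ACGT" * 25
for start, end, sub in split_read(read):
    print(start, end, len(sub))
```

With a 100-nt read this gives the three windows mentioned above (1-50, 26-75, 51-100), and a 50-nt read gives a single window covering the whole read, which matches the point that 50-nt reads only get their header converted.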

I apologize that this wasn't properly explained in the README. I'll add a description now. In the meantime, if you are in doubt, please have a look at the Supplementary Information of the references cited in vast-tools, which contains additional details.

Best, Manu