mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Stranded libraries #31

Closed. retrogenomics closed this issue 5 years ago.

retrogenomics commented 5 years ago

Hi,

We generally have stranded RNA-seq libraries and I use the --stranded yes option, but I'm wondering whether strand is actually used for TE counts. I have cases where a TE is inserted in the 3'UTR of a transcript, in the opposite orientation. The gene is strongly upregulated, and so is the TE...

I can't be 100% sure that this is the cause, since I don't know which loci the TE counts come from, but it looks very suspicious.

Also, I'm wondering which --stranded option should be used for stranded mRNA TruSeq libraries (yes or reverse?). This is sometimes a bit ambiguous (see here).

Thank you in advance for your help -Gael

olivertam commented 5 years ago

Hi Gael,

For TruSeq libraries, the --stranded reverse option should be used. Thanks.

Cheers, Oliver

retrogenomics commented 5 years ago

Thanks Oliver, indeed that was the problem.

xiaocong3333 commented 4 years ago

I'm also confused about which --stranded setting (yes or reverse) to use for stranded mRNA-seq. I ran yes and reverse separately, and the resulting sigdiff genes and TEs are totally different. In which situation should I use yes, and in which reverse?

Thank you in advance for your help! Cong

olivertam commented 4 years ago

Hi Cong,

Depending on how your library is constructed, the sequenced read (read 1 in paired-end libraries) is either in the same orientation (sense) as the mRNA transcript (--stranded yes), or in the opposite orientation (antisense) to the mRNA transcript (--stranded reverse). The Illumina TruSeq stranded mRNA kit usually generates libraries where the sequenced read is in the antisense orientation, so for that kit you would choose --stranded reverse.

See also HTSeq-count's description, which we based our parameter on:

-s <yes/no/reverse>, --stranded=<yes/no/reverse> For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
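The rule quoted above can be written out as a small predicate. This is only an illustrative sketch of the described behaviour, not code from HTSeq or TEtranscripts; the function name `strand_compatible` and its arguments are hypothetical.

```python
def strand_compatible(read_strand, feature_strand, mode, is_second_mate=False):
    """Return True if a read may be counted towards a feature under `mode`.

    Hypothetical helper that mirrors the htseq-count description quoted above.
    Strands are "+" or "-"; mode is "no", "yes" or "reverse".
    """
    if mode == "no":
        # Unstranded: the strand of the read is ignored.
        return True
    if mode not in ("yes", "reverse"):
        raise ValueError("mode must be 'no', 'yes' or 'reverse'")

    # The second mate of a pair maps to the opposite strand of the first mate,
    # so flip it before comparing with the feature strand.
    effective = read_strand
    if is_second_mate:
        effective = "-" if read_strand == "+" else "+"

    if mode == "yes":
        return effective == feature_strand
    # mode == "reverse": the rules of "yes" are reversed.
    return effective != feature_strand
```

Under this sketch, a TE on the minus strand inside the 3'UTR of a plus-strand gene would pick up the gene's reads if the wrong mode is chosen, because reads from the gene's transcript then appear to match the TE's strand.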

A simple check to see which --stranded mode is appropriate is to run both modes and compare the counts for a broadly expressed gene (e.g. Gapdh, beta-actin). If you quantified way more beta-actin reads from the --stranded reverse run, then that's probably the correct mode to use.
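The check described above could be scripted roughly as follows. This is a minimal sketch that assumes each run produced a tab-delimited count table with a header line, the gene/TE identifier in the first column and one count column per sample. The helper names, the gene identifier and the counts in the example are made up for illustration; the identifier to look up depends on the GTF used.

```python
import csv
import io


def total_count(table_text, gene_id):
    """Sum the counts over all sample columns for one gene.

    Assumes a tab-delimited table: header line, ID in column 1, counts after it.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)  # skip the header line
    for row in reader:
        if row and row[0] == gene_id:
            return sum(int(float(value)) for value in row[1:])
    raise KeyError(gene_id)


def likely_mode(totals_by_mode):
    """Return the mode whose run gave the highest total count."""
    return max(totals_by_mode, key=totals_by_mode.get)


# Made-up example tables standing in for the output of two runs.
yes_table = "gene/TE\tsampleA\tsampleB\nGapdh\t12\t9\n"
reverse_table = "gene/TE\tsampleA\tsampleB\nGapdh\t5120\t4873\n"

totals = {
    "yes": total_count(yes_table, "Gapdh"),
    "reverse": total_count(reverse_table, "Gapdh"),
}
print(totals, likely_mode(totals))
```

With real data, the table text would be read from the count table of each run. A large gap between the two totals is the signal to look for; similar totals in both modes would instead suggest an unstranded library.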

Please let me know if you have further questions. Thanks

xiaocong3333 commented 4 years ago

Thank you! Your suggestions are very helpful!