gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Stringtie with old Illumina RNA-seq data #296

Open francicco opened 3 years ago

francicco commented 3 years ago

Hi,

I'm trying to reuse some relatively "old" Illumina RNA-seq data that is not strand-specific. What would be the best way to process it?

Thanks a lot,
Francesco

bfpedro commented 3 years ago

Hi Francesco! I'm working with "old" data too (produced in 2014), so maybe I can help.

One important thing to note is that the quality-score encoding used by Illumina software changed from Phred +64 to Phred +33. For example, Illumina pipelines 1.3 and 1.5 use Phred +64, while Illumina 1.8 and later use Phred +33.
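To make the difference concrete: each quality character stores a score Q = ord(character) - offset, and Q corresponds to an error probability of 10^(-Q/10). Here is a minimal Python sketch of that arithmetic. The function names are my own and purely illustrative, not part of any tool mentioned here.

```python
def decode_quality(qual_string: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string to Phred scores for a given offset (33 or 64)."""
    return [ord(char) - offset for char in qual_string]


def error_probability(phred_score: int) -> float:
    """Probability that a base call is wrong, given its Phred score."""
    return 10 ** (-phred_score / 10)


if __name__ == "__main__":
    # 'h' is Q40 under Phred +64, but would be read as Q71 under Phred +33.
    print(decode_quality("h", 64), decode_quality("h", 33))
```

This is why the offset matters: reading Phred +64 data as Phred +33 makes every base look far more reliable than it is.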

You might want to use quality control software, such as FastQC, to check which encoding your files use. I also found this post suggesting how to tell the difference by looking directly at your data: http://seqanswers.com/forums/showthread.php?t=81978
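If you prefer to check by hand, one common heuristic is to look at the range of ASCII codes in the quality lines. The sketch below is only a rough guess, not what FastQC does internally. It assumes plain four-line FASTQ records, and the thresholds (codes below 59 imply Phred +33, codes above 74 imply Phred +64) are my assumptions based on the usual score ranges. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def iter_quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line of each record, assuming strict 4-line FASTQ records."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line.rstrip("\r\n")


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Return 33, 64, or None when the observed range fits both encodings."""
    codes = [ord(char) for quality in quality_strings for char in quality]
    if not codes:
        raise ValueError("no quality characters found")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur in a +64 encoding.
        return 33
    if lowest >= 64 and highest > 74:
        # Characters this high are not expected in Illumina 1.8+ Phred +33 data.
        return 64
    return None  # ambiguous: sample more reads
```

To use it on a real file, something like `guess_phred_offset(iter_quality_lines(open("reads.fastq")))` should work, where `reads.fastq` is a placeholder name. Checking the first few thousand reads is usually enough, and a `None` result just means you need more reads to decide.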

I don't know what your goal is exactly, but assuming you will align the reads to a genome before running stringtie, you will want to tell the aligner that your data are scored in Phred +64 format. In HISAT2, for example, you only need to add the "--phred64" flag.

Since your reads are not strand-specific, you don't need to pass any strandness option to HISAT2: unstranded is its default.

After this, you can proceed as usual with stringtie.
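Putting the steps above together, here is a rough sketch of the three commands, built in Python so they are easy to adapt. The file names, index prefix, sample name, and thread count are placeholders I made up. I'm also assuming you sort with samtools, and adding HISAT2's "--dta" option, which tailors alignments for transcript assemblers. No strandness options are passed to either tool because the library is unstranded.

```python
import shlex


def build_pipeline(index_prefix, reads, sample, phred64=True, threads=4):
    """Return the HISAT2, samtools sort, and stringtie commands as argument lists.

    reads: one path (single-end) or two paths (paired-end).
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.gtf"

    hisat2 = ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix]
    if phred64:
        hisat2.append("--phred64")  # only for old Phred +64 data
    if len(reads) == 2:
        hisat2 += ["-1", reads[0], "-2", reads[1]]
    elif len(reads) == 1:
        hisat2 += ["-U", reads[0]]
    else:
        raise ValueError("expected one or two read files")
    hisat2 += ["-S", sam]

    # stringtie expects coordinate-sorted alignments.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]
    stringtie = ["stringtie", bam, "-p", str(threads), "-o", gtf]
    return [hisat2, sort, stringtie]


if __name__ == "__main__":
    for command in build_pipeline("genome_index", ["r_1.fq", "r_2.fq"], "sample1"):
        print(shlex.join(command))
```

Printing the commands first lets you check them before running anything. If your files turn out to be Phred +33 after all, pass `phred64=False`.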

If you need more information, just ask!

Pedro