gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

How do I run if I have Illumina, Nanopore AND PacBio data? #367

Open niradsp opened 2 years ago

niradsp commented 2 years ago

The --mix option takes one short-read alignment file and one long-read alignment file as input. In my case, however, the long-read data comes from both PacBio and Nanopore, and --mix does not accept three inputs. Here is how I am running it; please let me know if I should do this differently.

  1. If the PacBio and Nanopore data belong to the same sample, I combine them using samtools merge. This leaves me with 40 long-read samples and 40 short-read samples.
  2. Next, I assemble each sample using the reference annotation as a guide, with the --mix option. This generates one GTF file per sample.
  3. Next, I merge the per-sample GTF files using stringtie --merge, including the reference annotation with the -G option. I notice that alternative first isoforms quite often get collapsed and removed, which is why it made sense to include the reference annotation.
  4. Finally, I use the merged annotation to compute expression with the -e option, again together with --mix.

After this, you can run prepDE, etc., for downstream analysis.
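
The four steps above can be sketched as command lines. This is a minimal sketch in Python that only builds and prints the commands rather than running them. The file names (`<sample>.pacbio.bam`, `<sample>.short.bam`, `ref.gtf`, and so on) and the helper functions are hypothetical. It assumes coordinate-sorted BAM files, and that with --mix the short-read file is given first and the long-read file second.

```python
import shlex


def merge_long_reads_cmd(pacbio_bam, nanopore_bam, out_bam):
    # Step 1: combine the PacBio and Nanopore alignments of one sample.
    return ["samtools", "merge", out_bam, pacbio_bam, nanopore_bam]


def assemble_cmd(short_bam, long_bam, ref_gtf, out_gtf):
    # Step 2: guided assembly; short reads first, long reads second.
    return ["stringtie", "--mix", "-G", ref_gtf, "-o", out_gtf,
            short_bam, long_bam]


def merge_assemblies_cmd(sample_gtfs, ref_gtf, out_gtf):
    # Step 3: merge per-sample GTFs together with the reference annotation.
    return ["stringtie", "--merge", "-G", ref_gtf, "-o", out_gtf,
            *sample_gtfs]


def quantify_cmd(short_bam, long_bam, merged_gtf, out_gtf):
    # Step 4: -e restricts estimation to transcripts in the merged GTF.
    return ["stringtie", "-e", "--mix", "-G", merged_gtf, "-o", out_gtf,
            short_bam, long_bam]


def plan(samples, ref_gtf, merged_gtf="merged.gtf"):
    """Return every command of the workflow, in execution order."""
    cmds = []
    for s in samples:
        # Hypothetical naming scheme: <sample>.<kind>.bam
        cmds.append(merge_long_reads_cmd(
            f"{s}.pacbio.bam", f"{s}.nanopore.bam", f"{s}.long.bam"))
        cmds.append(assemble_cmd(
            f"{s}.short.bam", f"{s}.long.bam", ref_gtf, f"{s}.gtf"))
    cmds.append(merge_assemblies_cmd(
        [f"{s}.gtf" for s in samples], ref_gtf, merged_gtf))
    for s in samples:
        cmds.append(quantify_cmd(
            f"{s}.short.bam", f"{s}.long.bam", merged_gtf, f"{s}.quant.gtf"))
    return cmds


if __name__ == "__main__":
    for cmd in plan(["sample01", "sample02"], "ref.gtf"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the input order before launching 40 assemblies. Each list can then be passed to `subprocess.run(cmd, check=True)` to execute it.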