luigra opened this issue 6 years ago (status: Open)
Is the alignment track shown there from STAR or HISAT2? (The BAM file seems to have the same name in both cases, ERR878367.sorted.bam.) In any case, we cannot provide support for STAR alignments: HISAT2 is the recommended and supported aligner for StringTie. You also did not show the command lines for the aligners, which seem crucial for the issue you raised here, although I wouldn't be able to comment on STAR parameters anyway. It does seem odd that stringtie v1.2.3 was able to find so many more isoforms from the STAR alignments, but then again I seem to recall that v1.2.3 had a bug that reported assemblies with 0 FPKM, so perhaps it was also generating a lot of seemingly low-expression isoforms, while v1.3.3 is probably better at filtering out spurious alignments (and thus does not report potentially low-expression, low-probability assemblies). It's hard to answer your question without comparing the accuracy and validity of the alignments produced by HISAT2 and STAR in this case, which is a very important part of the answer.
The alignment results, at least in this region, look very similar.
Below are the options I used for hisat2: --known-splicesite-infile /scratch/cbrc/ref/ensembl/human/GRCh37.75/blueprint/Homo_sapiens.GRCh37.75.chr_ERCC92.ss --rna-strandness RF --downstream-transcriptome-assembly -p 16
To take this comparison further, I simulated FASTQ reads from the reference transcriptome with RSAT and ran both STAR and hisat2. For the downstream analysis I assembled the transcriptomes from both alignments without the -G and --rf parameters, then compared each reconstructed transcriptome with the source one using gffcompare. The results, below, look very similar to me:
hisat2 | Sensitivity (%) | Precision (%) |
---|---|---|
Base level | 66.2 | 95.6 |
Exon level | 47.3 | 94.0 |
Intron level | 61.7 | 99.3 |
Intron chain level | 15.6 | 66.2 |
Transcript level | 15.8 | 63.4 |
Locus level | 71.3 | 78.3 |

STAR | Sensitivity (%) | Precision (%) |
---|---|---|
Base level | 66.1 | 96.4 |
Exon level | 47.2 | 94.7 |
Intron level | 61.3 | 99.6 |
Intron chain level | 15.6 | 68.2 |
Transcript level | 15.9 | 66.1 |
Locus level | 73.2 | 82.3 |
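To put a number on "very similar", here is a minimal Python sketch that subtracts the hisat2 figures from the STAR figures at each level. The values are copied from the two tables above; the names `HISAT2`, `STAR`, `LEVELS` and `deltas` are made up for this illustration and are not part of gffcompare.

```python
# gffcompare (sensitivity, precision) in percent, copied from the two tables above.
LEVELS = ["Base", "Exon", "Intron", "Intron chain", "Transcript", "Locus"]

HISAT2 = {
    "Base": (66.2, 95.6), "Exon": (47.3, 94.0), "Intron": (61.7, 99.3),
    "Intron chain": (15.6, 66.2), "Transcript": (15.8, 63.4), "Locus": (71.3, 78.3),
}
STAR = {
    "Base": (66.1, 96.4), "Exon": (47.2, 94.7), "Intron": (61.3, 99.6),
    "Intron chain": (15.6, 68.2), "Transcript": (15.9, 66.1), "Locus": (73.2, 82.3),
}


def deltas(a, b):
    """Return {level: (sensitivity diff, precision diff)} as b minus a, in percentage points."""
    return {
        level: (round(b[level][0] - a[level][0], 1), round(b[level][1] - a[level][1], 1))
        for level in LEVELS
    }


# Print STAR minus hisat2 for every level.
for level, (d_sens, d_prec) in deltas(HISAT2, STAR).items():
    print(f"{level:13s} sensitivity {d_sens:+.1f}  precision {d_prec:+.1f}")
```

On these numbers, sensitivity differs by less than 2 percentage points at every level, and the largest gap is locus-level precision (82.3 vs 78.3).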
Dear Geo,
I get different assembled transcriptomes depending on whether I use STAR or HISAT2 alignments with the latest version of stringtie. The screenshot shows a region I am interested in.
Is there something I need to do, other than sorting the BAM by coordinates, in order to get a reliable transcriptome from the STAR alignment results?
Thanks in advance, Luigi
Below are the commands I used: /home/cbrcmod/scratch/modules/out/modulebin/stringtie/1.3.3/bin/stringtie /scratch/cbrc/analysis/BPepi-22/out/bam/STAR/ERR878367.sorted.bam -o /scratch/cbrc/analysis/BPepi-22/tmp/stringtie/ERR878367.gtf -p 6 --rf -G /scratch/cbrc/ref/ensembl/human/GRCh37.75/blueprint/Homo_sapiens.GRCh37.75.chr.gtf -v -l BPSTRG
/home/cbrcmod/scratch/modules/out/modulebin/stringtie/1.3.3/bin/stringtie /scratch/cbrc/analysis/BPepi-27/out/bam/hisat2/ERR878367.sorted.bam -o /scratch/cbrc/analysis/BPepi-27/tmp/stringtie/hisat2/ERR878367.gtf -p 8 --rf -G /scratch/cbrc/ref/ensembl/human/GRCh37.75/blueprint/Homo_sapiens.GRCh37.75.chr.gtf -v -l STRG
In case it helps, I noticed that stringtie 1.2.3 on the STAR alignment was able to detect the new 5' exons and the longer 3' UTR exon, along with other transcripts (I included the corresponding track in the screenshot). This is the command I used for it: stringtie /opt/data3/Projects/BPepi/BPepi-8/out/bam/STAR/ERR878367.sorted.bam -o /opt/data3/Projects/BPepi/BPepi-8/stringtie/MK/ERR878367/first_transcript.gtf -p 4 -G /opt/data2/mak58/blueprint/annotation/Homo_sapiens.GRCh37.70.gtf -v
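Since the 1.2.3 and 1.3.3 runs on the STAR alignments are being compared, it may help to check whether the two command lines differ in anything besides the StringTie version. The following is a rough Python sketch that diffs the options of the two commands quoted in this thread. The helpers `parse_flags` and `diff_flags` are invented for this illustration, and ignoring `-o` and `-p` rests on the assumption that the output path and thread count do not change the assembly.

```python
import os
import shlex

# Command lines copied from this thread (STAR alignments in both cases).
CMD_1_3_3 = (
    "/home/cbrcmod/scratch/modules/out/modulebin/stringtie/1.3.3/bin/stringtie "
    "/scratch/cbrc/analysis/BPepi-22/out/bam/STAR/ERR878367.sorted.bam "
    "-o /scratch/cbrc/analysis/BPepi-22/tmp/stringtie/ERR878367.gtf -p 6 --rf "
    "-G /scratch/cbrc/ref/ensembl/human/GRCh37.75/blueprint/Homo_sapiens.GRCh37.75.chr.gtf "
    "-v -l BPSTRG"
)
CMD_1_2_3 = (
    "stringtie /opt/data3/Projects/BPepi/BPepi-8/out/bam/STAR/ERR878367.sorted.bam "
    "-o /opt/data3/Projects/BPepi/BPepi-8/stringtie/MK/ERR878367/first_transcript.gtf "
    "-p 4 -G /opt/data2/mak58/blueprint/annotation/Homo_sapiens.GRCh37.70.gtf -v"
)

# Flags that take a value; limited to the ones that appear in this thread.
VALUE_FLAGS = {"-o", "-p", "-G", "-l"}
# Assumption: output path and thread count do not affect the assembled transcripts.
IGNORED = {"-o", "-p"}


def parse_flags(cmd):
    """Return {flag: value} for a stringtie command line; boolean flags map to True."""
    tokens = shlex.split(cmd)[1:]  # drop the executable
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in VALUE_FLAGS:
            flags[tok] = tokens[i + 1]
            i += 2
        else:
            if tok.startswith("-"):
                flags[tok] = True
            i += 1  # positional arguments (the BAM path) are skipped
    return flags


def diff_flags(cmd_a, cmd_b):
    """Return {flag: (value in a, value in b)} for flags that differ between the two commands."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    out = {}
    for flag in sorted(set(a) | set(b)):
        if flag in IGNORED:
            continue
        va, vb = a.get(flag), b.get(flag)
        if flag == "-G":  # compare annotation file names, not directories
            va = os.path.basename(va) if va else va
            vb = os.path.basename(vb) if vb else vb
        if va != vb:
            out[flag] = (va, vb)
    return out


for flag, (new, old) in diff_flags(CMD_1_3_3, CMD_1_2_3).items():
    print(f"{flag}: 1.3.3 run = {new}, 1.2.3 run = {old}")
```

On these two commands the sketch reports three differences: `--rf` is present only in the 1.3.3 run, `-G` points to a different annotation release (GRCh37.75 vs GRCh37.70), and `-l`, which only sets the prefix of the output transcript names, is present only in the 1.3.3 run. So the two runs differ in more than the StringTie version, which may be worth ruling out before attributing the extra isoforms to v1.2.3 itself.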