gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

StringTie for single-end reads #398

Mirror1211 closed 1 year ago

Mirror1211 commented 1 year ago

Hi,

Recently, I tried to use StringTie to carry out gene-level quantification for 2000+ transcriptome datasets, including both paired-end and single-end reads. I used HISAT2 to map the reads onto the reference genomes and generate the BAM files. The command lines are as follows:

For paired-end reads:

hisat2 --dta -p 20 -x reference_hisat2 -1 sample1_1.fq.gz -2 sample1_2.fq.gz | samtools sort - > sample1_pair_end.sorted.bam

stringtie -p 20 -e -A sample1_gene_abund.tab -C sample1_gene_abund.gtf -G reference.gtf -o sample1.gtf sample1_pair_end.sorted.bam

For single-end reads:

hisat2 --dta -p 20 -x reference_hisat2 -U sample2.fq.gz | samtools sort - > sample2_single_end.sorted.bam

stringtie -p 20 -e -A sample2_gene_abund.tab -C sample2_gene_abund.gtf -G reference.gtf -o sample2.gtf sample2_single_end.sorted.bam

Because the tutorials on GitHub and other websites mainly focus on BAM files or paired-end data, I am not sure whether this pipeline is appropriate for single-end data. If not, how should I carry out the quantification for single-end reads using HISAT2 + StringTie?

Sincerely

gpertea commented 1 year ago

Yes, StringTie works the same way on single-end data (no change in the parameters), though of course other programs in your analysis pipeline may need some (usually obvious) parameter changes to specify single-end reads, as you did above with -U for hisat2.

Paired-end reads are preferred/recommended for multiple reasons - e.g. not only improving the accuracy of the alignments but also providing additional structural hints for StringTie during transcript assembly that improve the quality of the assembled transfrags.
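
For a large batch that mixes both library types, the point above (only the aligner input flags change, the StringTie call does not) could be sketched as a small shell helper. This is only an illustration under stated assumptions: build_cmds is a hypothetical function name, the index, annotation and file names are taken from the commands in the question, the -C option is left out for brevity, and the helper only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: build_cmds is a hypothetical helper, not part of HISAT2 or StringTie.
# It prints (does not run) the two pipeline commands for one sample.
# Usage: build_cmds SAMPLE READS_1 [READS_2]
build_cmds() {
  sample=$1
  shift
  case $# in
    1) reads="-U $1" ;;         # single-end: one FASTQ file
    2) reads="-1 $1 -2 $2" ;;   # paired-end: two mate files
    *) echo "expected 1 or 2 FASTQ files" >&2; return 1 ;;
  esac
  # Only the HISAT2 input flags differ between the two cases.
  echo "hisat2 --dta -p 20 -x reference_hisat2 $reads | samtools sort - > ${sample}.sorted.bam"
  # The StringTie call is identical for single-end and paired-end BAM files.
  echo "stringtie -p 20 -e -A ${sample}_gene_abund.tab -G reference.gtf -o ${sample}.gtf ${sample}.sorted.bam"
}

build_cmds sample1 sample1_1.fq.gz sample1_2.fq.gz
build_cmds sample2 sample2.fq.gz
```

Printing the commands first makes it easy to review them for a few samples before piping the output to sh for the whole batch.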