gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Merge BAM files from biological replicates or not #343

Closed: biozzq closed this issue 2 years ago

biozzq commented 3 years ago

Hi all,

I think this is probably an old question. To increase coverage and detection power, would you combine multiple BAM files from biological replicates into a single BAM and run StringTie on it to identify novel transcripts? Thanks for your attention.

Best regards, Zheng zhuqing

gpertea commented 2 years ago

No, do not run StringTie on multiple samples (replicates) at once, whether by merging the sample BAM files into one in advance or by providing multiple BAM files on the command line. Not only can that create memory usage issues, but multiplexing samples also breaks some important sampling/statistical assumptions that StringTie makes in its transcript assembly and quantification algorithms.

Also, differential expression workflows generally require data from biological replicates to be kept separate and provided as separate inputs for the statistical analysis. The DE workflow shown in the StringTie manual suggests assembling each sample separately and then "meta-assembling" (merging) the results, only to get a better transcriptomic representation across all samples; each sample should then still be quantitated independently against that "merged" transcriptome.
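For readers who want to see the three steps laid out, here is a minimal sketch in Python that only builds the StringTie command lines and does not run them. The flags (`-G`, `-o`, `--merge`, `-e`, `-B`) are the ones used in the StringTie manual's DE workflow. The function name `build_workflow`, the file names, and the output directory layout are hypothetical and chosen just for illustration. Passing `-G` in the assembly step is optional and assumes a reference annotation is available.

```python
import shlex
from pathlib import Path


def build_workflow(bams, ref_gtf, out_dir="stringtie_out"):
    """Return StringTie command lines (as argument lists) for the
    assemble -> merge -> quantify workflow, one sample per run.

    Nothing is executed here; names and layout are illustrative only.
    """
    out = Path(out_dir)
    merged = out / "merged.gtf"

    # Step 1: assemble each sample (replicate) separately.
    assemblies = [out / f"{Path(b).stem}.gtf" for b in bams]
    assemble = [
        ["stringtie", str(b), "-G", str(ref_gtf), "-o", str(gtf)]
        for b, gtf in zip(bams, assemblies)
    ]

    # Step 2: merge the per-sample assemblies (GTF files, not BAM files).
    merge = ["stringtie", "--merge", "-G", str(ref_gtf), "-o", str(merged)]
    merge += [str(g) for g in assemblies]

    # Step 3: quantify each sample independently against the merged GTF.
    quantify = []
    for b in bams:
        name = Path(b).stem
        quant_gtf = out / "quant" / name / f"{name}.gtf"
        quantify.append(
            ["stringtie", "-e", "-B", "-G", str(merged), "-o", str(quant_gtf), str(b)]
        )

    return assemble + [merge] + quantify


if __name__ == "__main__":
    # Hypothetical inputs: two replicate BAM files and a reference annotation.
    for cmd in build_workflow(["rep1.bam", "rep2.bam"], "reference.gtf"):
        print(shlex.join(cmd))
```

Note that the only thing merged is the set of per-sample GTF assemblies in step 2. The BAM files are never combined, and every `stringtie` run in steps 1 and 3 takes exactly one BAM file as input.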