gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License
385 stars 78 forks

Change between 2.2.1 and 2.2.2 on gene abundance estimation with '-e' parameter #434

Open lldelisle opened 5 months ago

lldelisle commented 5 months ago

Dear stringtie developer, I noticed a large change between versions 2.2.1 and 2.2.2 when I run stringtie with the -e option. It seems that all lowly expressed genes now drop to 0. Here is an example of the output from version 2.2.1 on a small dataset: [Screenshot from 2024-06-26 09-09-16]

Here is the output from the same command line with version 2.2.2: [Screenshot from 2024-06-26 09-10-48]

If you need a small dataset, I have put a small BAM file with reads covering a single gene, along with the associated GTF, on usegalaxy.org.

gpertea commented 5 months ago

Thank you for this report and for providing the files to reproduce this issue.

Looking at the two BAM files I see there, it seems only the small_BAM_SR BAM file (data 2) is triggering this issue, but not the small_BAM_PE file (data 7), correct?

[EDIT: fixed the data #s]

gpertea commented 5 months ago

Note to self & @mpertea: it seems this is related to #238 and the last version of the #357 bug, and the new development branch might have the fix for all of them. I'll work on back-porting and testing that possible fix.

lldelisle commented 5 months ago

> Thank you for this report and for providing the files to reproduce this issue.
>
> Looking at the two BAM files I see there, it seems only the small_BAM_SR BAM file (data 2) is triggering this issue, but not the small_BAM_PE file (data 7), correct?
>
> [EDIT: fixed the data #s]

In the minimal example, yes, because I included a single gene whose coverage is below 1 in small_BAM_SR and above 1 in small_BAM_PE. So I suspect the behavior depends on the expression level. For the PE data without subsetting I get (left is 2.2.1, right is 2.2.2): [image]

If you want, I can share the full BAM and GTF with you.
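For anyone who wants to check this on their own data without comparing screenshots, here is a minimal sketch that lists the genes whose TPM is non-zero in one gene abundance table and zero (or missing) in another. It assumes both tables were written with stringtie's `-A` option and have the tab-delimited header `Gene ID`, `Gene Name`, `Reference`, `Strand`, `Start`, `End`, `Coverage`, `FPKM`, `TPM`. The function names and the file names in the usage comment are hypothetical, not something from this thread.

```python
import csv
import io


def read_gene_abund(handle):
    """Map Gene ID -> (Coverage, TPM) from a tab-delimited gene abundance table.

    Assumes the header contains the columns "Gene ID", "Coverage" and "TPM".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        table[row["Gene ID"]] = (float(row["Coverage"]), float(row["TPM"]))
    return table


def dropped_to_zero(old, new):
    """Return sorted gene IDs with TPM > 0 in `old` but TPM == 0 (or absent) in `new`."""
    return sorted(
        gene
        for gene, (_, tpm) in old.items()
        if tpm > 0 and new.get(gene, (0.0, 0.0))[1] == 0
    )


# Usage (hypothetical file names):
# with open("gene_abund_2.2.1.tab") as f_old, open("gene_abund_2.2.2.tab") as f_new:
#     old, new = read_gene_abund(f_old), read_gene_abund(f_new)
# for gene in dropped_to_zero(old, new):
#     print(gene, "coverage in old run:", old[gene][0])
```

Printing the old coverage next to each gene ID makes it easy to see whether the affected genes are indeed the ones with coverage below 1.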