gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

dictionary optimization #342

Closed: yollct closed this issue 2 years ago

yollct commented 2 years ago

Hi, I have been using the prepDE.py3 script to obtain the count table. However, for some time now the script has been throwing a KeyError when creating the dictionaries. After I replaced the dicts with defaultdict, it no longer throws the error, so I think this change could help anyone who runs into the same issue.

Best, Chit Tong
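The difference between the two lookups can be seen in a minimal sketch. It does not reproduce prepDE.py3's actual code; the sample names, transcript IDs, and counts below are hypothetical, with transcript `t3` deliberately missing from one sample. A plain dict raises KeyError on the missing ID, while `defaultdict(int)` returns 0 for it.

```python
from collections import defaultdict

# Hypothetical per-sample counts (transcript ID -> read count).
# "t3" is absent from sample_B, mimicking a transcript missing
# from one sample's stringtie -e output.
samples = {
    "sample_A": {"t1": 10, "t2": 5, "t3": 2},
    "sample_B": {"t1": 7, "t2": 3},
}

# Union of transcript IDs seen in any sample, in a stable order.
all_ids = sorted({tid for counts in samples.values() for tid in counts})

# Plain dict: looking up an ID a sample lacks raises KeyError.
try:
    rows = {tid: [samples[s][tid] for s in samples] for tid in all_ids}
except KeyError as err:
    print("KeyError:", err)

# defaultdict(int): a missing ID yields 0 instead of raising.
filled = {s: defaultdict(int, counts) for s, counts in samples.items()}
count_table = {tid: [filled[s][tid] for s in samples] for tid in all_ids}
print(count_table)
```

Running it prints `KeyError: 't3'` followed by `{'t1': [10, 7], 't2': [5, 3], 't3': [2, 0]}`. Note that the 0 for `t3` in sample_B is silently filled in, which is exactly why the substitution is only safe if a missing transcript really means zero expression.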

gpertea commented 2 years ago

Thank you for your patch. My only concern is that it might be masking a symptom of a deeper issue. That KeyError should not be raised unless a transcript is somehow missing from one of the input files, and that should not happen if all the input files were produced with stringtie -e using the same reference guide (-G) file.

However, issue #337 suggests that the most recent version of stringtie may suffer from a regression bug that once again allows transcripts to go missing from stringtie -e output even when the protocol was followed correctly. The last time this happened, it affected only transcripts that had no detectable expression in the sample (zero effective coverage). If that is the case again, your patch would indeed address the issue without any loss of information.
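Before relying on the defaultdict change, one way to check whether the KeyError really comes from transcripts missing in some samples is to compare the transcript IDs across samples and list what each one lacks. This is a hypothetical helper, not part of prepDE.py3; it assumes the per-sample counts have already been parsed into dicts keyed by transcript ID, as in the sketch above.

```python
from typing import Dict, List


def find_missing_transcripts(samples: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Return {sample: sorted transcript IDs absent from that sample}.

    Samples that contain every transcript seen anywhere are omitted.
    """
    # Collect every transcript ID that appears in at least one sample.
    all_ids = set()
    for counts in samples.values():
        all_ids.update(counts)

    missing = {}
    for name, counts in samples.items():
        # set.difference accepts any iterable; iterating a dict yields its keys.
        absent = sorted(all_ids.difference(counts))
        if absent:
            missing[name] = absent
    return missing


# Hypothetical example data: "t3" is missing from sample_B.
samples = {
    "sample_A": {"t1": 10, "t2": 5, "t3": 2},
    "sample_B": {"t1": 7, "t2": 3},
}

report = find_missing_transcripts(samples)
for name, ids in report.items():
    print(f"{name}: {len(ids)} transcript(s) missing, e.g. {ids[:5]}")
```

If the IDs reported here are all transcripts with zero coverage in the samples that do list them, filling them with 0 is harmless. If expressed transcripts show up as missing, that points to the deeper problem rather than something a defaultdict should paper over.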