nedialkova-lab / mim-tRNAseq

Modification-induced misincorporation tRNA sequencing
GNU General Public License v3.0

Isodecoder column in counts incorrect #35

Closed: TomSmithCGAT closed this issue 2 years ago

TomSmithCGAT commented 2 years ago

When the 'parent' is a tRX gene and the isodecoders cannot be separately quantified, the isodecoder name is incorrect.

See below for an example ('Homo_sapiens_tRX-Gly-CCC-2' & 'Homo_sapiens_tRNA-Gly-CCC-8') from 3 separate runs of mimseq on different sets of simulated data. I'm just showing the 1st, 2nd and last columns of the respective Isodecoder_counts_raw.txt files.

The first two rows are from two separate runs of mimseq and show what happens when Homo_sapiens_tRX-Gly-CCC-2 and Homo_sapiens_tRNA-Gly-CCC-8 cannot be separately quantified. The isodecoder name is erroneously given as Homo_sapiens_tRX-Gly-CCC-2/8, which appears to denote the isodecoders as Homo_sapiens_tRX-Gly-CCC-2 and Homo_sapiens_tRX-Gly-CCC-8, not Homo_sapiens_tRNA-Gly-CCC-8.

The final two lines are from a mimseq run where the two tRNAs are separately quantified, so this issue doesn't arise.

$ grep -hn tRX-Gly-CCC-2 */counts/Isodecoder_counts_raw.txt | awk '{print $1, $2, $(NF)}'
365:Homo_sapiens_tRX-Gly-CCC-2/8 False Homo_sapiens_tRX-Gly-CCC-2
364:Homo_sapiens_tRX-Gly-CCC-2/8 False Homo_sapiens_tRX-Gly-CCC-2
176:Homo_sapiens_tRNA-Gly-CCC-8 True Homo_sapiens_tRX-Gly-CCC-2
381:Homo_sapiens_tRX-Gly-CCC-2 True Homo_sapiens_tRX-Gly-CCC-2
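For anyone who wants to run the same check from Python, here is a minimal sketch of the grep/awk pipeline above. It assumes only what the command itself assumes: whitespace-separated columns, with the isodecoder name first and the parent last. The function name `grep_columns` is made up for illustration and is not part of mimseq. Unlike grep, it matches the pattern as a plain substring rather than a regular expression.

```python
from pathlib import Path


def grep_columns(paths, pattern):
    """Yield (line_number, first, second, last) for every line containing pattern.

    Mirrors `grep -hn PATTERN FILES | awk '{print $1, $2, $(NF)}'`, except that
    the pattern is matched as a plain substring, not a regular expression.
    """
    for path in paths:
        with open(path) as handle:
            for line_number, line in enumerate(handle, start=1):
                if pattern not in line:
                    continue
                fields = line.split()  # whitespace split, like awk's default
                if not fields:
                    continue
                second = fields[1] if len(fields) > 1 else ""
                yield line_number, fields[0], second, fields[-1]


if __name__ == "__main__":
    # Same glob as in the shell command: one counts file per mimseq run.
    files = sorted(Path(".").glob("*/counts/Isodecoder_counts_raw.txt"))
    for line_number, first, second, last in grep_columns(files, "tRX-Gly-CCC-2"):
        print(f"{line_number}:{first} {second} {last}")
```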

When this occurs in the opposite direction (when the parent is _tRNA and the child _tRX), it's handled in https://github.com/nedialkova-lab/mim-tRNAseq/blob/7c1cd62bf3a30ea45425d5cccb6ce274697364c3/mimseq/mmQuant.py#L667, so I guess that's where this needs to be rectified too?

drewjbeh commented 2 years ago

Hi Tom,

Thanks for that. Indeed the other way around was a recent fix. I did not think about or implement a fix for when the tRX gene is the parent! Thanks for pointing this out.

drewjbeh commented 2 years ago

Hi Tom,

Finally got around to fixing this! I just pushed some new changes to this repo that should fix the issue. I couldn't find any datasets of ours that could recreate the unsplit Gly-CCC example you mentioned above, so if you could pull the latest changes from here and test it, that would be much appreciated. If everything looks in order, I will release the changes in mimseq v1.1.8 next week. I did find another example: tRX-Gln-CTG-3, which is the parent and is unsplit from tRNA-Gln-CTG-10. With the new fix, the cluster is correctly named tRX-Gln-CTG-3/tRNA10, so hopefully it works for you too.
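To make the naming convention concrete, here is a rough sketch of how such a cluster name could be built. This is not the code in mmQuant.py; the function `cluster_name` and the regular expression are hypothetical and only reproduce the behaviour described in this thread. The assumption is that an unsplit child is appended as its bare gene number when it has the same gene type as the parent, and as type plus number (e.g. tRNA10) when the type differs.

```python
import re

# Hypothetical pattern for names such as "Homo_sapiens_tRX-Gly-CCC-2" or
# "tRNA-Gln-CTG-10": optional species prefix, gene type, body, gene number.
GENE_RE = re.compile(
    r"^(?P<prefix>.*?)(?P<kind>tRNA|tRX)-(?P<body>.+)-(?P<number>\d+)$"
)


def cluster_name(parent, children):
    """Build a name for a parent and the isodecoders that could not be split from it.

    Assumption (not taken from the mimseq source): a child of the same gene
    type as the parent contributes only its number, while a child of the other
    type contributes type + number so the name stays unambiguous.
    """
    parent_match = GENE_RE.match(parent)
    if parent_match is None:
        raise ValueError(f"unrecognised gene name: {parent}")

    suffixes = []
    for child in children:
        child_match = GENE_RE.match(child)
        if child_match is None:
            raise ValueError(f"unrecognised gene name: {child}")
        if child_match["kind"] != parent_match["kind"]:
            # Different gene type: keep the type so tRNA-8 is not read as tRX-8.
            suffixes.append(f"{child_match['kind']}{child_match['number']}")
        else:
            suffixes.append(child_match["number"])

    return "/".join([parent] + suffixes)


if __name__ == "__main__":
    print(cluster_name("tRX-Gln-CTG-3", ["tRNA-Gln-CTG-10"]))
```

Under these assumptions the Gln example from this comment yields tRX-Gln-CTG-3/tRNA10, and the Gly-CCC pair from the original report would be named Homo_sapiens_tRX-Gly-CCC-2/tRNA8 rather than Homo_sapiens_tRX-Gly-CCC-2/8.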

TomSmithCGAT commented 2 years ago

All seems to be fixed now. Thanks!