A potential problem when working on self-made tRNA fasta

ruixuan-zhang commented 1 year ago

Hi Behrens,

Thank you for developing such a good tool. I am new to tRNA sequencing and alignment. I learned a lot from your code about how to treat tRNA sequences.

I found that the function initIntronDict may not work for self-made tRNA fasta files. When it reads the tRNAscan-SE .out file, it looks for lines starting with chr, so it misses records whose sequence names start with scaffold or NCxxx (sadly, like mine :( ).

https://github.com/nedialkova-lab/mim-tRNAseq/blob/899fad8a9aaec6861ed6b53c80fd8ad28e395d23/mimseq/tRNAtools.py#L941

Maybe a safer way is to skip the header lines that start with "Sequence", "Name" or "-" instead, for example with if not line.startswith(("Sequence", "Name", "-")):.
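A minimal sketch of that filter, to show the idea rather than the actual mimseq code. The helper name iter_trnascan_records, the sample rows (made-up values) and the column positions used for the intron bounds are assumptions for illustration, based on the usual tRNAscan-SE tabular layout.

```python
def iter_trnascan_records(lines):
    """Yield the whitespace-split fields of each data row, skipping header rows."""
    header_prefixes = ("Sequence", "Name", "-")  # the three header lines of a .out file
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(header_prefixes):
            continue  # header row, not a tRNA record
        yield line.split()


# Made-up example rows; real .out files may carry extra columns.
sample_out = """\
Sequence        tRNA    Bounds  tRNA    Anti    Intron Bounds   Inf
Name            tRNA #  Begin   End     Type    Codon   Begin   End     Score   Note
--------        ------  -----   ------  ----    -----   -----   ----    ------  ------
chr1            1       7930279 7930348 Gly     CCC     0       0       75.6
scaffold_12     3       1042    1133    Tyr     GTA     1079    1097    70.1
NC_000913.3     2       225381  225457  Ile     GAT     0       0       78.3
"""

records = list(iter_trnascan_records(sample_out.splitlines()))
names = [fields[0] for fields in records]

# Hypothetical intron lookup keyed by "<sequence name>.trna<number>";
# assumes columns 7 and 8 hold the intron begin and end (0 means no intron).
introns = {
    f"{fields[0]}.trna{fields[1]}": (int(fields[6]), int(fields[7]))
    for fields in records
    if int(fields[6]) > 0
}
print(names)
print(introns)
```

One caveat of this approach: a sequence whose name itself starts with "Sequence", "Name" or "-" would also be skipped, so skipping the first three lines by position may be more robust.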

The function intronRemover is probably affected too, because its regular expression "tRNAscan-SE ID: (.*?)\).|\((chr.*?)-" also expects chr in its second alternative.
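To illustrate, here is a small sketch comparing the quoted pattern with a relaxed one. The headers below are hypothetical, the relaxed pattern assumes IDs of the form <sequence name>.trna<number>, and how intronRemover actually uses the match is not shown here, so treat this as one possible direction only.

```python
import re

# Pattern quoted above: the second alternative only matches IDs starting with "chr".
original = re.compile(r"tRNAscan-SE ID: (.*?)\).|\((chr.*?)-")

# Hypothetical relaxed pattern: any sequence name, as long as the ID
# looks like "<name>.trna<number>" and is followed by "-".
relaxed = re.compile(r"tRNAscan-SE ID: (.*?)\).|\(([^\s()]+?\.trna\d+)-")


def extract_id(header, pattern):
    """Return the tRNAscan-SE ID captured by either alternative, or None."""
    match = pattern.search(header)
    if match is None:
        return None
    return match.group(1) or match.group(2)


# Hypothetical headers for a custom reference.
header_with_label = "My_species_tRNA-Ala-AGC-1-1 (tRNAscan-SE ID: scaffold_12.trna3) Ala (AGC) 73 bp"
header_without_label = "My_species (scaffold_12.trna3-AlaAGC) Ala (AGC) 73 bp"

print(extract_id(header_with_label, original))     # first alternative does not depend on chr
print(extract_id(header_without_label, original))  # second alternative requires chr
print(extract_id(header_without_label, relaxed))
```

In this sketch the first alternative already works for any sequence name, so only headers without the "tRNAscan-SE ID:" label depend on the chr prefix.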

Best regards, Ruixuan

drewjbeh commented 1 year ago

Hi @ruixuan-zhang ,

Thanks for the suggestion. I will implement these changes in the next version of mimseq. I was wondering if you could copy in a few examples of fasta headers for your custom tRNA references so that I can ensure the new code works properly? Thanks!

drewjbeh commented 1 year ago

Oh and also a couple of lines from your ".out" file - thanks!

drewjbeh commented 1 year ago

Hi @ruixuan-zhang,

I have just released a new version of mimseq, v1.3.7, which should be available in the next few days. This should solve your issue. Let me know how it goes.

ruixuan-zhang commented 1 year ago

Hi Behrens, Sorry for the late reply. Thank you very much for your efforts. I will test it soon! Best.
