Open jolespin opened 1 month ago
This is ongoing work (how to predict genes ab initio and with evidence, e.g. from miniprothint). I will be happy to discuss it with you, but we are not at the point of having everything figured out. We have intermediate results that allow some conclusions.
Josh L. Espinoza @.***> wrote on Tue, Oct 1, 2024 at 01:40:
I just recently found out about miniprot and have been testing it out as an alternative to MetaEuk in my VEBA metagenomics workflow software suite (https://github.com/jolespin/veba). When I ran miniprot with my genome database I got MANY more genes than expected (~143k genes) even when using the smaller intron size (2k) and minimum coverage (0.25) with a length filter matching that of MetaEuk.
My question is how I can use your tools to get more reliable gene models from eukaryotic metagenome-assembled genomes?
First run miniprot to get alignments
miniprot genome.fasta proteins.fasta --aln > miniprot.aln
Then run miniprothint to get hints
miniprothint.py --alignment miniprot.aln --workdir miniprothint
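The two steps above can be chained in a small wrapper so that a failure in the alignment step stops the run before hints are generated. This is only a sketch: the commands and flags are copied verbatim from the lines above, while the function name `run_hints`, the default output names, and the `DRY_RUN` switch are illustrative assumptions and not part of miniprot or miniprothint.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the two commands quoted in this thread.
# run_hints, its default file names, and DRY_RUN are hypothetical helpers.
set -euo pipefail

run_hints() {
  local genome="$1"
  local proteins="$2"
  local aln="${3:-miniprot.aln}"       # alignment file written by step 1
  local workdir="${4:-miniprothint}"   # working directory used by step 2

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    echo "miniprot ${genome} ${proteins} --aln > ${aln}"
    echo "miniprothint.py --alignment ${aln} --workdir ${workdir}"
    return 0
  fi

  # Step 1: protein-to-genome alignments.
  miniprot "${genome}" "${proteins}" --aln > "${aln}"
  # Step 2: turn the alignments into hints.
  miniprothint.py --alignment "${aln}" --workdir "${workdir}"
}
```

Calling `run_hints genome.fasta proteins.fasta` runs both steps with the file names used above; setting `DRY_RUN=1` first only prints the commands.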
What next?
Ok awesome, thank you for the update. Right now, eukaryotic gene modeling is a huge resource bottleneck in VEBA, especially when running it at scale. I'm trying to work out some methods right now but would prefer not to reinvent the wheel. I'm coming from more of a metagenomics/pipeline/ML background, but let me know if I can help in any way. I appreciate the great research you're doing in this largely overlooked space!