tomasbruna / miniprothint

Miniprothint selects a set of reliable gene prediction hints from miniprot alignments scored by miniprot boundary scorer.

[Question] General usage for getting gene predictions from miniprot #2

Open jolespin opened 1 month ago

jolespin commented 1 month ago

I just recently found out about miniprot and have been testing it out as an alternative to MetaEuk in my VEBA metagenomics workflow software suite (https://github.com/jolespin/veba). When I ran miniprot with my genome database I got MANY more genes than expected (~143k genes) even when using the smaller intron size (2k) and minimum coverage (0.25) with a length filter matching that of MetaEuk.

My question is how I can use your tools to get more reliable gene models from eukaryotic metagenome-assembled genomes?

# First run miniprot to get alignments
miniprot genome.fasta proteins.fasta --aln > miniprot.aln
# Then run miniprothint to get hints
miniprothint.py --alignment miniprot.aln --workdir miniprothint

What next?

KatharinaHoff commented 1 month ago

This is ongoing work (how to predict genes ab initio and with evidence, e.g. from miniprothint). I will be happy to discuss it with you, but we are not at the point of having everything figured out. We have intermediate results that allow some conclusions.

(Sent by email in reply to Josh L. Espinoza's post of Tue, Oct 1, 2024, 01:40.)

jolespin commented 1 month ago

Ok, awesome, thank you for the update. Right now, eukaryotic gene modeling is a huge resource bottleneck in VEBA, especially when running it at scale. I'm trying to work out some methods right now but would prefer not to reinvent the wheel. I come from more of a metagenomics/pipeline/ML background, but let me know if I can help in any way. I appreciate the great research you're doing in this largely overlooked space!