weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

Issues with Gymnosperms #118

nhartwic closed this issue 3 months ago

nhartwic commented 3 months ago

My lab recently picked up Helixer and has found that it works extremely well. So far, the only thing it has performed poorly on is gymnosperm genomes. Admittedly, testing has been somewhat limited here: I've run a publicly available Gnetum genome and a not-yet-public Ephedra genome, and am in the process of testing on Giant Redwood.

| Assembly | BUSCO C (Ref) | BUSCO C (Helixer) |
|----------|---------------|-------------------|
| Gnetum   | 90.6%         | 76.3%             |
| Ephedra  | 82.8%         | 48.0%             |
| Redwood  | TODO          | TODO              |
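
To put a number on the drop, the gap between the two columns can be computed with a few lines of Python. This is only a rough sketch: the dictionary layout and the `completeness_gap` helper are made up for illustration and are not part of Helixer or BUSCO. The values are copied from the table above, and Redwood is left out because its results are still pending.

```python
# BUSCO completeness (%) copied from the table above.
# Redwood is omitted because its results are still TODO.
busco_complete = {
    "Gnetum": {"ref": 90.6, "helixer": 76.3},
    "Ephedra": {"ref": 82.8, "helixer": 48.0},
}


def completeness_gap(scores):
    """Return reference minus Helixer completeness, in percentage points."""
    # Round to one decimal to match the precision of the reported values.
    return round(scores["ref"] - scores["helixer"], 1)


# Hypothetical helper output: one gap per assembly.
gaps = {name: completeness_gap(s) for name, s in busco_complete.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points lower with Helixer")
```

The gap is reported in percentage points rather than as a relative change, since both columns are already percentages of the same BUSCO set.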

...Any thoughts/advice? What data would be needed to fine-tune the land plant model for gymnosperms, or to train a new model entirely?

nhartwic commented 3 months ago

Closing this since you seem to have reasonable docs for training already. Not sure how I missed them earlier.

alisandra commented 3 months ago

Thanks! Feel free to reach out at any point if that's helpful for fine tuning or training.