weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

hypothetical all-eukaryote model #136

Closed: nhartwic closed this issue 1 month ago

nhartwic commented 1 month ago

Honestly, this is more of an information request than anything. I'm curious whether, during development, you ever tried pooling all your curated datasets to train a single all-eukaryote model instead of the four models we currently have.

Speculating here: while I'm not overly hopeful that such a model would outperform any of the existing four models on their respective domains, I suspect it might generalize more reliably to assemblies that are not well represented in the database. The model probably still won't work great, but it may still be useful for some applications.

alisandra commented 1 month ago

We did one small investigation in this direction. As you guessed, it hurt performance on plants & animals, so we didn't pursue it further.

Nevertheless, I do think the idea is promising, ideally with more (and more diverse) training data and a larger model.

nhartwic commented 1 month ago

Any chance you still have a trained model from that investigation kicking around somewhere? If so, are you curious how it would perform on weird stuff like brown algae, soft corals, and gymnosperms?

I'd be happy to do some limited testing. Good reference-grade gene predictions aren't really available for these organisms, so we are mostly at the mercy of BUSCO scores and purely descriptive statistics for performance metrics. I get that you are probably busy, that this all-eukaryote model probably still won't work great, and that, at best, it won't be entirely clear how well it is working, but I thought I'd extend the offer anyway.

alisandra commented 1 month ago

Ah, unfortunately I no longer have that model, but thanks for the offer here, it's a great idea 🔥