weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0
140 stars, 20 forks

How to use Helixer to annotate a genome in the real world? #79

Closed: xiekunwhy closed this issue 2 years ago

xiekunwhy commented 2 years ago

Hi,

After reading the README, I still don't really understand how to use Helixer to annotate a genome the way I would with AUGUSTUS or GeneMark. Would you please provide a simple real-world example, for instance annotating the Arabidopsis thaliana genome de novo?

Best, Kun

alisandra commented 2 years ago

Hi Kun,

Thanks for your interest!

You're quite right: the current code base and the instructions in the README only get you as far as base-wise class probabilities, not (yet) to an annotated genome.

As a first step toward practical use, we're working on a post-processor that takes these base-wise probabilities and makes a best gene model prediction at each locus (output as a GFF3 file). It's nearly ready; we're doing the last optimizations and testing before release.
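To make "base-wise class probabilities" a bit more concrete, here is a toy Python sketch. It is not HelixerPost's algorithm: it assumes each base gets a probability vector over four classes (the class names and their order are an assumption for illustration), picks the most likely class per base, and collapses consecutive identical calls into segments.

```python
from itertools import groupby

# Hypothetical class order, for illustration only; check Helixer's own docs.
CLASSES = ("intergenic", "UTR", "CDS", "intron")


def argmax(row):
    """Index of the largest probability in one base's class vector."""
    return max(range(len(row)), key=lambda i: row[i])


def naive_segments(probs):
    """Collapse per-base class probabilities into (label, start, end) runs.

    probs: one probability vector per base, in CLASSES order.
    Coordinates are 0-based and end-exclusive.
    """
    labels = [CLASSES[argmax(row)] for row in probs]
    segments = []
    pos = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, pos, pos + length))
        pos += length
    return segments


if __name__ == "__main__":
    probs = ([[0.9, 0.05, 0.03, 0.02]] * 3
             + [[0.1, 0.1, 0.7, 0.1]] * 4
             + [[0.05, 0.05, 0.1, 0.8]] * 2)
    print(naive_segments(probs))
```

For the made-up input in the example, this gives three intergenic bases, four CDS bases and two intron bases. A per-base argmax like this ignores gene-structure constraints (start/stop codons, splice sites, phase) and is sensitive to noisy calls, which is exactly why a dedicated post-processor is needed to turn the probabilities into proper gene models.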

So there are three options for applying Helixer:

Cheers, Alisandra

xiekunwhy commented 2 years ago

Hi Alisandra,

Thank you for your detailed reply. My collaborators said we will give it a try once the post-processor is released. Thank you again.

Best, Kun

xiekunwhy commented 2 years ago

Hi @alisandra

Would you please provide some more details on how to use Helixer+AUGUSTUS? We are looking for ways to improve our AUGUSTUS results.

Best, Kun

alisandra commented 2 years ago

Dear Kun,

So, good news first: the post-processor (https://github.com/TonyBolger/HelixerPost) is now publicly available, and instructions for using it are now in the main README. Additionally, instructions for Helixer+AUGUSTUS have now been added here. For de novo annotation of primary gene models, we recommend the former (HelixerPost), which has better raw accuracy and is faster. For use cases where alternative transcripts are critical, or where you want to incorporate other extrinsic information, the latter option using AUGUSTUS is more feature-complete.

We are still working on packaging and user-friendliness, but for now the instructions should be in a functional state, with elegance and convenience coming later.

Since you mentioned Arabidopsis, I assume the provided pre-trained land-plant model will suit your purposes. Let me know if I'm wrong. A larger variety of current models (e.g. for predicting on animals and fungi) is coming soon.

Feel free to drop a line if you run into any trouble.

Cheers, Alisandra

xiekunwhy commented 2 years ago

Hi Alisandra,

Thank you for your reply, and I'm glad to hear the good news. I will try Helixer+HelixerPost and Helixer+AUGUSTUS once I manage to install them (which is really hard, because I have no root permissions and no GPU hardware).

Actually, I am now annotating two fish genomes and three algal genomes, so I'm hoping other pre-trained models will be available soon.

Best, Kun

alisandra commented 2 years ago

Hi Kun,

OK, good to know. If you don't have any GPU hardware, skip the entire "GPU requirements" section. Helixer will of course run much, much slower on a CPU than on a GPU, but it should run.

If you can, ask your admin to install the Python development libraries and the HDF5 development libraries (or perhaps they are already there). The rest of Helixer should work in a Python virtual environment and not require sudo rights.
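A minimal sketch of the no-sudo part, using Python's standard-library venv module to create a virtual environment in a directory you own. It assumes the system-level Python and HDF5 development libraries mentioned above are already installed; the environment directory name is a hypothetical placeholder.

```python
import sys
import venv
from pathlib import Path


def create_user_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in a user-writable directory (no sudo
    needed) and return the path to its Python interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # Interpreter location differs between Windows and POSIX systems.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe_name


if __name__ == "__main__":
    # Hypothetical location under your home directory.
    python_exe = create_user_venv(Path.home() / "helixer_env")
    print("Install Helixer with:", python_exe, "-m pip install .")
```

Everything installed through that interpreter's pip lands inside the environment directory, which is why no root permissions are needed for this step.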

Since installing the lzf support for HelixerPost would also require sudo rights, I have added a workaround by changing the default compression to "gzip". If you run git pull and then rerun pip install . (both from the Helixer directory), and rerun prediction from the start, you should now be able to use HelixerPost without the lzf support (i.e. when installing HelixerPost, skip from the HDF5 development libraries directly to usage).
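The update steps described above (git pull, then pip install . from the Helixer directory) could be scripted roughly like this. It is a sketch: helixer_dir and python_exe are placeholders for your own clone and virtual environment, python -m pip is used so the install goes into that environment, and rerunning prediction afterwards is left out because it depends on the Helixer command you normally run.

```python
import subprocess
from pathlib import Path
from typing import List


def update_commands(python_exe: str) -> List[List[str]]:
    """Commands to run, in order, from inside the Helixer directory."""
    return [
        ["git", "pull"],                            # fetch the gzip-default change
        [python_exe, "-m", "pip", "install", "."],  # reinstall Helixer into the venv
    ]


def update_helixer(helixer_dir: Path, python_exe: str) -> None:
    """Run the update commands in helixer_dir, stopping at the first failure."""
    for cmd in update_commands(python_exe):
        subprocess.run(cmd, cwd=helixer_dir, check=True)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your setup.
    update_helixer(Path.home() / "Helixer",
                   str(Path.home() / "helixer_env" / "bin" / "python"))
```

Using check=True makes the script stop if git pull fails, so you never reinstall from a half-updated checkout.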

I will send an update again here when more models and/or more convenient packaging is available.

Cheers, Ali