Closed (wyim-pgl closed 1 year ago)
Dear wyim-pgl,
Thanks for your interest! The models from the paper, which work well for vertebrates or land plants, are available here: https://zenodo.org/record/3974409 and the instructions for using these models can still be found in the v0.2.0 tag (https://github.com/weberlab-hhu/Helixer/tree/v0.2.0). Unfortunately, these 1. were never validated for arthropods and are really only expected to work well for vertebrates, and 2. were not as applicable as the current models (they couldn't yet produce a gff3 file), so I kinda doubt this will help you much.
What I can say is that when I get a chance to work on this again, getting some current fungi and broadly applicable animal models up is basically top priority. Still, I don't know when this will be, so I can't recommend waiting on it. I will reply here again once they are available.
Cheers, Alisandra
Hi, we are trying to train Helixer on multiple reference genomes. We couldn't find any way to merge multiple GFFs into one sqlite3. Do you have any recommendations for this?
Dear wyim-pgl,
Some quick tips on training:
Import each genome on its own with import2geenuff.py, giving one sqlite3 file per genome,
and for each of these, create a separate h5 file with geenuff2h5.py.
E.g., an excerpt from a recent training run of mine on fungi:
data_directory
├── training_data.Alternaria_rosae.h5
├── training_data.Aspergillus_aculeatus.h5
├── training_data.Aspergillus_chevalieri.h5
├── training_data.Aspergillus_nomiae.h5
...
├── validation_data.Bipolaris_sorokiniana.h5
├── validation_data.Bipolaris_victoriae.h5
├── validation_data.Blastomyces_gilchristii.h5
├── validation_data.Candida_albicans.h5
...
└── validation_data.Zasmidium_cellare.h5
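The per-genome steps that lead to a layout like the one above can be sketched roughly as follows. This is only an illustration, not the exact commands used for that run: the flag names are written from memory of the Helixer/GeenuFF documentation and should be checked against each script's --help, and the input file names (genome.fa, annotation.gff3) and the genomes/ folder are hypothetical. The function only builds the command lines; it does not run anything.

```python
from pathlib import Path


def plan_commands(species, genome_dir, data_dir, split="training"):
    """Build the two command lines (as argument lists) for a single genome.

    Assumptions, not confirmed by this thread:
      - flag names follow the Helixer/GeenuFF docs; verify with --help
      - each genome lives in <genome_dir>/<species>/ with the hypothetical
        file names genome.fa and annotation.gff3
    """
    genome_dir, data_dir = Path(genome_dir), Path(data_dir)
    species_dir = genome_dir / species

    # One sqlite3 database per genome: nothing is merged.
    db_path = species_dir / f"{species}.sqlite3"
    # One h5 file per genome, named as in the directory listing above.
    h5_path = data_dir / f"{split}_data.{species}.h5"

    import_cmd = [
        "import2geenuff.py",
        "--fasta", str(species_dir / "genome.fa"),
        "--gff3", str(species_dir / "annotation.gff3"),
        "--db-path", str(db_path),
        "--log-file", str(species_dir / "import.log"),
        "--species", species,
    ]
    export_cmd = [
        "geenuff2h5.py",
        "--input-db-path", str(db_path),
        "--h5-output-path", str(h5_path),
    ]
    return import_cmd, export_cmd
```

Calling plan_commands("Alternaria_rosae", "genomes", "data_directory") returns the two argument lists for that species; they could then be passed to subprocess.run(..., check=True) one genome at a time.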
Note that since the initial publication, and perhaps not surprisingly, we have gotten substantially better generalization by using separate species for validation rather than a subset of the training genomes. For the sake of time, we generally still down-sample the validation genomes via https://github.com/weberlab-hhu/helixer_scratch/blob/master/data_scripts/sample-single-genomes.py
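To make the species-level split concrete, here is a minimal sketch that assigns whole species to either training or validation and derives the matching file names. Picking the validation species at random is purely an assumption for illustration; the thread does not say how they were chosen, and a deliberate (e.g. phylogenetically informed) choice may well be better. The function names are hypothetical.

```python
import random


def split_by_species(species, n_validation, seed=0):
    """Assign whole species to training or validation, never splitting one.

    Random selection is an illustrative assumption, not the authors' method.
    """
    if not 0 <= n_validation <= len(species):
        raise ValueError("n_validation must be between 0 and len(species)")
    shuffled = sorted(species)            # sort first so the seed is reproducible
    random.Random(seed).shuffle(shuffled)
    validation = sorted(shuffled[:n_validation])
    training = sorted(shuffled[n_validation:])
    return training, validation


def h5_name(species, split):
    """File name following the training_data.* / validation_data.* pattern."""
    return f"{split}_data.{species}.h5"


all_species = [
    "Alternaria_rosae",
    "Aspergillus_aculeatus",
    "Bipolaris_sorokiniana",
    "Candida_albicans",
]
training, validation = split_by_species(all_species, n_validation=2)
file_names = [h5_name(s, "training") for s in training]
file_names += [h5_name(s, "validation") for s in validation]
```

The point of splitting at the species level is that no genome contributes sequences to both sets, so the validation score reflects performance on species the model has not seen.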
Dear wyim-pgl,
At long last we've released invertebrate models. The current best one is invertebrate_v0.3_m_0100, found here: https://uni-duesseldorf.sciebo.de/s/lQTB7HYISW71Wi0 or obtainable by running fetch_helixer_models.py with the latest version (v0.3.0) installed.
These invertebrate models remain more experimental than those for other phylogenetic groups, but they nevertheless appear to be better than competing de novo annotation tools, so we're releasing them.
Cheers, Alisandra
Dear Helixer, I am looking for the animal, specifically arthropod, training files. The results are mentioned in your paper, but I couldn't find the training files. Can you please tell me where they are? Thanks.