weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

helixer_post_bin does not work with trained models (no "predictions_phase" in predictions.h5) #101

Closed piroyon closed 11 months ago

piroyon commented 1 year ago

Hi, I am trying to predict genes from the genome of an alga. The genome size is 40M. I have 9 annotated genomes of closely related species: 6 for training, 2 for validation, and 1 for testing. The best model accuracy was 0.85. When I tried to use this model to predict genes from our genome, helixer_post_bin did not work. The error is as follows:

Neural network prediction done. Starting post processing.
thread 'main' panicked at 'Failed to open Base / ClassPrediction / PhasePrediction Datasets: H5Dopen2(): unable to open dataset: object 'predictions_phase' doesn't exist', helixer_post_bin/src/main.rs:33:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

When I look at the predictions.h5 file in an HDF viewer, there is no top-level entry called predictions_phase. Any solutions or hints? Thank you.

alisandra commented 11 months ago

Hi Piroyon,

First, a sincere thank you for stress testing the documentation! I've updated it here: https://github.com/weberlab-hhu/Helixer/blob/cleanup/docs/training.md, and will merge it into main soon.

The good news is it's a simple fix, but the bad news is that it requires retraining.

You will need to add `--predict-phase` during both training and inference.
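
Before running post-processing on a retrained model, it may help to confirm that the predictions file actually contains the dataset named in the error above. The following is a minimal sketch, not part of Helixer: the function names are hypothetical, it assumes `h5py` is installed, and it only checks for `predictions_phase` because that is the one name the error message confirms.

```python
from typing import Iterable, List

# Dataset name taken from the error message in this issue.
# Any further required names would be an assumption, so none are listed.
REQUIRED_DATASETS = ("predictions_phase",)


def missing_datasets(available: Iterable[str],
                     required: Iterable[str] = REQUIRED_DATASETS) -> List[str]:
    """Return the required dataset names that are absent from `available`."""
    present = set(available)
    return [name for name in required if name not in present]


def check_predictions_file(path: str) -> List[str]:
    """Open an HDF5 file read-only and report missing top-level datasets."""
    import h5py  # third-party dependency, imported lazily (assumed installed)

    with h5py.File(path, "r") as h5:
        # h5.keys() yields the names of the top-level groups and datasets.
        return missing_datasets(h5.keys())


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        missing = check_predictions_file(sys.argv[1])
        if missing:
            print("Missing datasets:", ", ".join(missing))
        else:
            print("All required datasets are present.")
```

Running the script with the path to `predictions.h5` as its only argument lists any missing dataset. If `predictions_phase` is reported as missing, the model was most likely trained or run without `--predict-phase`.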