nf-core / funcscan

(Meta-)genome screening for functional and natural product gene sequences
https://nf-co.re/funcscan
MIT License

AMPLIFY_PREDICT exits with a ValueError while checking input #373

Status: Open · m3hdad opened this issue 6 months ago

m3hdad commented 6 months ago

Description of the bug

AMPLIFY_PREDICT fails on some protein sequences. The issue is being discussed on Slack, where two example input sequences are available: one for which AMPLIFY_PREDICT fails and one for which it passes.

Both Pyrodigal and Prodigal were tested as annotation tools. AMPlify v2.0.0 was also tested, and it failed on similar sequences.

Removing the stop codon from the protein sequences before running AMPlify resolves the issue.
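A minimal Python sketch of this workaround follows. It assumes the annotation tool encodes the stop codon as one or more trailing `*` characters at the end of each translated protein sequence. The function names (read_fasta, strip_stops_fasta) are hypothetical helpers, not part of AMPlify or funcscan.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def strip_stops_fasta(text: str) -> str:
    """Return the FASTA text with trailing '*' (assumed stop codon) removed from each sequence."""
    records = []
    for header, seq in read_fasta(text.splitlines()):
        # Only trailing '*' characters are removed; internal ones are left untouched.
        records.append(f">{header}\n{seq.rstrip('*')}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    example = ">gene_1\nMKLVA*\n>gene_2\nMKKR\n"
    print(strip_stops_fasta(example), end="")
```

Only trailing asterisks are stripped, so a sequence containing an internal `*` would still reach AMPlify unchanged.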

.command.err from AMPLIFY_PREDICT:

Command output:

  Loading balanced models...
  /usr/local/share/amplify/models/balanced/AMPlify_balanced_model_weights_1.h5
  /usr/local/share/amplify/models/balanced/AMPlify_balanced_model_weights_2.h5
  /usr/local/share/amplify/models/balanced/AMPlify_balanced_model_weights_3.h5
  /usr/local/share/amplify/models/balanced/AMPlify_balanced_model_weights_4.h5
  /usr/local/share/amplify/models/balanced/AMPlify_balanced_model_weights_5.h5

  Predicting...

Command error:
  2024-05-02 17:39:17.292177: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
  2024-05-02 17:39:17.345481: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
  Using TensorFlow backend.
  Traceback (most recent call last):
    File "/usr/local/share/amplify/src/AMPlify.py", line 309, in <module>
      main()
    File "/usr/local/share/amplify/src/AMPlify.py", line 213, in main
      y_score_valid, y_indv_list_valid = ensemble(out_model, X_seq_valid)
    File "/usr/local/share/amplify/src/AMPlify.py", line 101, in ensemble
      indv_pred.append(model_list[i].predict(X).flatten())
    File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 1149, in predict
      x, _, _ = self._standardize_user_data(x)
    File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
      exception_prefix='input')
    File "/usr/local/lib/python3.6/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
      'with shape ' + str(data_shape))
  ValueError: Error when checking input: expected Input to have 3 dimensions, but got array with shape (0, 1)

Command used and terminal output

No response

Relevant files

No response

System information

No response

jfy133 commented 5 months ago

Revisiting this just now made me think, @m3hdad: should this be solved here, or upstream in AMPlify itself?

What do you think? On one hand, we could 'fix it' in the pipeline, but 'manually' modifying input/output files isn't a great idea for a pipeline.
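If the pipeline should avoid rewriting files, one alternative is a check-only step that reports problematic sequences before AMPlify runs. The sketch below assumes, as a hypothesis rather than documented AMPlify behaviour, that any character outside the 20 standard amino acids (including `*`) is a possible trigger. The name find_nonstandard is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# The 20 standard amino acid one-letter codes (assumed to be what AMPlify accepts).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each header to the sorted non-standard characters in its sequence; clean records are omitted."""
    flagged = {}
    for header, seq in records:
        unexpected = sorted(set(seq.upper()) - STANDARD_AA)
        if unexpected:
            flagged[header] = unexpected
    return flagged


if __name__ == "__main__":
    records = [("gene_1", "MKLVA*"), ("gene_2", "MKKR")]
    for header, chars in find_nonstandard(records).items():
        print(f"{header}: unexpected characters {chars}")
```

This leaves the input untouched and only reports which records would need attention, which keeps the decision about fixing them upstream or in the pipeline explicit.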

If you agree, could you open an issue on the AMPlify repo? Then we can see how quickly they address it.