nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

ModelStoreTF exception <class 'tensorflow.python.framework.errors_impl.InternalError'> #501

Closed jiehua1995 closed 3 months ago

jiehua1995 commented 3 months ago

Hi, I ran into the following problem while using medaka. Do you have any idea how to solve it? I would be very grateful for any help. Thank you in advance.

Describe the bug: I am using medaka to generate a consensus from an initial Flye assembly and chopper-filtered reads, and I get the following error:

[18:14:04 - MdlStrTF] ModelStoreTF exception <class 'tensorflow.python.framework.errors_impl.InternalError'>
Traceback (most recent call last):
  File "/home/zfp_da03/anaconda3/envs/medaka/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/medaka/medaka.py", line 814, in main
    args.func(args)
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/medaka/prediction.py", line 167, in predict
    remainder_regions_depth = run_prediction(
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/medaka/prediction.py", line 47, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/keras/engine/training.py", line 2474, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/zfp_da03/anaconda3/envs/medaka/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 256, 128, 1, 10000, 100, 0] 
     [[{{node CudnnRNN}}]]
     [[sequential/bidirectional_1/backward_gru2/PartitionedCall]] [Op:__inference_predict_function_3295]
Failed to run medaka consensus.

The script I used:

# Define the number of cores to use
current_hour=$(date +"%H")
# Use fewer cores during working hours (09:00 to 16:59)
if [ "$current_hour" -ge 9 ] && [ "$current_hour" -lt 17 ]; then
    processcores=16
else
    processcores=48
fi
echo "Number of cores to use: $processcores"

# Define the model to use for medaka polishing
model="r1041_e82_400bps_sup_v4.3.0"

# Set the environment variable to allow GPU growth
export TF_FORCE_GPU_ALLOW_GROWTH=true

echo "Perform polishing with initial assembly from raw reads for sample: $sample"

# Define the path to the initial assembly from raw reads
raw_assembly_path="/ten_TB/Hua/Nanopore_hybrid/fastq_pass/${sample}/${sample}_flye_raw/assembly.fasta"
# Define the path to the raw reads
raw_reads_path="/ten_TB/Hua/Nanopore_hybrid/fastq_pass/${sample}/${sample}_merged.fastq.gz"
# Define the path to the output directory
output_dir="/ten_TB/Hua/Nanopore_hybrid/fastq_pass/${sample}/${sample}_medaka_raw"
# Perform the polishing with initial assembly from raw reads
medaka_consensus \
    -i "$raw_reads_path" \
    -d "$raw_assembly_path" \
    -o "$output_dir" \
    -t "$processcores" \
    -m "$model"


Environment (if you do not have a GPU, write No GPU):

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

 - CUDA version

cuda-version    11.8         h70ddcb2_2     conda-forge
cudatoolkit     11.8.0       h4ba93d1_13    conda-forge
cudnn           8.8.0.121    hcdd5f01_4     conda-forge

jiehua1995 commented 3 months ago

Sorry, I found a similar issue that has already been closed. I just needed to set -b 50 to reduce the batch size.

Thank you very much! Have a nice day.
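
For anyone who lands here with the same error, here is a minimal sketch of the rerun with the smaller batch size. It reuses the paths and model from the script above. The sample name `sample01`, the fallback core count, and the dry-run branch are assumptions added for illustration and are not part of the original script. The traceback reports a batch size of 100, so `-b 50` halves it.

```shell
# Placeholder sample name and core count (hypothetical; set these for your run).
sample="${sample:-sample01}"
processcores="${processcores:-16}"
model="r1041_e82_400bps_sup_v4.3.0"

# Half of the batch size of 100 reported in the traceback.
batch_size=50

base_dir="/ten_TB/Hua/Nanopore_hybrid/fastq_pass/${sample}"
raw_reads_path="${base_dir}/${sample}_merged.fastq.gz"
raw_assembly_path="${base_dir}/${sample}_flye_raw/assembly.fasta"
output_dir="${base_dir}/${sample}_medaka_raw"

# Collect the arguments once so the dry-run message matches the real call.
set -- -i "$raw_reads_path" -d "$raw_assembly_path" -o "$output_dir" \
       -t "$processcores" -m "$model" -b "$batch_size"

if command -v medaka_consensus >/dev/null 2>&1; then
    medaka_consensus "$@"
else
    # medaka is not installed here: print the command instead of running it.
    echo "would run: medaka_consensus $*"
fi
```

If the run still fails with `-b 50`, a smaller value may be worth trying. A lower batch size should reduce GPU memory use, probably at some cost in throughput.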