To force medaka to use the CPU, set the environment variable to an empty value: `CUDA_VISIBLE_DEVICES=""`.
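For example, setting the variable just for a single invocation (a minimal sketch; the input and output names here are placeholders):

```bash
# Hide all CUDA devices from this one command; medaka then falls back to the CPU.
CUDA_VISIBLE_DEVICES="" medaka_consensus -i reads.fastq -d draft.fasta -o medaka_out
```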
Thanks for the help!
This worked on the PromethION; however, I recently moved to a different server and this solution no longer seems to work.
Environment: Ubuntu, medaka v1.7.2
This is the error (many lines of normal logging excluded):
```
[19:29:53 - Predict] Found a GPU.
[19:29:53 - Predict] If cuDNN errors are observed, try setting the environment variable `TF_FORCE_GPU_ALLOW_GROWTH=true`. To explicitely disable use of cuDNN use the commandline option `--disable_cudnn`. If OOM (out of memory) errors are found please reduce batch size.
[19:29:53 - Predict] Processing 93 long region(s) with batching.
[19:29:53 - ModelLoad] GPU available: building model with cudnn optimization
[19:29:54 - MdlStrTF] Model <keras.engine.sequential.Sequential object at 0x7f8d5cf5be50>
...
2023-04-15 19:30:05.644748: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-04-15 19:30:05.645273: E tensorflow/stream_executor/cuda/cuda_dnn.cc:379] Possibly insufficient driver version: 515.105.1
[19:30:05 - MdlStrTF] ModelStoreTF exception <class 'tensorflow.python.framework.errors_impl.UnknownError'>
Traceback (most recent call last):
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/medaka/medaka.py", line 724, in main
    args.func(args)
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/medaka/prediction.py", line 166, in predict
    remainder_regions = run_prediction(
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/medaka/prediction.py", line 48, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/keras/engine/training.py", line 1986, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/path/.snakemake/conda/eb4706c19b3a3c9c7a73db0bb3461ea8_/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 58, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
    [[{{node CudnnRNN}}]]
    [[sequential/bidirectional/backward_gru1/PartitionedCall]] [Op:__inference_predict_function_3293]
Function call stack:
predict_function -> predict_function -> predict_function
Failed to run medaka consensus.
```
Here is the output of nvidia-smi:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4000    Off  | 00000000:21:00.0 Off |                  Off |
| 41%   42C    P8    16W / 140W |      6MiB / 16376MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
I would like to force medaka to use CPU resources instead of the GPU.
Sorry, I'm unsure why setting `CUDA_VISIBLE_DEVICES=""` would not have the effect you desire.
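One way to check whether the variable is actually reaching TensorFlow (a hedged diagnostic sketch, not part of medaka itself):

```bash
# Run from the same shell/environment you launch medaka in; an empty list
# means TensorFlow sees no GPUs and will run on the CPU.
CUDA_VISIBLE_DEVICES="" python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```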
Ah, I needed `export CUDA_VISIBLE_DEVICES=""` in my bash script -- I'm not sure why this was previously working without the export.
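For reference, a minimal sketch of the difference (file names are placeholders):

```bash
#!/usr/bin/env bash
# Without `export`, the assignment is a plain shell variable and is NOT
# inherited by child processes such as medaka:
CUDA_VISIBLE_DEVICES=""

# With `export`, the variable enters the environment of every child process,
# so TensorFlow inside medaka sees no CUDA devices and uses the CPU:
export CUDA_VISIBLE_DEVICES=""
medaka_consensus -i reads.fastq -d draft.fasta -o medaka_out
```

Note that the one-liner form `CUDA_VISIBLE_DEVICES="" medaka_consensus ...` also works without `export`, since a prefix assignment applies to that single command's environment, which may be why it appeared to work before.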
Thank you for the response and help!
Describe the bug
I am trying to run medaka_consensus on a PromethION (4x A100s), though the A100s are often occupied. I would like to force medaka to use the CPU, but I cannot find a way to coerce it to do this -- it automatically detects and uses one of the GPUs even when they are occupied, and the process quickly fails.

Logging
Several errors can happen, but typically it's:
Environment (if you do not have a GPU, write No GPU):
Note: I apologize if the bug label is not correct; it just seemed more appropriate than a feature request.