Closed: schloegl closed this issue 4 months ago
Hi,
Can you run pip freeze and paste the output here?
Attached are two versions, one with tensorflow 2.10, the other with tensorflow 2.15
pip-freeze-deepEMhancer-tf210.txt pip-freeze-deepEMhancer-tf215.txt
Hi,
I think I have fixed the issue. Could you install (pip install --no-deps) deepEMhancer from the new branch issue35, and try to execute it?
Please let me know if it works.
PS. I see that you are using the postprocessed map as input. This is probably not the best option; use the half maps if you can.
The error has changed now:
schloegl@gpu136:~/tests/job076$ deepemhancer -g 6 -i postprocess.mrc -o postprocess_deepemhanced.mrc
2024-07-11 16:42:30.409850: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 16:42:30.409890: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 16:42:30.411478: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
updating environment to select gpu: [6]
loading model /.../.local/share/deepEMhancerModels/production_checkpoints/deepEMhancer_tightTarget.hd5 ... DONE!
Automatic radial noise detected beyond 34 % of volume side
DONE!. Shape at 1.00 A/voxel after padding-> (368, 368, 368)
Neural net inference
0%| | 0/400 [00:00<?, ?it/s]error: libdevice not found at ./libdevice.10.bc
2024-07-11 16:43:28.988597: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:207] INTERNAL: Generating device code failed.
0%| | 0/400 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/.../deepEMhancer/20240711c/bin/deepemhancer", line 8, in <module>
sys.exit(commanLineFun())
^^^^^^^^^^^^^^^
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/exeDeepEMhancer.py", line 80, in commanLineFun
main( ** parseArgs() )
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/exeDeepEMhancer.py", line 72, in main
predVol= predictor.predict(inputVolOrFname, outputMap, binary_mask=binaryMask, noise_stats=noiseStats,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/applyProcessVol/processVol.py", line 193, in predict
batch_y_pred= self.model.predict_on_batch(np.expand_dims(batch_x, axis=-1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2880, in predict_on_batch
outputs = self.predict_function(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:
Detected at node model_1/group_normalization_1/Sqrt defined at (most recent call last):
File "/.../deepEMhancer/20240711c/bin/deepemhancer", line 8, in <module>
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/exeDeepEMhancer.py", line 80, in commanLineFun
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/exeDeepEMhancer.py", line 72, in main
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/deepEMhancer/applyProcessVol/processVol.py", line 193, in predict
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2880, in predict_on_batch
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2440, in predict_function
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2425, in step_function
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2413, in run_step
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 2381, in predict_step
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/training.py", line 590, in __call__
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/functional.py", line 515, in call
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/functional.py", line 672, in _run_internal_graph
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "<string>", line 153, in call
File "/.../deepEMhancer/20240711c/lib/python3.11/site-packages/keras/src/backend.py", line 3041, in sqrt
JIT compilation failed.
[[{{node model_1/group_normalization_1/Sqrt}}]] [Op:__inference_predict_function_3713]
Hi,
This seems to be a CUDA problem, and I haven't changed anything related to it. Could you try what is suggested here: https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path ? Or perhaps try reinstalling CUDA and/or TensorFlow?
I am trying to create a singularity container to reproduce your error.
I can confirm that adding
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CUDA_HOME}"
fixed this issue. Thanks.
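For anyone hitting the same "libdevice not found" error, here is a minimal Python sketch of the check behind that fix. It assumes XLA looks for libdevice.10.bc under nvvm/libdevice inside the directory passed via --xla_gpu_cuda_data_dir. The helper names and the /usr/local/cuda fallback are hypothetical and not part of deepEMhancer or TensorFlow; adjust the candidate paths for your own system.

```python
import os
from pathlib import Path


def find_cuda_data_dir(candidates):
    """Return the first directory containing nvvm/libdevice/libdevice.10.bc, else None."""
    for candidate in candidates:
        if not candidate:
            continue  # skip unset environment variables and empty strings
        root = Path(candidate)
        if (root / "nvvm" / "libdevice" / "libdevice.10.bc").is_file():
            return root
    return None


def xla_flags_with_cuda_dir(cuda_dir, existing=""):
    """Append --xla_gpu_cuda_data_dir to an XLA_FLAGS string unless it is already set."""
    if "--xla_gpu_cuda_data_dir" in existing:
        return existing
    return f"{existing} --xla_gpu_cuda_data_dir={cuda_dir}".strip()


if __name__ == "__main__":
    # Candidate locations are assumptions; a module system may put CUDA elsewhere.
    cuda_dir = find_cuda_data_dir([os.environ.get("CUDA_HOME"), "/usr/local/cuda"])
    if cuda_dir is None:
        print("libdevice.10.bc not found; check that CUDA_HOME points at a full CUDA toolkit")
    else:
        flags = xla_flags_with_cuda_dir(cuda_dir, os.environ.get("XLA_FLAGS", ""))
        print(f'export XLA_FLAGS="{flags}"')
```

If the script prints an export line, run that line in the shell before starting deepemhancer. If it reports that the file is missing, the CUDA installation itself probably lacks the nvvm directory, and setting XLA_FLAGS alone is unlikely to help.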
I am trying to run the latest deepEMhancer (commit 99e7c3140b4acc3a90cc110d4fc6423a04e09ca4) on Debian 12 with nvidia-drivers 535.xxx, which supports CUDA 12.2 or lower. I've tweaked the installation procedure by relaxing the version pinning in "install_requires". I tried two combinations with pip install in a venv:
python/3.10, cuda/11.4.4, cudnn/8.1.1.33, tensorflow==2.10.0
python/3.11, cuda/12.2.0, cudnn/8.9.6.50, TensorRT/8.6.1.6, tensorflow==2.15.0
In both cases, the installation ran through. When I try to use it, it fails in this way.
Do you have any suggestions on how to make this work?