Closed: cwying closed this issue 1 year ago
Hi, thanks for reporting this issue. To help debug it, could you post the output of
conda list
run in your conda env, so that I can see the installed package versions?
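If only a handful of packages matter, a minimal Python sketch such as the one below prints just those versions. It relies on `importlib.metadata` from the standard library; the package selection and the helper name `installed_versions` are assumptions for illustration, not something pyapetnet prescribes.

```python
from importlib import metadata

# Packages most likely to matter for this issue (an assumed selection).
PACKAGES = ["tensorflow", "keras", "pyapetnet", "pymirc", "numpy"]


def installed_versions(names=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Run it with the Python interpreter of the activated conda env, otherwise it reports the versions of a different environment.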
(It could be an issue with a new keras/tensorflow version.)
Hi Georg,
Thank you for your prompt response! Please see the information below:
conda list
returns:
Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.4.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.01.10 h06a4308_0
cachetools 5.3.0 pypi_0 pypi
certifi 2022.12.7 py310h06a4308_0
charset-normalizer 3.0.1 pypi_0 pypi
contourpy 1.0.7 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
flatbuffers 23.1.21 pypi_0 pypi
fonttools 4.38.0 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.16.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.51.1 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
imageio 2.25.0 pypi_0 pypi
keras 2.11.0 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libclang 15.0.6.1 pypi_0 pypi
libffi 3.4.2 h6a678d5_6
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
llvmlite 0.39.1 pypi_0 pypi
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.6.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3
networkx 3.0 pypi_0 pypi
nibabel 5.0.0 pypi_0 pypi
numba 0.56.4 pypi_0 pypi
numpy 1.23.5 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 1.1.1s h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.0 pypi_0 pypi
pillow 9.4.0 pypi_0 pypi
pip 22.3.1 py310h06a4308_0
protobuf 3.19.6 pypi_0 pypi
pyapetnet 1.5.1 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pydicom 2.3.1 pypi_0 pypi
pymirc 0.28 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.10.9 h7a1cb2a_0
python-dateutil 2.8.2 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.28.2 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scipy 1.10.0 pypi_0 pypi
setuptools 65.6.3 py310h06a4308_0
simpleitk 2.2.1 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.40.1 h5082296_0
tensorboard 2.11.2 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.11.0 pypi_0 pypi
tensorflow-estimator 2.11.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.30.0 pypi_0 pypi
termcolor 2.2.0 pypi_0 pypi
tifffile 2023.1.23.1 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
typing-extensions 4.4.0 pypi_0 pypi
tzdata 2022g h04d1e81_0
urllib3 1.26.14 pypi_0 pypi
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.14.1 pypi_0 pypi
xz 5.2.10 h5eee18b_1
zlib 1.2.13 h5eee18b_0
2023-01-25 13:33:22.396931: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-25 13:33:22.523543: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-01-25 13:33:26.200905: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2023-01-25 13:33:26.201042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2023-01-25 13:33:26.201060: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2023-01-25 13:33:33.319327: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2023-01-25 13:33:33.319363: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-01-25 13:33:33.319929: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/layers/core/lambda_layer.py:324: UserWarning: tensorflow.python.keras.utils.multi_gpu_utils is not loaded, but a Lambda layer uses it. It may cause errors.
function = cls._parse_function_from_config(
/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/pymirc/fileio/read_dicom.py:134: UserWarning: Cannot find SeriesType in first dicom header. Setting it to ('STATIC', 'IMAGE')
warnings.warn(f'Cannot find SeriesType in first dicom header. Setting it to {fallback_series_type}')
pyapetnet_predict_from_dicom ./brainweb_06_osem_dcm/ ./brainweb_06_t1_dcm/ S2_osem_b10_fdg_pe2i --show
as suggested (after cd-ing into the data folder).
On a separate note, is there a required or recommended CUDA version for this tool? My understanding is that the prediction part uses the CPU, so it does not matter much, but we plan to retrain this model with additional data.
Thank you!
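The "Could not load dynamic library" warnings in the log above can be reproduced outside of tensorflow. The sketch below asks the dynamic loader for the same three library names via `ctypes`; the names are copied from the warnings, while the helper names `can_load` and `report` are hypothetical. It only checks whether the loader can open each library, not whether the version matches what tensorflow expects.

```python
import ctypes

# Library names copied from the warnings in the log above.
LIBS = ["libnvinfer.so.7", "libnvinfer_plugin.so.7", "libcudnn.so.8"]


def can_load(name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def report(names=LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

According to the log, the two libnvinfer libraries belong to TensorRT and only matter if TensorRT is used, whereas the missing libcudnn.so.8 is what leads to "Skipping registering GPU devices".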
To test whether it is indeed related to problems with tensorflow + CUDA, can you remove and re-create the conda env:
conda remove -n pyapetnet --all
conda create -n pyapetnet python=3.10
conda activate pyapetnet
pip install pyapetnet
If you follow the official tensorflow install guide, the matching cuda-toolkit version (11.2) and cudnn version (8.1) get installed in your conda env.
You can find the required CUDA version for tensorflow here: https://www.tensorflow.org/install/source#tested_build_configurations
The install is a bit messy, since, in contrast to pytorch, tensorflow did not manage to put their libs on conda-forge.
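To compare an existing install against that table, a sketch along the following lines prints the CUDA and cuDNN versions the installed tensorflow binary was built against, plus the GPUs it can see. It assumes `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices()` are available (as in recent TF 2.x releases); the helper name `tensorflow_build_info` is hypothetical.

```python
def tensorflow_build_info():
    """Return build and device info for tensorflow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # These keys may be absent for CPU-only builds, hence .get().
        "cuda": info.get("cuda_version"),
        "cudnn": info.get("cudnn_version"),
        "gpus": [dev.name for dev in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_build_info())
```

An empty "gpus" list together with a non-empty "cuda" entry would suggest a GPU build that cannot find its CUDA libraries at runtime.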
Thank you! I will test it and let you know. May I ask which CUDA driver version you are currently using (as reported by nvidia-smi)?
In my test I used CUDA version 12.0 and driver version 525.78.01, so probably not optimal (maybe I should also stick to the official tensorflow install instructions).
Thank you so much for the information!
Hi Georg,
I first tried to install tf following the official guide, but it still did not work. I then installed tf via conda, despite this not being recommended (conda install -c conda-forge tensorflow-gpu), and afterwards installed pyapetnet via pip, and it works! I am not sure why. Maybe conda just resolves the compatibility issues better...
Anyway, I have one more question: how long does it take you to run the prediction on one demo case? It took me >1h on a server with an A100 GPU, which does not make sense to me.
Thanks again for your help!
(1) Glad that at least the installation works. Prediction of the demo case (even without using any GPU) should take less than one minute. If it takes >1h, then something is severely wrong (but I have no clue what). If you "only" want to do predictions, you don't need a GPU. Can you maybe try installing tensorflow without GPU support?
(2) Another option: could you run the prediction from a docker container? We have a docker file here: https://github.com/gschramm/pyapetnet/blob/master/Dockerfile
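For option (1), a quick experiment that avoids reinstalling anything is to hide the GPU from tensorflow and time the prediction. The sketch below assumes that setting CUDA_VISIBLE_DEVICES to "-1" before tensorflow is first imported makes it fall back to the CPU; the `timed` helper is hypothetical and not part of pyapetnet.

```python
import os
import time
from contextlib import contextmanager

# Hide all CUDA devices. This must happen before tensorflow is first imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


@contextmanager
def timed(label, results=None):
    """Print (and optionally store) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


# Usage sketch, with model and x set up as in pyapetnet's predict script:
#
#     with timed("predict"):
#         pred = model.predict(x).squeeze()
```

If the CPU-only run finishes in about a minute while the GPU run takes more than an hour, that would point at the GPU setup rather than at the model or the data.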
Thank you so much for the information! We will try the options you recommended. Hopefully it will work as expected soon! I will close this issue for now.
Hello!
I just installed pyapetnet on our server following the official instructions and confirmed that the installation succeeded. However, when I tried the demo data (both DICOM and NIFTI), I got the following error. Do you have any suggestions on how to solve this issue? Thank you!
Traceback (most recent call last):
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/bin/pyapetnet_predict_from_dicom", line 8, in <module>
    sys.exit(main())
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/pyapetnet/predict_from_dicom.py", line 168, in main
    pred = model.predict(x).squeeze()
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'functional_3/functional_1/batchnorm_ind_0_1/FusedBatchNormV3_2' defined at (most recent call last):
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/bin/pyapetnet_predict_from_dicom", line 8, in <module>
    sys.exit(main())
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/pyapetnet/predict_from_dicom.py", line 168, in main
    pred = model.predict(x).squeeze()
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 2350, in predict
    tmp_batch_outputs = self.predict_function(iterator)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 2137, in predict_function
    return step_function(self, iterator)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 2123, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in run_step
    outputs = model.predict_step(data)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 2079, in predict_step
    return self(x, training=False)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
    return super().__call__(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/functional.py", line 511, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/functional.py", line 668, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
    return super().__call__(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/functional.py", line 511, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/functional.py", line 668, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
    return fn(*args, **kwargs)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/layers/normalization/batch_normalization.py", line 866, in call
    outputs = self._fused_batch_norm(inputs, training=training)
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/layers/normalization/batch_normalization.py", line 659, in _fused_batch_norm
    output, mean, variance = control_flow_util.smart_cond(
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/utils/control_flow_util.py", line 108, in smart_cond
    return tf.internal.smart_cond.smart_cond(
  File "/bmr207/nmrgrp/nmr175/.conda/envs/pyapetnet/lib/python3.10/site-packages/keras/layers/normalization/batch_normalization.py", line 648, in _fused_batch_norm_inference
    return tf.compat.v1.nn.fused_batch_norm(
Node: 'functional_3/functional_1/batchnorm_ind_0_1/FusedBatchNormV3_2'
scale must have the same number of elements as the channels of x, got 15 and 1
[[{{node functional_3/functional_1/batchnorm_ind_0_1/FusedBatchNormV3_2}}]] [Op:__inference_predict_function_14243]
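To make the last message easier to read: batch normalization keeps one scale value per channel, so the length of the scale vector has to equal the size of the channel axis of the input x. The sketch below only illustrates that invariant in plain Python. It is not tensorflow's implementation, the helper name and the example shape are hypothetical, and it says nothing about why the two numbers disagree in this install.

```python
def check_batch_norm_shapes(x_shape, scale_len, channel_axis=-1):
    """Raise ValueError if scale does not have one element per channel of x."""
    channels = x_shape[channel_axis]
    if scale_len != channels:
        raise ValueError(
            "scale must have the same number of elements as the channels "
            f"of x, got {scale_len} and {channels}"
        )
    return channels


if __name__ == "__main__":
    # Hypothetical channels-last input with a single channel, while the
    # layer holds 15 scale values, mirroring the numbers in the error above.
    try:
        check_batch_norm_shapes((1, 64, 64, 64, 1), scale_len=15)
    except ValueError as err:
        print(err)
```

Read this way, "got 15 and 1" means the layer holds 15 scale values while the tensor reaching it has a single channel on the axis that tensorflow treats as the channel axis.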