Open ICUH opened 5 months ago
Hi, that is indeed a known issue with keras version >=3 (we need <3).
Can you try installing pyapetnet completely from conda-forge using conda or mamba instead of pip?
Doing so, the correct versions of the dependencies should be installed.
To create a new conda env and install pyapetnet you can run:
conda create -n pyapetnet pyapetnet -c conda-forge
And then activate the env via:
conda activate pyapetnet
This should give you access to all the pyapetnet tools.
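To confirm that the new env really ended up with a keras version below 3, a quick check along these lines may help. This is only a minimal sketch and not part of pyapetnet: the helper names `keras_major_ok` and `installed_keras_version` are made up for illustration, and it relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def keras_major_ok(version: str) -> bool:
    """Return True if a version string such as '2.15.0' has a major version below 3."""
    major = int(version.split(".")[0])
    return major < 3


def installed_keras_version():
    """Return the installed keras version string, or None if keras is not installed."""
    try:
        return metadata.version("keras")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_keras_version()
    if version is None:
        print("keras is not installed in this environment")
    elif keras_major_ok(version):
        print(f"keras {version} is below 3")
    else:
        print(f"keras {version} is 3 or newer, which pyapetnet does not support")
```

Run it with the pyapetnet env activated, so that it inspects the same Python installation the pyapetnet tools use.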
Hi, I had to take a hiatus before attempting the solution. It looks like TensorFlow 2.10 is the highest version you can have on native Windows.
From tensorflow.org: "TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow or tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin"
So I went ahead with WSL2, but pip didn't install all the dependencies. So I switched to conda, created a new env, and installed pyapetnet; everything went fine. (You should probably recommend creating a new env.)
Now I have two problems when I try the test run (the error log is at the bottom). The small problem is that it can't find osem.nii (which I thought was downloaded automatically when I installed pyapetnet); it's not in the pyapetnet folder. The bigger problem is that it can't "register cuDNN", which could mean anything at this point. nvidia-smi shows the correct GPU and nvcc -V shows the right CUDA compiler version. I suspect that because WSL is a virtual machine, I need to set up Docker to properly share the GPU. But I don't know whether that will break something else, or whether that is the issue in the first place. I will try setting up Docker to see if the problem goes away, but I'm not sure about osem.nii and the other nii files.
Let me know if you have any insights.
Thanks
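On the osem.nii problem: the traceback in the log below shows nibabel failing on the relative path 'osem.nii', which means the file was looked up in the directory the command was run from. A small pre-flight check like the following can make that explicit before starting a prediction. It is only a sketch: `missing_inputs` is a hypothetical helper, not a pyapetnet function, and it says nothing about where the demo data is meant to come from.

```python
from pathlib import Path


def missing_inputs(*filenames, directory="."):
    """Return the given input files that do not exist in `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    # File names taken from the command used in this thread.
    missing = missing_inputs("osem.nii", "t1.nii")
    if missing:
        print(f"Not found in {Path.cwd()}: {', '.join(missing)}")
    else:
        print("Both input files are present")
```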
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ nvidia-smi
Tue Jun 4 13:10:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 On | Off |
| 0% 31C P8 14W / 450W | 552MiB / 24564MiB | 3% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 34 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ pyapetnet_predict_from_nifti osem.nii t1.nii S2_osem_b10_fdg_pe2i --show
2024-06-04 13:10:25.377092: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-04 13:10:25.395351: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-04 13:10:25.395381: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-04 13:10:25.395798: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-04 13:10:25.398757: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2024-06-04 13:10:26.194686: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.212923: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.212979: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215170: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215208: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215234: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317621: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317679: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2022] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-06-04 13:10:26.317728: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21458 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/keras/src/layers/core/lambda_layer.py:327: UserWarning: tensorflow.python.keras.utils.multi_gpu_utils is not loaded, but a Lambda layer uses it. It may cause errors.
function = cls._parse_function_from_config(
Traceback (most recent call last):
File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/nibabel/loadsave.py", line 100, in load
stat_result = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'osem.nii'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/telluraves/miniconda3/envs/strigiformes/bin/pyapetnet_predict_from_nifti", line 10, in
Update on the "register cuDNN" issue: I had to install pytorch-directml to let the GPU pass through (instead of nvidia-docker). It seems to have worked? Let me know if I should be worried about any of the warnings. But it still can't find osem.nii or the test data set (or it wasn't there in the first place...).
2024-06-04 13:57:58.614583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 13:57:58.664411: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-04 13:57:59.100274: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2024-06-04 13:57:59.100328: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
2024-06-04 13:57:59.101752: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
2024-06-04 13:57:59.278081: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2024-06-04 13:57:59.364682: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 13:57:59.366260: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 4090)
2024-06-04 13:57:59.452786: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-06-04 13:57:59.452826: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2024-06-04 13:57:59.452853: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 113518 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id:
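Both logs in this thread reach a "Created ... device:GPU:0" line, which is the part that shows TensorFlow actually obtained a GPU device. The same thing can be checked directly from Python. The sketch below assumes only the public `tf.config.list_physical_devices` call; the wrapper `visible_gpus` is a hypothetical name, and it returns None when TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of the GPU devices TensorFlow can see.

    Returns None if TensorFlow is not importable in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("TensorFlow sees:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but sees no GPU")
```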
Hi Georg, I have tested the demo data and it seems to be working (it shows the three axis views of the MRI/PET/guided-PET images). I have also done several anatomy-guided recons with my data using the S2 model. I have a few issues arising from that.
Let me know what you think.
Thanks in advance.
Glad to hear that it is working. The alignment of SITK is indeed not deterministic and can fail.
If your images are already aligned (e.g. via another program), you can use the --no-coreg_inputs option to skip the registration.
Georg
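For scripted runs over several data sets, the command line with that option can be assembled programmatically. The following is a rough sketch that uses only the executable name and the two flags quoted in this thread (`--show` and `--no-coreg_inputs`); `predict_command` is a hypothetical helper, and the order of the optional flags is an assumption.

```python
import shlex


def predict_command(pet, mr, model, coreg=True, show=False):
    """Build the argument list for pyapetnet_predict_from_nifti."""
    argv = ["pyapetnet_predict_from_nifti", pet, mr, model]
    if not coreg:
        # Skip the registration step for inputs that are already aligned.
        argv.append("--no-coreg_inputs")
    if show:
        argv.append("--show")
    return argv


if __name__ == "__main__":
    argv = predict_command("osem.nii", "t1.nii", "S2_osem_b10_fdg_pe2i",
                           coreg=False, show=True)
    # Print the command for inspection; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```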
Hi, I am pretty new to Python and am having an issue just running the test data. It seems the problem is that the trained model is in a legacy file format, but I could be wrong. I am running this on Windows 11 with keras version 3.3.3. I will embed the errors below. I will also try downgrading keras to see if it works.
Thanks in advance.
PS C:\Users\Raptor_Ampere> pyapetnet_predict_from_nifti osem.nii t1.nii S2_osem_b10_fdg_pe2i --show
2024-05-23 22:24:16.700095: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-23 22:24:17.134386: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Scripts\pyapetnet_predict_from_nifti.exe\__main__.py", line 7, in <module>
File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\predict_from_nifti.py", line 93, in main
model = tf.keras.models.load_model(os.path.join(model_path, model_name),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\saving\saving_api.py", line 193, in load_model
raise ValueError(
ValueError: File format not supported: filepath=C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\trained_models\S2_osem_b10_fdg_pe2i. Keras 3 only supports V3 .keras files and legacy H5 format files (.h5 extension). Note that the legacy SavedModel format is not supported by load_model() in Keras 3. In order to reload a TensorFlow SavedModel as an inference-only layer in Keras 3, use keras.layers.TFSMLayer(C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\trained_models\S2_osem_b10_fdg_pe2i, call_endpoint='serving_default') (note that your call_endpoint might have a different name).
P.S. I realize the pyapetnet version installed by pip is 1.5.1. Let me try conda.