gschramm / pyapetnet

a CNN for anatomy-guided deconvolution and denoising of PET images
https://gschramm.github.io/pyapetnet/
MIT License

Issue with old format of keras? #24

Open ICUH opened 5 months ago

ICUH commented 5 months ago

Hi, I am pretty new to Python and am having an issue just running the test data. It seems the problem is that the trained model is stored in a legacy file format, but I could be wrong. I am running this on Windows 11 with Keras version 3.3.3. I will embed the errors below. I will also try downgrading Keras to see if that works.

Thanks in advance.

PS C:\Users\Raptor_Ampere> pyapetnet_predict_from_nifti osem.nii t1.nii S2_osem_b10_fdg_pe2i --show
2024-05-23 22:24:16.700095: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-23 22:24:17.134386: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Scripts\pyapetnet_predict_from_nifti.exe__main__.py", line 7, in
  File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\predict_from_nifti.py", line 93, in main
    model = tf.keras.models.load_model(os.path.join(model_path, model_name),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\keras\src\saving\saving_api.py", line 193, in load_model
    raise ValueError(
ValueError: File format not supported: filepath=C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\trained_models\S2_osem_b10_fdg_pe2i. Keras 3 only supports V3 .keras files and legacy H5 format files (.h5 extension). Note that the legacy SavedModel format is not supported by load_model() in Keras 3.
In order to reload a TensorFlow SavedModel as an inference-only layer in Keras 3, use keras.layers.TFSMLayer(C:\Users\Raptor_Ampere\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyapetnet\trained_models\S2_osem_b10_fdg_pe2i, call_endpoint='serving_default') (note that your call_endpoint might have a different name).

P.S. I realize the pyapetnet version installed by pip is 1.5.1. Let me try conda.

gschramm commented 5 months ago

Hi, that is indeed a known issue with keras version >=3 (we need <3).
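For anyone who wants to check their environment before running the tools, here is a minimal sketch of a version check. It assumes the standalone `keras` distribution is what `tf.keras` resolves to in your install; the helper names (`keras_major`, `can_load_legacy_savedmodel`, `installed_keras_version`) are made up for illustration and are not part of pyapetnet.

```python
from importlib import metadata
from typing import Optional


def keras_major(version: str) -> int:
    """Return the major component of a version string such as '3.3.3'."""
    return int(version.split(".")[0])


def can_load_legacy_savedmodel(version: str) -> bool:
    """Per the error above, load_model() in Keras 3 rejects SavedModel directories."""
    return keras_major(version) < 3


def installed_keras_version() -> Optional[str]:
    """Return the installed Keras version, or None if Keras is not installed."""
    try:
        return metadata.version("keras")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_keras_version()
    if version is None:
        print("keras is not installed in this environment")
    elif can_load_legacy_savedmodel(version):
        print(f"keras {version}: load_model() should accept the bundled models")
    else:
        print(f"keras {version}: too new, the bundled SavedModel will not load")
```

If the last message is printed, installing from conda-forge into a fresh environment is the simpler fix.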

Can you try installing pyapetnet completely from conda-forge using conda or mamba instead of pip? Doing so should install the correct versions of the dependencies.

To create a new conda env and install pyapetnet you can run:

conda create -n pyapetnet pyapetnet -c conda-forge

And then activate the env via:

conda activate pyapetnet

This should give you access to all the pyapetnet tools.

ICUH commented 5 months ago

Hi, I had to take a hiatus before attempting the solution. It looks like TensorFlow 2.10 is the highest version you can have with GPU support on native Windows.

From tensorflow.org: "TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow or tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin"

So I went ahead with WSL2, but pip didn't install all the dependencies. I then switched to conda, created a new env, and installed pyapetnet, and everything went fine. (You should probably recommend creating a new env.)

Now I have two problems when I try the test run (the error log is at the bottom). The small problem is that it can't find osem.nii, which I thought was downloaded automatically when I installed pyapetnet, but it's not in the pyapetnet folder. The bigger problem is that it can't "register cuDNN"; in other words, it could mean anything at this point. nvidia-smi shows the correct GPU and nvcc -V shows the right CUDA compiler version. I suspect that, because WSL is a virtual machine, I need to set up Docker to share the GPU properly. But I don't know whether that will break other things, or whether that is the issue in the first place. I will try to set up Docker and see if the problem goes away, but I'm not sure about osem.nii and the other .nii files.

Let me know if you have any insights

Thanks


(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ nvidia-smi
Tue Jun 4 13:10:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01              Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   31C    P8             14W /  450W |     552MiB /  24564MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        34      G   /Xwayland                                 N/A        |
+-----------------------------------------------------------------------------------------+
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
(strigiformes) (base) telluraves@DESKTOP-J59V4QS:~$ pyapetnet_predict_from_nifti osem.nii t1.nii S2_osem_b10_fdg_pe2i --show
2024-06-04 13:10:25.377092: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-04 13:10:25.395351: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-04 13:10:25.395381: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-04 13:10:25.395798: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-04 13:10:25.398757: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2024-06-04 13:10:26.194686: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.212923: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.212979: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215170: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215208: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.215234: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317621: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317679: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2022] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-06-04 13:10:26.317728: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-04 13:10:26.317753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21458 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/keras/src/layers/core/lambda_layer.py:327: UserWarning: tensorflow.python.keras.utils.multi_gpu_utils is not loaded, but a Lambda layer uses it. It may cause errors.
  function = cls._parse_function_from_config(
Traceback (most recent call last):
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/nibabel/loadsave.py", line 100, in load
    stat_result = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'osem.nii'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/telluraves/miniconda3/envs/strigiformes/bin/pyapetnet_predict_from_nifti", line 10, in
    sys.exit(predict_from_nifti())
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/click/core.py", line 1157, in call
    return self.main(args, kwargs)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, ctx.params)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(args, **kwargs)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/pyapetnet/scripts/predict_from_nifti.py", line 90, in predict_from_nifti
    pet, pet_affine = load_nii_in_ras(pet_fname)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/pyapetnet/utils.py", line 71, in load_nii_in_ras
    nii = nib.load(fname)
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/nibabel/loadsave.py", line 102, in load
    raise FileNotFoundError(f"No such file or no access: '{filename}'")
FileNotFoundError: No such file or no access: 'osem.nii'

ICUH commented 5 months ago

Update on the "register cuDNN" issue: I had to install pytorch-directml (instead of nvidia-docker) to let the GPU pass through. It seems to have worked? Let me know if I should be worried about any of the warnings. It still can't find osem.nii or the test data set, though (or they weren't there in the first place...).

2024-06-04 13:57:58.614583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 13:57:58.664411: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-04 13:57:59.100274: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2024-06-04 13:57:59.100328: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
2024-06-04 13:57:59.101752: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
2024-06-04 13:57:59.278081: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2024-06-04 13:57:59.364682: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-04 13:57:59.366260: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 4090)
2024-06-04 13:57:59.452786: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-06-04 13:57:59.452826: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2024-06-04 13:57:59.452853: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 113518 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: )
/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/keras/layers/core/lambda_layer.py:327: UserWarning: tensorflow.python.keras.utils.multi_gpu_utils is not loaded, but a Lambda layer uses it. It may cause errors.
  function = cls._parse_function_from_config(
Traceback (most recent call last):
  File "/home/telluraves/miniconda3/envs/strigiformes/lib/python3.10/site-packages/nibabel/loadsave.py", line 100, in load
    stat_result = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'osem.nii'
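One way to separate the GPU question from the missing-file question is to ask TensorFlow directly which GPUs it can see. The following is a minimal sketch, not part of pyapetnet; `visible_gpus` is a hypothetical helper name, and it only relies on `tf.config.list_physical_devices`. Note that the earlier WSL2 log already contains a "Created device ... GPU:0" line after the "Unable to register cuDNN factory" messages, so those messages may not have prevented GPU use in that run.

```python
from typing import List, Optional


def visible_gpus() -> Optional[List[str]]:
    """Return the names of GPUs TensorFlow can see, or None if TF is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with a name such as '/physical_device:GPU:0'.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("TensorFlow sees these GPUs:", gpus)
    else:
        print("TensorFlow is installed but sees no GPU; it will fall back to the CPU")
```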

gschramm commented 5 months ago

The "demo" osem.nii and t1.nii are included in the repository - see here. To get the repository, you can either git clone it or simply download the sources as a zip from here.
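Since the command in the logs passes bare file names, they are resolved against the current working directory, so the tool has to be run from the folder that holds the demo files (or given full paths). A small sketch of a pre-flight check follows; `missing_inputs` is a made-up helper for illustration, not something pyapetnet provides.

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(filenames: Iterable[str], base_dir: str = ".") -> List[str]:
    """Return the file names that do not exist as regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(["osem.nii", "t1.nii"])
    if missing:
        print("Not found in the current directory:", ", ".join(missing))
    else:
        print("Both demo inputs are present")
```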

ICUH commented 4 months ago

Hi Georg, I have tested the demo data and it seems to be working (it shows the three-axis views of the MRI/PET/guided-PET images). I have also done several anatomy-guided reconstructions with my own data using the S2 model. A few issues have come up from that.

  1. NUMA support: should I worry about it? (I could try to build a proper Docker setup, since this is not a native Linux system.)
  2. Sometimes the MRI, PET and guided-PET images are misaligned (MRI vs. PET/guided PET), and the misalignment changes whenever I run the exact same command again. The input MRI and PET are aligned to begin with. This does not happen when the log has this warning: (UserWarning: SITK registation failed. Using initial transform warnings.warn("SITK registation failed. Using initial transform"))

Let me know what you think.

Thanks in advance.

gschramm commented 4 months ago

Glad to hear that it is working. The alignment of SITK is indeed not deterministic and can fail. If your images are already aligned (e.g. via another program), you can use the --no-coreg_inputs option to skip the registration.
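For batch runs on pre-aligned data, the command line can be assembled programmatically. This is only a sketch: `build_predict_command` is a hypothetical wrapper, and it uses nothing beyond the positional arguments and the `--show` and `--no-coreg_inputs` options that appear in this thread.

```python
import shlex
from typing import List


def build_predict_command(pet: str, mr: str, model: str,
                          coregister: bool = True,
                          show: bool = False) -> List[str]:
    """Assemble the argument list for pyapetnet_predict_from_nifti."""
    cmd = ["pyapetnet_predict_from_nifti", pet, mr, model]
    if not coregister:
        # Skip the SITK registration when the inputs are already aligned.
        cmd.append("--no-coreg_inputs")
    if show:
        cmd.append("--show")
    return cmd


if __name__ == "__main__":
    cmd = build_predict_command("osem.nii", "t1.nii", "S2_osem_b10_fdg_pe2i",
                                coregister=False, show=True)
    # Print the command instead of running it; pass cmd to subprocess.run to execute.
    print(shlex.join(cmd))
```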

Georg