pangeo-data / pangeo-docker-images

Docker Images For Pangeo Jupyter Environment
https://pangeo-docker-images.readthedocs.io
MIT License

Import tensorflow failing on ml-notebook #536

Open Timh37 opened 2 months ago

Timh37 commented 2 months ago

Describe the bug: Using the most recent image of the Tensorflow GPU ML-notebook, running import tensorflow produces the following errors:

2024-04-23 07:46:16.210178: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-23 07:46:16.210210: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-23 07:46:16.210216: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-23 07:46:16.217822: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

and as a result, model training fails.

To Reproduce: Start up a server with the ml-notebook image and run import tensorflow.
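A minimal sketch of a check that could be run in a notebook cell on the affected server to record what the import actually does. It assumes nothing beyond a Python kernel; `check_tensorflow` is a hypothetical helper name for illustration, not something shipped in the image. It reports whether the import raised, which TensorFlow version was loaded, and which GPUs TensorFlow can see.

```python
import importlib


def check_tensorflow():
    """Try to import TensorFlow and report its version and visible GPUs.

    Hypothetical diagnostic helper; not part of the ml-notebook image.
    """
    report = {"imported": False, "version": None, "gpus": [], "error": None}

    try:
        # Equivalent to `import tensorflow as tf`, but lets us catch failures.
        tf = importlib.import_module("tensorflow")
    except Exception as exc:  # ImportError, or an error raised during TF's own import
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["imported"] = True
    report["version"] = tf.__version__

    try:
        # An empty list means TensorFlow loaded but cannot see a GPU.
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    except Exception as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"

    return report


if __name__ == "__main__":
    print(check_tensorflow())
```

Running the same cell on the older image tag and comparing the two reports may help show whether the log messages above coincide with a failed import or a missing GPU.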

Expected behavior: import tensorflow completes without errors.

Docker Image Version (e.g. quay.io/pangeo/ml-notebook:2023.02.27): NVIDIA Tesla T4, 24GB RAM, 8 CPUs, Pangeo Tensorflow ML Notebook

Infrastructure (Where you are running this image): The 2i2c JupyterHub for LEAP.

Additional context: If I use an older version of the image (quay.io/pangeo/ml-notebook:2023.05.08), the errors disappear.