stefanradev93 / BayesFlow

A Python library for amortized Bayesian workflows using generative neural networks.
https://bayesflow.org/
MIT License
286 stars, 45 forks

bayesflow breaks existing tensorflow installation #162

Open jgman86 opened 3 months ago

jgman86 commented 3 months ago

Hey guys,

We are currently working with BayesFlow on our HPC cluster (or at least trying to, given the virtualization). While setting up the images, we noticed that an existing TensorFlow installation, which loads all CUDA modules as intended, is overwritten by a subsequent installation of TensorFlow when installing BF. This seems intended if the version dependency for TensorFlow is not met. However, something seems to happen during the installation process that breaks the CUDA libraries, even if a vanilla version of TensorFlow that meets the version dependency was previously installed:

>>> import bayesflow as bf
2024-04-18 16:43:36.139742: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-18 16:43:36.139795: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-18 16:43:36.141021: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-18 16:43:36.147866: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

>>> import tensorflow as tf
>>> tf.__version__
'2.15.0'
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-04-18 16:44:48.459233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /device:GPU:0 with 10534 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1

I have the same problem on Ubuntu 22.04 LTS, where I can't use BayesFlow with the cuDNN, cuFFT, and cuBLAS libs. I thought this was due to the buggy nature of TF, but now I think it is related to the version dependencies. Is it possible to include the latest TF version, 2.16.1, in the BF dependencies? Or is it not yet compatible? Additionally, there is now a tensorflow[and-cuda] package available via pip, which includes all necessary CUDA libs. It would be great to include such versions in the BF dependencies if possible, to streamline the installation process. In our case, usage on a cluster is tricky, as we had to find out by trial and error which BF and TF versions actually work with the provided infrastructure!

Has anybody experienced the same issues?

vpratz commented 3 months ago

Thanks for reaching out! If I remember correctly, I have encountered those warnings/errors, but could use the GPU anyway. The last line says it is able to create a GPU device. What is the output of tf.config.list_physical_devices('GPU')? Does a GPU show up there, or is it an empty list?
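A minimal sketch of that check, wrapped in a small helper so it also runs on machines without TensorFlow. The helper name `list_gpus` is hypothetical; only `tf.config.list_physical_devices` comes from the thread.

```python
import importlib.util


def list_gpus():
    """Return the names of GPUs visible to TensorFlow.

    Returns None if TensorFlow is not installed, and an empty list if
    TensorFlow is installed but sees no GPU. (Hypothetical helper.)
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # The non-deprecated replacement for tf.test.is_gpu_available()
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(list_gpus())
```

An empty list would mean TensorFlow cannot see the GPU at all, which is a different situation from the factory-registration messages shown in the log.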

Regarding the 2.16 version of TensorFlow: We are currently working on adapting BF to Keras 3, which will also enable using the newest TensorFlow version. As this is a bigger change, it will need a bit more time. You can track the progress in PR #159.

jgman86 commented 3 months ago

Hey Valentin, yes, it finds the GPU. But from my searches, the errors at the beginning indicate that the libs are not used, which results in slower training. It's also suspicious that those errors are absent before the BayesFlow installation! Yes, I am already tracking the streamline-backend branch and can't wait to finally use torch as a backend ;-)

vpratz commented 3 months ago

Ahh, ok. Is the error absent when you install TF 2.15.0 without BayesFlow? Or is there a version change with the BF installation?

As this seems to be a common problem according to this issue, we probably have to find out which TF-related dependency changes and introduces the problem, though I don't know whether we can resolve this. Could you try the following?

The resulting diff may tell us something about the changes BF introduces to the TensorFlow installation, which might give us a lead on what we need to fix.
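A minimal sketch of one way to produce such a diff, assuming a snapshot of the installed packages is taken once before and once after installing BayesFlow. The function names `snapshot_packages` and `diff_snapshots` are hypothetical, and this is not necessarily the procedure suggested above.

```python
import difflib
from importlib.metadata import distributions


def snapshot_packages():
    """Return sorted 'name==version' strings for every installed distribution."""
    entries = []
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no package name
            entries.append(f"{name}=={dist.version}")
    return sorted(entries)


def diff_snapshots(before, after):
    """Return unified-diff lines for what changed between two snapshots."""
    return list(
        difflib.unified_diff(before, after, "before", "after", n=0, lineterm="")
    )
```

In practice one would save the result of `snapshot_packages()` to a file in the environment with only TensorFlow installed, install BayesFlow, take a second snapshot, and pass both lists to `diff_snapshots`. Lines starting with `-` or `+` then show the packages whose versions were removed or added.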

ali-akhavan89 commented 1 month ago

I completely understand it is challenging, but it would be great if the new BF release could support tensorflow-cpu 2.16 too.

stefanradev93 commented 1 month ago

Absolutely. The new release will support all recent TensorFlow, PyTorch, and JAX versions.