octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License
794 stars, 156 forks

Warnings related to CUDA, cuDNN, TensorRT, etc. #69

Open zwbx opened 6 months ago

zwbx commented 6 months ago

When I ran the fine-tuning script, I noticed warnings related to CUDA, cuDNN, TensorRT, etc. I followed the environment setup in the README. I suspect these might be caused by an incompatibility between JAX and the environment.

(octo) wenbo@wenbo-4090:~/Documents/data/octo/scripts$ python finetune.py
/media/wenbo/12T/octo/scripts/finetune.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
2024-03-27 16:43:55.322631: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-27 16:43:55.322684: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-27 16:43:55.453763: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-27 16:43:57.157329: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
W0327 16:44:02.909117 128929550399296 compilation_cache.py:59] Initialized persistent compilation cache at /home/wenbo/.jax_compilation_cache
I0327 16:44:03.371840 128929550399296 xla_bridge.py:633] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0327 16:44:03.382258 128929550399296 xla_bridge.py:633] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
I0327 16:44:03.382936 128929550399296 finetune_rlbench.py:66]
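The first warning in the log is unrelated to CUDA: finetune.py does "import imp", and the imp module was removed in Python 3.12. If the script only uses imp to load a module from a file path via imp.load_source (an assumption; the log shows only the import on line 3), an importlib-based replacement could look like the sketch below. load_source here is a hypothetical helper name, not part of Octo.

```python
import importlib.util
import sys
from types import ModuleType


def load_source(module_name: str, path: str) -> ModuleType:
    """Hypothetical importlib-based stand-in for the deprecated imp.load_source."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # imp.load_source also registered the module in sys.modules
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

This follows the "import a source file directly" pattern from the importlib documentation, so it should behave like the old call for a plain .py file without triggering the DeprecationWarning.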

erikbr01 commented 6 months ago

Some of these error messages from TensorFlow appear to be normal; see this article: https://medium.com/@dev-charodeyka/tensorflow-conda-nvidia-gpu-on-ubuntu-22-04-3-lts-ad61c1d9ee32
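If these lines are only noise, one option is to filter them with environment variables set before TensorFlow or JAX is imported. A minimal sketch, assuming the standard TF_CPP_MIN_LOG_LEVEL and JAX_PLATFORMS variables; quiet_gpu_logging is a hypothetical helper name. Level "2" filters TensorFlow's INFO and WARNING lines (such as the TF-TRT warning). The cuDNN/cuFFT/cuBLAS registration lines are logged at ERROR level, so they would only be filtered at level "3", which also hides genuine errors, and they may still appear in some versions. Restricting JAX to "cuda" should skip the rocm and tpu backend probes that produce the I-level lines.

```python
import os


def quiet_gpu_logging(jax_platform: str = "cuda") -> dict:
    """Hypothetical helper: set env vars that reduce TF/JAX startup log noise.

    Must run before tensorflow or jax is imported; existing values are kept.
    """
    settings = {
        # TensorFlow C++ log filter: 0 = all, 1 = no INFO, 2 = no INFO/WARNING, 3 = no INFO/WARNING/ERROR
        "TF_CPP_MIN_LOG_LEVEL": "2",
        # Only initialize the named JAX backend instead of probing rocm/tpu as well
        "JAX_PLATFORMS": jax_platform,
    }
    for key, value in settings.items():
        os.environ.setdefault(key, value)  # do not override a value the user already set
    return {key: os.environ[key] for key in settings}
```

Call it as the very first statement of the script, before "import tensorflow" or "import jax". One side effect of pinning JAX_PLATFORMS to "cuda" is that JAX should fail loudly if the GPU backend cannot be initialized, instead of silently falling back to CPU.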

I have a working conda environment with setup steps here: https://github.com/erikbr01/octo_experiments/blob/main/README.md
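Whichever environment is used, a quick way to tell whether the warnings matter is to confirm that JAX actually picked up the GPU. A minimal sketch, assuming only the public jax.default_backend() and jax.devices() calls; jax_backend_summary is a hypothetical helper name.

```python
import importlib.util


def jax_backend_summary():
    """Return (default backend, device names) as seen by JAX, or None if JAX is not installed."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax

    return jax.default_backend(), [str(d) for d in jax.devices()]


if __name__ == "__main__":
    summary = jax_backend_summary()
    if summary is None:
        print("JAX is not installed in this environment")
    else:
        backend, devices = summary
        # "gpu" here means the CUDA backend loaded despite the warnings in the log
        print(f"default backend: {backend}")
        print(f"devices: {devices}")
```

If this reports "gpu" with the expected device, the cuDNN/cuFFT/cuBLAS registration messages are unlikely to be a JAX/environment incompatibility; a "cpu" result would point to a real installation problem.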