Closed ynusinovich closed 7 months ago
Output of conda list, if it's helpful:
# packages in environment at /home/yannusinovich/anaconda3/envs/mlzc2:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.0.0 pypi_0 pypi
asttokens 2.0.5 pyhd3eb1b0_0
astunparse 1.6.3 pypi_0 pypi
backcall 0.2.0 pyhd3eb1b0_0
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.08.22 h06a4308_0
cachetools 5.3.2 pypi_0 pypi
certifi 2023.11.17 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
comm 0.1.2 py311h06a4308_0
debugpy 1.6.7 py311h6a678d5_0
decorator 5.1.1 pyhd3eb1b0_0
executing 0.8.3 pyhd3eb1b0_0
flatbuffers 23.5.26 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
gast 0.4.0 pypi_0 pypi
giflib 5.2.1 h5eee18b_3
google-auth 2.23.4 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.59.3 pypi_0 pypi
h5py 3.10.0 pypi_0 pypi
idna 3.5 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
ipykernel 6.25.0 py311h92b7b1e_0
ipython 8.15.0 py311h06a4308_0
jedi 0.18.1 py311h06a4308_1
jpeg 9e h5eee18b_1
jupyter_client 8.6.0 py311h06a4308_0
jupyter_core 5.5.0 py311h06a4308_0
keras 2.13.1 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libclang 16.0.6 pypi_0 pypi
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libpng 1.6.39 h5eee18b_0
libsodium 1.0.18 h7b6447c_0
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.5.1 h6a678d5_0
libuuid 1.41.5 h5eee18b_0
libwebp 1.3.2 h11a3e52_0
libwebp-base 1.3.2 h5eee18b_0
lz4-c 1.9.4 h6a678d5_0
markdown 3.5.1 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib-inline 0.1.6 py311h06a4308_0
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py311h5eee18b_1
mkl_fft 1.3.8 py311h5eee18b_0
mkl_random 1.2.4 py311hdb19cb5_0
ncurses 6.4 h6a678d5_0
nest-asyncio 1.5.6 py311h06a4308_0
numpy 1.24.3 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.12 h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.1 py311h06a4308_0
parso 0.8.3 pyhd3eb1b0_0
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 10.0.1 py311ha6cbd5a_0
pip 23.3.1 py311h06a4308_0
platformdirs 3.10.0 py311h06a4308_0
prompt-toolkit 3.0.36 py311h06a4308_0
protobuf 4.25.1 pypi_0 pypi
psutil 5.9.0 py311h5eee18b_0
ptyprocess 0.7.0 pyhd3eb1b0_2
pure_eval 0.2.2 pyhd3eb1b0_0
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pygments 2.15.1 py311h06a4308_1
python 3.11.5 h955ad1f_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pyzmq 25.1.0 py311h6a678d5_0
readline 8.2 h5eee18b_0
requests 2.31.0 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.11.3 py311h08b1b3b_0
setuptools 68.0.0 py311h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.41.2 h5eee18b_0
stack_data 0.2.0 pyhd3eb1b0_0
tbb 2021.8.0 hdb19cb5_0
tensorboard 2.13.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tensorflow 2.13.1 pypi_0 pypi
tensorflow-estimator 2.13.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.34.0 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
tflite-runtime 2.14.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tornado 6.3.3 py311h5eee18b_0
traitlets 5.7.1 py311h06a4308_0
typing-extensions 4.5.0 pypi_0 pypi
tzdata 2023c h04d1e81_0
urllib3 2.1.0 pypi_0 pypi
wcwidth 0.2.5 pyhd3eb1b0_0
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py311h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xz 5.4.2 h5eee18b_0
zeromq 4.3.4 h2531618_0
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
545 also broke PyTorch multi-GPU. I'm at my wits' end with Pop; they have wasted a week of my life with their shitty driver packaging in the past 6 months.
The fact that I can't downgrade is really, really cool. Good job team.
TF-TRT Warning: Could not find TensorRT
https://discuss.tensorflow.org/t/unable-to-get-tensorflow-working-correctly/18981/2
You are missing the TensorRT library path from your LD_LIBRARY_PATH. But you should be looking into switching over to Docker, as these tools typically depend on specific versions of NVIDIA drivers and the CUDA toolkit.
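One way to check that suggestion is to look at each directory on LD_LIBRARY_PATH and see whether it holds a TensorRT shared object. The sketch below is only an illustration: the helper name `dirs_with_library` is made up here, and the `libnvinfer.so*` pattern assumes the TensorRT core library follows its usual naming.

```python
import glob
import os


def dirs_with_library(search_path, pattern="libnvinfer.so*"):
    """Return the directories in a path-separated string that contain a match.

    `search_path` is a string such as the value of LD_LIBRARY_PATH.
    `pattern` is an assumed file name pattern for the TensorRT core library.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries, which appear when the variable is unset
        # or has a leading/trailing separator.
        if directory and glob.glob(os.path.join(directory, pattern)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # An empty list means no directory on LD_LIBRARY_PATH holds the library.
    print(dirs_with_library(os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty result would be consistent with the "Could not find TensorRT" warning, though it does not by itself prove that the path is the only problem.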
@Tostino There are no issues with our packaging, and this issue has nothing to do with the NVIDIA driver. They're missing the TensorRT library path from their LD paths.
NVIDIA provides the driver installer and we package that installer. You get precisely what NVIDIA has packaged in their installer. Our QA team tests every driver release, and that includes Tensorflow testing using Docker.
I meant that the inability to downgrade, combined with automatic updates to newer drivers, has caused issues with the major ML packages.
Sorry if I piled on an unrelated issue, but there are widespread issues with the 545 drivers, and not being able to go back to 535 easily has been a serious hassle.
Similar issues with poor software support happened when 535 originally came out and I was upgraded to those. Unless I am just missing something here and downgrading works fine and I'm just slow and doing something wrong...in which case, apologies, and please ignore me.
nvidia-driver-535-server is an option, but we aren't testing these server packages from Ubuntu.
Appreciate you mentioning that. I thought that server meant headless in this case, but it looks like I was wrong and these should work. Will give it a shot. Appreciate it.
@mmstick Thank you for your help.
The page that I had sent you (https://www.tensorflow.org/install/pip) has updated its instructions. It now has a new first line with additional packages to install:
python3 -m pip install --extra-index-url https://pypi.nvidia.com tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1
python3 -m pip install -U tensorflow[and-cuda]
# Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
And now TensorFlow works with GPU.
FYI, TensorRT is still not found when I import TensorFlow, but I don't need it for this practice project. I tried adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yannusinovich/anaconda3/envs/mlzc2/lib/python3.11/site-packages/tensorrt to .bashrc and as an environment variable in my Jupyter notebook, but it still can't find TensorRT.
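If the exported directory does not actually contain the shared objects, the loader will still fail. A rough way to find where the pip packages put them is to search site-packages for files that look like the TensorRT libraries. This is a sketch under assumptions: `find_library_dirs` is a hypothetical helper, the `libnvinfer*.so*` pattern is a guess at the file names, and the libraries may live in a directory other than `tensorrt`. Also, as far as I know, the dynamic loader reads LD_LIBRARY_PATH when a process starts, so setting it inside an already running notebook kernel may have no effect.

```python
from pathlib import Path
import site


def find_library_dirs(roots, pattern="libnvinfer*.so*"):
    """Return sorted directories under `roots` that contain a matching file.

    `roots` is an iterable of directories to search recursively.
    `pattern` is an assumed glob for TensorRT shared objects.
    """
    found = set()
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # ignore entries that do not exist on this machine
        for match in root.rglob(pattern):
            found.add(str(match.parent))
    return sorted(found)


if __name__ == "__main__":
    # Run this with the environment's own interpreter (here, mlzc2)
    # so that site.getsitepackages() points at the right site-packages.
    for directory in find_library_dirs(site.getsitepackages()):
        print(directory)
```

Any directory it prints is a candidate for LD_LIBRARY_PATH, set before launching Jupyter rather than from inside the notebook.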
I definitely concur with @Tostino that it would be nice if there was an option to quickly downgrade to a previous version of the NVIDIA drivers in the Pop!_Shop for situations where the compatibility temporarily dies with an update.
These instructions say that CUDA and cuDNN are already installed in my Adder WS with Pop!_OS 22.04 LTS: https://support.system76.com/articlesf/cuda/ I followed these instructions to install TensorFlow GPU: https://www.tensorflow.org/install/pip I got the following error message when I tried running Tensorflow with GPU. This is new since last week, when everything was working. I did not change the Python environment, I only did Pop!_Shop system updates:
My questions: