zkmkarlsruhe / ofxTensorFlow2

TensorFlow 2 AI/ML library wrapper for openFrameworks

GPU training issue #23

Closed paul-ferragut closed 2 years ago

paul-ferragut commented 2 years ago

GPU training using main.py with pix2pix is not working on my setup. The CPU training and the oF examples are working.

At first I was missing DLLs such as cudart64_101.dll and cusolver64_10.dll. I copied the DLLs from another folder or renamed them, as suggested here: https://stackoverflow.com/questions/65608713/tensorflow-gpu-could-not-load-dynamic-library-cusolver64-10-dll-dlerror-cuso Eventually training started, but it stalls at epoch 1: the epoch 1 progress stays at 0% while the training continues. After training, the predicted output image is completely black.

My setup: Windows 11, RTX 3090 (driver 516.01), conda environment with Python 3.7 and the package versions from requirements.txt, CUDA 11.7, and cuDNN 8.4.1.
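For anyone debugging a similar setup, a quick check of what TensorFlow itself reports can help before starting a long training run. The missing cudart64_101.dll is the CUDA 10.1 runtime, which suggests the installed TensorFlow build expected CUDA 10.1 rather than the CUDA 11.7 on the machine. The sketch below is not part of the repo; `gpu_report` is a hypothetical helper name, and `tf.sysconfig.get_build_info()` is guarded because it may not exist in older TensorFlow 2.x releases.

```python
def gpu_report():
    """Return a small dict describing what TensorFlow can see on this machine."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed (or failed to import) in this environment.
        return {"tensorflow": None, "built_with_cuda": False,
                "gpus": [], "build_info": {}}

    # get_build_info() is only available in newer TF 2.x releases, so guard it.
    get_info = getattr(tf.sysconfig, "get_build_info", None)
    build_info = dict(get_info()) if get_info else {}

    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list here means training will silently fall back to the CPU.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
        # The CUDA/cuDNN versions the wheel was built against, if reported.
        "build_info": {k: build_info.get(k)
                       for k in ("cuda_version", "cudnn_version")},
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `gpus` is empty, or the reported CUDA version does not match the installed toolkit, copying or renaming DLLs is unlikely to be a reliable fix, and changing the TensorFlow version is probably the safer route.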

paul-ferragut commented 2 years ago

I modified requirements.txt to use newer packages, and now I can train with the GPU. Each epoch went from 4 minutes to 10 seconds.

absl-py==0.11.0
astunparse==1.6.3
cachetools==4.2.0
certifi==2020.12.5
chardet==4.0.0
cycler==0.10.0
gast==0.4.0
google-auth==1.24.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.37.0
h5py==3.1.0
idna==2.10
importlib-metadata==3.3.0
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver==1.3.1
Markdown==3.3.3
matplotlib==3.3.3
numpy==1.19.2
oauthlib==3.1.0
opt-einsum==3.3.0
Pillow==8.0.1
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.6
scipy==1.4.1
six==1.15.0
tensorboard==2.6.0
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-gpu==2.6.0
tensorflow-estimator==2.6.0
termcolor==1.1.0
tqdm==4.54.1
typing-extensions==3.7.4.3
urllib3==1.26.2
Werkzeug==1.0.1
wrapt==1.12.1
zipp==3.4.0
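When editing pins by hand like this, it is easy to bump tensorflow and forget a companion package. The following is a minimal sketch, using only the standard library, that parses `name==version` lines and flags TensorFlow-related packages whose major.minor versions disagree. The helper names and the `FAMILY` tuple are assumptions for illustration, and "same minor version" is a rule of thumb that fits the 2.6.0 pins above, not a documented requirement.

```python
import re

# Matches simple "name==version" pins; comments and other specifiers are skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s#]+)")

# Assumption: these packages are expected to share a major.minor version.
FAMILY = ("tensorflow", "tensorflow-gpu", "tensorflow-estimator",
          "keras", "tensorboard")


def parse_pins(text):
    """Map lower-cased package names to their pinned version strings."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def minor(version):
    """Reduce '2.6.0' to '2.6'."""
    return ".".join(version.split(".")[:2])


def mismatched(pins, family=FAMILY):
    """Return {name: major.minor} for the family if they disagree, else {}."""
    present = {name: minor(pins[name]) for name in family if name in pins}
    return present if len(set(present.values())) > 1 else {}


sample = """
keras==2.6.0
tensorboard==2.6.0
tensorflow==2.6.0
tensorflow-gpu==2.6.0
tensorflow-estimator==2.6.0
"""

if __name__ == "__main__":
    print(mismatched(parse_pins(sample)) or "TensorFlow family pins agree")
```

To check a real file, read it with `open("requirements.txt").read()` and pass the text to `parse_pins`.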