juglab / n2v

This is the implementation of Noise2Void training.

GPU training doesn't work? #108

Open · yongshuo-Z opened this issue 3 years ago

yongshuo-Z commented 3 years ago

Hi, thanks for your nice code.

When I'm training the model, it runs on the CPU instead of the GPU, which makes training quite slow.

I've installed tensorflow-gpu 1.14.0 and keras 2.2.5, and the environment works fine with other projects (they can train on the GPU). Is there any configuration we need to set explicitly to make the GPU work? Thanks!
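
For reference, a quick sanity check (a minimal sketch assuming the TF 1.x API, matching the tensorflow-gpu 1.14.0 mentioned above) to confirm whether TensorFlow can see the GPU at all:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if a CUDA-enabled GPU device is usable by this TensorFlow build
print(tf.test.is_gpu_available())

# Lists every device TensorFlow can see; a working setup includes a /device:GPU:0 entry
print([d.name for d in device_lib.list_local_devices()])

If no GPU device shows up here, the problem lies in the TensorFlow/CUDA/cuDNN combination rather than anything n2v-specific.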

Tokariew commented 2 years ago

name: n2vv2
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - abseil-cpp=20210324.2=h9c3ff4c_0
  - absl-py=0.15.0=pyhd8ed1ab_0
  - aiohttp=3.7.4.post0=py39h3811e60_1
  - argon2-cffi=21.1.0=py39h3811e60_2
  - astunparse=1.6.3=pyhd8ed1ab_0
  - async-timeout=3.0.1=py_1000
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - bleach=4.1.0=pyhd8ed1ab_0
  - blinker=1.4=py_1
  - brotlipy=0.7.0=py39h3811e60_1003
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2021.10.8=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - cachetools=4.2.4=pyhd8ed1ab_0
  - certifi=2021.10.8=py39hf3d152e_1
  - cffi=1.15.0=py39h4bc2ebd_0
  - chardet=4.0.0=py39hf3d152e_2
  - click=8.0.3=py39hf3d152e_1
  - cryptography=35.0.0=py39h95dcef6_2
  - cudatoolkit=11.3.1=ha36c431_9
  - cudnn=8.2.1.32=h86fa8c9_0
  - cupti=11.3.1=0
  - dataclasses=0.8=pyhc8e2a94_3
  - debugpy=1.5.1=py39he80948d_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.3=pyhd8ed1ab_1003
  - gast=0.4.0=pyh9f0ad1d_0
  - giflib=5.2.1=h36c2ea0_2
  - google-auth=1.35.0=pyh6c4a22f_0
  - google-auth-oauthlib=0.4.6=pyhd8ed1ab_0
  - google-pasta=0.2.0=pyh8c360ce_0
  - grpc-cpp=1.39.1=h850795e_1
  - grpcio=1.39.0=py39hff7568b_0
  - h5py=3.1.0=nompi_py39h25020de_100
  - hdf5=1.10.6=nompi_h6a2412b_1114
  - icu=68.2=h9c3ff4c_0
  - idna=2.10=pyh9f0ad1d_0
  - importlib-metadata=4.8.1=py39hf3d152e_1
  - importlib_resources=5.4.0=pyhd8ed1ab_0
  - ipykernel=6.4.2=py39hef51801_0
  - ipython=7.29.0=py39hef51801_1
  - ipython_genutils=0.2.0=py_1
  - jedi=0.18.0=py39hf3d152e_3
  - jinja2=3.0.2=pyhd8ed1ab_0
  - jpeg=9d=h36c2ea0_0
  - jsonschema=4.2.1=pyhd8ed1ab_0
  - jupyter_client=7.0.6=pyhd8ed1ab_0
  - jupyter_core=4.9.1=py39hf3d152e_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - keras=2.6.0=pyhd8ed1ab_0
  - keras-preprocessing=1.1.2=pyhd8ed1ab_0
  - krb5=1.19.2=hcc1bbae_3
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libblas=3.9.0=12_linux64_openblas
  - libcblas=3.9.0=12_linux64_openblas
  - libcurl=7.79.1=h2574ce0_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h9c3ff4c_4
  - libgcc-ng=11.2.0=h1d223b6_11
  - libgfortran-ng=11.2.0=h69a702a_11
  - libgfortran5=11.2.0=h5c6108e_11
  - libgomp=11.2.0=h1d223b6_11
  - liblapack=3.9.0=12_linux64_openblas
  - libnghttp2=1.43.0=h812cca2_1
  - libopenblas=0.3.18=pthreads_h8fe5266_0
  - libpng=1.6.37=h21135ba_2
  - libprotobuf=3.16.0=h780b84a_0
  - libsodium=1.0.18=h36c2ea0_1
  - libssh2=1.10.0=ha56f1ee_2
  - libstdcxx-ng=11.2.0=he4da1e4_11
  - libzlib=1.2.11=h36c2ea0_1013
  - markdown=3.3.4=pyhd8ed1ab_0
  - markupsafe=2.0.1=py39h3811e60_1
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mistune=0.8.4=py39h3811e60_1005
  - multidict=5.2.0=py39h3811e60_1
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.2.0=py39hf3d152e_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - nccl=2.11.4.1=hdc17891_0
  - ncurses=6.2=h58526e2_4
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.5=pyha770c72_0
  - numpy=1.19.5=py39hdbf815f_2
  - oauthlib=3.1.1=pyhd8ed1ab_0
  - openssl=1.1.1l=h7f98852_0
  - opt_einsum=3.3.0=pyhd8ed1ab_1
  - packaging=21.0=pyhd8ed1ab_0
  - pandoc=2.16.1=h7f98852_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.2=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=21.3.1=pyhd8ed1ab_0
  - prometheus_client=0.12.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.22=pyha770c72_0
  - protobuf=3.16.0=py39he80948d_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pyjwt=2.3.0=pyhd8ed1ab_0
  - pyopenssl=21.0.0=pyhd8ed1ab_0
  - pyparsing=3.0.5=pyhd8ed1ab_0
  - pyrsistent=0.18.0=py39h3811e60_0
  - pysocks=1.7.1=py39hf3d152e_4
  - python=3.9.7=hb7a2778_3_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-flatbuffers=1.12=pyhd8ed1ab_1
  - python_abi=3.9=2_cp39
  - pyu2f=0.1.5=pyhd8ed1ab_0
  - pyzmq=22.3.0=py39h37b5a0c_1
  - re2=2021.09.01=h9c3ff4c_0
  - readline=8.1=h46c0cb4_0
  - requests=2.25.1=pyhd3deb0d_0
  - requests-oauthlib=1.3.0=pyh9f0ad1d_0
  - rsa=4.7.2=pyh44b312d_0
  - scipy=1.7.1=py39hee8e79c_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=58.5.3=py39hf3d152e_0
  - six=1.15.0=pyh9f0ad1d_0
  - snappy=1.1.8=he1b5a44_3
  - sqlite=3.36.0=h9cd32fc_2
  - tensorboard=2.6.0=pyhd8ed1ab_1
  - tensorboard-data-server=0.6.0=py39h95dcef6_1
  - tensorboard-plugin-wit=1.8.0=pyh44b312d_0
  - tensorflow=2.6.0=cuda112py39h9dc3950_2
  - tensorflow-base=2.6.0=cuda112py39h0b4cdfd_2
  - tensorflow-estimator=2.6.0=cuda112py39heacc632_2
  - termcolor=1.1.0=py_2
  - terminado=0.12.1=py39hf3d152e_1
  - testpath=0.5.0=pyhd8ed1ab_0
  - tk=8.6.11=h27826a3_1
  - tornado=6.1=py39h3811e60_2
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typing-extensions=3.7.4.3=0
  - typing_extensions=3.7.4.3=py_0
  - tzdata=2021e=he74cb21_0
  - urllib3=1.26.7=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - werkzeug=2.0.1=pyhd8ed1ab_0
  - wheel=0.37.0=pyhd8ed1ab_1
  - wrapt=1.12.1=py39h3811e60_3
  - xz=5.2.5=h516909a_1
  - yarl=1.7.2=py39h3811e60_1
  - zeromq=4.3.4=h9c3ff4c_1
  - zipp=3.6.0=pyhd8ed1ab_0
  - zlib=1.2.11=h36c2ea0_1013
  - pip:
    - csbdeep==0.6.3
    - cycler==0.11.0
    - imagecodecs==2021.8.26
    - kiwisolver==1.3.2
    - matplotlib==3.4.3
    - pillow==8.4.0
    - ruamel-yaml==0.17.17
    - ruamel-yaml-clib==0.2.6
    - tifffile==2021.11.2
    - tqdm==4.62.3
prefix: /home/tokariew/.local/share/conda/envs/n2vv2

With this conda environment, GPU training works for me on Linux with an NVIDIA GPU; I hope it helps…

I installed n2v from GitHub and edited setup.py to bump the pinned Keras version.

tibuch commented 2 years ago

The most recent N2V version requires TF2. Could you try this combination:

conda create -n n2v_env python=3.7
conda activate n2v_env
conda install cudatoolkit=10.1 cudnn
pip install tensorflow==2.3
pip install n2v
pip install jupyter
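
After installing, a quick check (a sketch, assuming the standard TF 2.x API) that this TensorFlow 2.3 build was compiled with CUDA and actually detects the GPU:

import tensorflow as tf

# Should print True for a CUDA-enabled build of TensorFlow
print(tf.test.is_built_with_cuda())

# Should list at least one PhysicalDevice of type 'GPU' if CUDA/cuDNN are found
print(tf.config.list_physical_devices('GPU'))
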
zxy126 commented 2 years ago

Following the combination tibuch suggested above, I also added "X:\anaconda3\envs\n2v_env\Library\bin" to the system PATH. It works very well on Win10.
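
A per-session alternative to editing the system PATH is to prepend that folder before importing TensorFlow (a sketch, assuming Python 3.7 as in the recipe above, where Windows still resolves DLL dependencies via PATH; the install prefix is the one from this comment and must be adjusted to your own):

import os

# Hypothetical Anaconda prefix taken from the comment above -- adjust to your own install
cuda_dll_dir = r"X:\anaconda3\envs\n2v_env\Library\bin"
os.environ["PATH"] = cuda_dll_dir + os.pathsep + os.environ["PATH"]

# Import TensorFlow only after the CUDA/cuDNN DLL folder is on PATH
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))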

Mrc010 commented 2 years ago

Got a new GPU and could only use the super-slow tensorflow==2.2 or the slow tensorflow==1.15:

conda create -n n2v python=3.7
conda activate n2v
conda install cudatoolkit=10.0 cudnn=7.6 tensorflow-estimator==1.15.1 keras==2.2.4 tensorflow-gpu==1.15
pip install n2v==0.2.1

Edit: found a solution for CUDA 11.5 + Tensorflow 1.15 that is fast

conda create -n n2v python=3.8
conda activate n2v
pip install nvidia-pyindex
pip install nvidia-tensorflow
pip install nvidia-tensorboard
pip install n2v==0.2.1

cf. https://github.com/NVIDIA/tensorflow

sidenote: this is on Ubuntu 20.04

Edit 2: for Tensorflow 1.15, adding this to the notebook is useful to prevent annoying warnings and excessive memory allocation:

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front
conf = tf.compat.v1.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=conf)

# Silence the very chatty TF1 deprecation/info messages
tf.compat.v1.logging.set_verbosity('ERROR')
Wuito commented 1 year ago

The environment I am using is TF2-based, on Win11 with Anaconda: python==3.9, tensorflow==2.7, CUDA==11.8, cuDNN==8.7. Refer to the author's README for the other environment requirements.
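
For TF2 setups like this one, a rough equivalent of the TF1 memory-growth snippet above (a sketch, assuming the standard tf.config API available in TF 2.x) would be:

import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand; run this before building any model
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)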