Open monajalal opened 8 months ago
Use this command to install TensorFlow in the Python 3.8 environment:
pip install nvidia-tensorflow==1.15.4
There is an issue with your algorithm environment
@Jingranxia
Your command didn't work. How did you create the environment?
(base) mona@ada:~/EfficientPose$ conda create --name effpose python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 23.7.4
latest version: 23.10.0
Please update conda by running
$ conda update -n base -c defaults conda
Or to minimize the number of packages updated during conda update use
conda install conda=23.10.0
## Package Plan ##
environment location: /home/mona/anaconda3/envs/effpose
added / updated specs:
- python=3.8
The following packages will be downloaded:
package | build
---------------------------|-----------------
pip-23.3.1 | py38h06a4308_0 2.6 MB
python-3.8.18 | h955ad1f_0 25.3 MB
setuptools-68.0.0 | py38h06a4308_0 927 KB
wheel-0.41.2 | py38h06a4308_0 108 KB
------------------------------------------------------------
Total: 28.9 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
ca-certificates pkgs/main/linux-64::ca-certificates-2023.08.22-h06a4308_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_0
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl pkgs/main/linux-64::openssl-3.0.12-h7f8727e_0
pip pkgs/main/linux-64::pip-23.3.1-py38h06a4308_0
python pkgs/main/linux-64::python-3.8.18-h955ad1f_0
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-68.0.0-py38h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.41.2-h5eee18b_0
tk pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0
wheel pkgs/main/linux-64::wheel-0.41.2-py38h06a4308_0
xz pkgs/main/linux-64::xz-5.4.2-h5eee18b_0
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate effpose
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) mona@ada:~/EfficientPose$ conda activate effpose
(effpose) mona@ada:~/EfficientPose$ pip install nvidia-tensorflow==1.15.4
ERROR: Could not find a version that satisfies the requirement nvidia-tensorflow==1.15.4 (from versions: 0.0.1.dev4, 0.0.1.dev5)
ERROR: No matching distribution found for nvidia-tensorflow==1.15.4
This is what Bard says:
The error message indicates that the package nvidia-tensorflow==1.15.4 is not available for your current version of Python (3.8). To fix this, you can either install a different version of TensorFlow that is compatible with Python 3.8, or you can downgrade your version of Python to 3.6, which is the version that nvidia-tensorflow==1.15.4 was built for.
Even with Python 3.6, I couldn't install the version of TensorFlow you mentioned, @Jingranxia.
(base) mona@ada:~/EfficientPose$ conda create --name effpose python=3.6
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful attempt using repodata from current_repodata.json, retrying with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 23.7.4
latest version: 23.10.0
Please update conda by running
$ conda update -n base -c defaults conda
Or to minimize the number of packages updated during conda update use
conda install conda=23.10.0
## Package Plan ##
environment location: /home/mona/anaconda3/envs/effpose
added / updated specs:
- python=3.6
The following packages will be downloaded:
package | build
---------------------------|-----------------
python-3.6.13 | h12debd9_1 32.5 MB
------------------------------------------------------------
Total: 32.5 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
ca-certificates pkgs/main/linux-64::ca-certificates-2023.08.22-h06a4308_0
certifi pkgs/main/linux-64::certifi-2021.5.30-py36h06a4308_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
libffi pkgs/main/linux-64::libffi-3.3-he6710b0_2
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl pkgs/main/linux-64::openssl-1.1.1w-h7f8727e_0
pip pkgs/main/linux-64::pip-21.2.2-py36h06a4308_0
python pkgs/main/linux-64::python-3.6.13-h12debd9_1
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-58.0.4-py36h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.41.2-h5eee18b_0
tk pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0
wheel pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
xz pkgs/main/linux-64::xz-5.4.2-h5eee18b_0
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate effpose
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) mona@ada:~/EfficientPose$ conda activate effpose
(effpose) mona@ada:~/EfficientPose$ pip install nvidia-tensorflow==1.15.4
ERROR: Could not find a version that satisfies the requirement nvidia-tensorflow==1.15.4 (from versions: 0.0.1.dev4, 0.0.1.dev5)
ERROR: No matching distribution found for nvidia-tensorflow==1.15.4
(effpose) mona@ada:~/EfficientPose$ python
Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Hello, run pip install nvidia-pyindex first, before installing nvidia-tensorflow.
Also, you should use Python 3.8, not 3.6, because with Python 3.6 the add-on packages are not installed automatically.
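The advice above boils down to an ordering constraint plus a Python version constraint. Here is a minimal sketch of that, assuming (as the package name suggests) that nvidia-pyindex is what lets pip find the real nvidia-tensorflow wheels. The helper name `build_install_commands` is hypothetical and exists only for illustration.

```python
import sys

# Order matters: per the advice in this thread, nvidia-pyindex has to be
# installed before pip can resolve nvidia-tensorflow==1.15.4.
INSTALL_STEPS = ("nvidia-pyindex", "nvidia-tensorflow==1.15.4")


def build_install_commands(python_version=None):
    """Return the pip commands to run, in order (hypothetical helper)."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    python_version = tuple(python_version)
    # The thread only reports a working install on Python 3.8.
    if python_version != (3, 8):
        raise RuntimeError(
            "Expected Python 3.8, got %d.%d" % python_version
        )
    return [[sys.executable, "-m", "pip", "install", pkg] for pkg in INSTALL_STEPS]


if __name__ == "__main__":
    # Only prints the commands; nothing is installed here.
    for cmd in build_install_commands((3, 8)):
        print(" ".join(cmd))
```

To actually run the steps, each command list could be passed to `subprocess.run(cmd, check=True)` from inside the activated conda environment.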
@Jingranxia thank you, that helped, but now I get this error. How did you fix it?
(EfficientPose) mona@ada:~/EfficientPose$ python train.py --phi 0 --weights weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5 linemod data/Linemod_preprocessed/ --object-id 8
2023-11-29 13:13:42.445715: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From train.py:204: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From train.py:206: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2023-11-29 13:13:43.505241: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3096000000 Hz
2023-11-29 13:13:43.510439: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x434bf50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-11-29 13:13:43.510479: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-11-29 13:13:43.512905: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-11-29 13:13:43.584707: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4321be0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-11-29 13:13:43.584796: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX 6000 Ada Generation, Compute Capability 8.9
2023-11-29 13:13:43.585722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: NVIDIA RTX 6000 Ada Generation major: 8 minor: 9 memoryClockRate(GHz): 2.505
pciBusID: 0000:52:00.0
2023-11-29 13:13:43.585768: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-11-29 13:13:43.613692: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-11-29 13:13:43.617581: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-11-29 13:13:43.617997: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-11-29 13:13:43.618652: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-11-29 13:13:43.619621: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-11-29 13:13:43.619835: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-11-29 13:13:43.620187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2023-11-29 13:13:43.620216: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-11-29 13:13:43.625597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-11-29 13:13:43.625632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2023-11-29 13:13:43.625649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2023-11-29 13:13:43.626098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 39203 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX 6000 Ada Generation, pci bus id: 0000:52:00.0, compute capability: 8.9)
{'dataset_type': 'linemod', 'rotation_representation': 'axis_angle', 'weights': 'weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5', 'freeze_backbone': False, 'no_freeze_bn': False, 'batch_size': 1, 'lr': 0.0001, 'no_color_augmentation': False, 'no_6dof_augmentation': False, 'phi': 0, 'gpu': None, 'epochs': 500, 'steps': 1790, 'snapshot_path': 'checkpoints/29_11_2023_13_13_43', 'tensorboard_dir': 'logs/29_11_2023_13_13_43', 'snapshots': True, 'evaluation': True, 'compute_val_loss': False, 'score_threshold': 0.5, 'validation_image_save_path': None, 'multiprocessing': False, 'workers': 4, 'max_queue_size': 10, 'linemod_path': 'data/Linemod_preprocessed/', 'object_id': 8}
Creating the Generators...
Done!
Building the Model...
Traceback (most recent call last):
File "train.py", line 368, in <module>
main()
File "train.py", line 132, in main
model, prediction_model, all_layers = build_EfficientPose(args.phi,
File "/home/mona/EfficientPose/model.py", line 99, in build_EfficientPose
image_input = layers.Input(input_shape)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/input_layer.py", line 265, in Input
input_layer = InputLayer(**input_layer_config)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/input_layer.py", line 121, in __init__
input_tensor = backend.placeholder(
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/keras/backend.py", line 1051, in placeholder
x = array_ops.placeholder(dtype, shape=shape, name=name)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/ops/array_ops.py", line 2619, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 6668, in placeholder
_, _, _op = _op_def_lib._apply_op_helper(
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py", line 3411, in _create_op_internal
node_def = _NodeDef(op_type, name, device=None, attrs=attrs)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py", line 1552, in _NodeDef
node_def.attr[k].CopyFrom(v)
File "/home/mona/anaconda3/envs/EfficientPose/lib/python3.8/site-packages/google/protobuf/internal/containers.py", line 70, in __getitem__
return self._values[key]
TypeError: list indices must be integers or slices, not str
Here is my environment.yml file:
(EfficientPose) mona@ada:~/EfficientPose$ cat environment.yml
name: EfficientPose
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- ncurses=6.4=h6a678d5_0
- openssl=3.0.12=h7f8727e_0
- pip=23.3.1=py38h06a4308_0
- python=3.8.18=h955ad1f_0
- pyyaml=6.0.1=py38h5eee18b_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py38h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py38h06a4308_0
- xz=5.4.2=h5eee18b_0
- yaml=0.2.5=h7b6447c_0
- zlib=1.2.13=h5eee18b_0
- pip:
- absl-py==2.0.0
- astor==0.8.1
- contourpy==1.1.1
- cycler==0.12.1
- cython==3.0.6
- fonttools==4.45.1
- gast==0.2.2
- google-pasta==0.2.0
- grpcio==1.59.3
- h5py==3.10.0
- imageio==2.33.0
- imgaug==0.4.0
- importlib-metadata==6.8.0
- importlib-resources==6.1.1
- keras-applications==1.0.8
- keras-preprocessing==1.1.2
- kiwisolver==1.4.5
- lazy-loader==0.3
- markdown==3.5.1
- markupsafe==2.1.3
- matplotlib==3.7.4
- networkx==3.1
- numpy==1.24.4
- nvidia-cublas==11.3.0.106
- nvidia-cuda-cupti==11.1.105
- nvidia-cuda-nvcc==11.1.105
- nvidia-cuda-nvrtc==11.1.105
- nvidia-cuda-runtime==11.1.74
- nvidia-cudnn==8.0.5.43
- nvidia-cufft==10.3.0.105
- nvidia-curand==10.2.2.105
- nvidia-cusolver==11.0.1.105
- nvidia-cusparse==11.3.0.10
- nvidia-dali-cuda110==0.28.0
- nvidia-dali-nvtf-plugin==0.28.0+nv20.12
- nvidia-nccl==2.8.3
- nvidia-pyindex==1.0.9
- nvidia-tensorboard==1.15.0+nv20.12
- nvidia-tensorflow==1.15.4+nv20.12
- nvidia-tensorrt==7.2.2.1
- opencv-python==4.8.1.78
- opt-einsum==3.3.0
- packaging==23.2
- pillow==10.1.0
- plyfile==1.0.2
- protobuf==4.25.1
- pyparsing==3.1.1
- python-dateutil==2.8.2
- pywavelets==1.4.1
- scikit-image==0.21.0
- scipy==1.10.1
- shapely==2.0.2
- six==1.16.0
- tensorboard==1.15.0
- tensorflow-estimator==1.15.1
- termcolor==2.3.0
- tifffile==2023.7.10
- typeguard==4.1.5
- typing-extensions==4.8.0
- webencodings==0.5.1
- werkzeug==3.0.1
- wrapt==1.16.0
- zipp==3.17.0
prefix: /home/mona/anaconda3/envs/EfficientPose
Here's my sys info:
(EfficientPose) mona@ada:~$ uname -a
Linux ada 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
(EfficientPose) mona@ada:~$ lsb_release -a
LSB Version: core-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
(EfficientPose) mona@ada:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(EfficientPose) mona@ada:~$ nvidia-smi
Wed Nov 29 13:20:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX 6000 Ada Gene... On | 00000000:52:00.0 On | Off |
| 32% 61C P2 76W / 300W | 7489MiB / 49140MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2317 G /usr/lib/xorg/Xorg 740MiB |
| 0 N/A N/A 2519 G /usr/bin/gnome-shell 61MiB |
| 0 N/A N/A 2994 G ...AAAAAAAACAAAAAAAAAA= --shared-files 98MiB |
| 0 N/A N/A 25264 G ...0208189,17325718055376231948,262144 60MiB |
| 0 N/A N/A 652962 G ...irefox/3358/usr/lib/firefox/firefox 422MiB |
| 0 N/A N/A 703622 G blender 205MiB |
| 0 N/A N/A 829624 G /usr/bin/gnome-control-center 79MiB |
| 0 N/A N/A 837524 C python 844MiB |
| 0 N/A N/A 842408 G ...sion,SpareRendererForSitePerProcess 106MiB |
| 0 N/A N/A 847224 C python 1046MiB |
| 0 N/A N/A 855952 C python 984MiB |
| 0 N/A N/A 856952 C python 914MiB |
| 0 N/A N/A 857675 C python 730MiB |
| 0 N/A N/A 1068492 G meshlab 12MiB |
| 0 N/A N/A 1118791 C python 1046MiB |
+---------------------------------------------------------------------------------------+
Please let me know if you need more information.
I had the same issue: training only produced NaN values, and I could see in the task manager that my GPU wasn't being used during training. I figured out that CUDA 10.0 does not support my GPU. See this graph
I had an RTX 3070, which uses the Ampere architecture. I have now switched to a GTX 1070 Ti, which uses the Pascal architecture, and it works fine.
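The architecture point can be written as a small lookup. This is only a sketch: the minimum CUDA releases in the table are assumptions recalled from NVIDIA's release notes, not values from this thread, and should be verified against NVIDIA's documentation. `MIN_CUDA` and `natively_supported` are hypothetical names.

```python
# Earliest CUDA toolkit release with native support for a compute capability.
# ASSUMED values (from memory of NVIDIA release notes); verify before relying on them.
MIN_CUDA = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1070 Ti
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (data-center parts)
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3070
    (8, 9): (11, 8),  # Ada, e.g. RTX 6000 Ada Generation
}


def natively_supported(compute_capability, cuda_version):
    """True if the CUDA release is at least the assumed minimum for this GPU."""
    return tuple(cuda_version) >= MIN_CUDA[tuple(compute_capability)]


if __name__ == "__main__":
    print(natively_supported((8, 6), (10, 0)))  # RTX 3070 with CUDA 10.0
    print(natively_supported((6, 1), (10, 0)))  # GTX 1070 Ti with CUDA 10.0
```

A binary built for an older CUDA release may still start on a newer GPU through driver-side PTX compilation, so "not natively supported" does not always mean an immediate crash. It is, however, a plausible place to look when results are wrong.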
If you can't get your hands on a suitable GPU, one option is to use the CPU instead, although it is significantly slower. Just run the following commands:
pip install tensorflow-cpu==1.15
pip install h5py==2.10.0 --force-reinstall
pip install numpy==1.19.5
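To check whether an existing environment already matches the numpy and h5py pins suggested here, something like the following could be used. It is a sketch, not part of EfficientPose: `find_mismatches` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or newer (older interpreters would need the `importlib_metadata` backport).

```python
from importlib import metadata

# Pins taken from the commands suggested in this thread.
SUGGESTED_PINS = {"numpy": "1.19.5", "h5py": "2.10.0"}


def find_mismatches(pins, installed=None):
    """Return {package: (installed_version_or_None, pinned_version)} for every mismatch.

    `installed` may be a dict of package -> version for offline checks;
    if omitted, versions are read from the current environment.
    """
    mismatches = {}
    for name, wanted in pins.items():
        if installed is not None:
            have = installed.get(name)
        else:
            try:
                have = metadata.version(name)
            except metadata.PackageNotFoundError:
                have = None
        if have != wanted:
            mismatches[name] = (have, wanted)
    return mismatches


if __name__ == "__main__":
    # Versions copied from the environment.yml posted earlier in this thread.
    posted = {"numpy": "1.24.4", "h5py": "3.10.0"}
    for pkg, (have, wanted) in find_mismatches(SUGGESTED_PINS, posted).items():
        print("%s: installed %s, suggested %s" % (pkg, have, wanted))
```

The tensorflow-cpu pin is left out on purpose, since `1.15` on the command line would not string-match an installed version such as `1.15.0`.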
@madhanuman thanks a lot for your response. I ran on the CPU with the versions you suggested above. Does the following look correct to you? I still get some NaNs.
(EfficientPose) mona@ada:~/effpose/EfficientPose$ python evaluate.py --phi 0 --weights weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5 --validation-image-save-path val_imgs linemod data/Linemod_preprocessed/ --object-id 8
WARNING:tensorflow:From evaluate.py:132: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From evaluate.py:134: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2024-02-21 15:40:46.017484: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2024-02-21 15:40:46.023920: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3096000000 Hz
2024-02-21 15:40:46.025134: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e9cac0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-02-21 15:40:46.025158: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-02-21 15:40:46.026767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2024-02-21 15:40:46.117532: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20b98e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-02-21 15:40:46.117555: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX 6000 Ada Generation, Compute Capability 8.9
2024-02-21 15:40:46.117876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA RTX 6000 Ada Generation major: 8 minor: 9 memoryClockRate(GHz): 2.505
pciBusID: 0000:52:00.0
2024-02-21 15:40:46.118055: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:40:46.119033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2024-02-21 15:40:46.120008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2024-02-21 15:40:46.120232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2024-02-21 15:40:46.121344: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2024-02-21 15:40:46.122185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2024-02-21 15:40:46.124973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2024-02-21 15:40:46.125178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2024-02-21 15:40:46.125203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:40:46.125365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-02-21 15:40:46.125371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2024-02-21 15:40:46.125375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2024-02-21 15:40:46.125539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 27811 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX 6000 Ada Generation, pci bus id: 0000:52:00.0, compute capability: 8.9)
{'dataset_type': 'linemod', 'rotation_representation': 'axis_angle', 'weights': 'weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5', 'batch_size': 1, 'phi': 0, 'gpu': None, 'score_threshold': 0.5, 'validation_image_save_path': 'val_imgs', 'linemod_path': 'data/Linemod_preprocessed/', 'object_id': 8}
Creating the Generators...
Done!
Building the Model...
input shape is: (512, 512, 3)
ArgSpec(args=['shape', 'batch_size', 'name', 'dtype', 'sparse', 'tensor', 'ragged'], varargs=None, keywords='kwargs', defaults=(None, None, None, None, False, None, False))
WARNING:tensorflow:From /home/mona/anaconda3/envs/EfficientPose/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py:507: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with distribution=normal is deprecated and will be removed in a future version.
Instructions for updating:
`normal` is a deprecated alias for `truncated_normal`
WARNING:tensorflow:From /home/mona/anaconda3/envs/EfficientPose/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2024-02-21 15:40:57.013591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA RTX 6000 Ada Generation major: 8 minor: 9 memoryClockRate(GHz): 2.505
pciBusID: 0000:52:00.0
2024-02-21 15:40:57.013646: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:40:57.013653: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2024-02-21 15:40:57.013658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2024-02-21 15:40:57.013663: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2024-02-21 15:40:57.013668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2024-02-21 15:40:57.013673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2024-02-21 15:40:57.013678: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2024-02-21 15:40:57.013816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2024-02-21 15:40:57.014131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA RTX 6000 Ada Generation major: 8 minor: 9 memoryClockRate(GHz): 2.505
pciBusID: 0000:52:00.0
2024-02-21 15:40:57.014143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:40:57.014150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2024-02-21 15:40:57.014157: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2024-02-21 15:40:57.014162: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2024-02-21 15:40:57.014167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2024-02-21 15:40:57.014173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2024-02-21 15:40:57.014177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2024-02-21 15:40:57.014281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2024-02-21 15:40:57.014301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-02-21 15:40:57.014305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2024-02-21 15:40:57.014307: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2024-02-21 15:40:57.014434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 27811 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX 6000 Ada Generation, pci bus id: 0000:52:00.0, compute capability: 8.9)
WARNING:tensorflow:From /home/mona/effpose/EfficientPose/layers.py:298: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Done!
Loading model, this may take a second...
Done!
Running network: 0% (0 of 1009) | | Elapsed Time: 0:00:00 ETA: --:--:--
2024-02-21 15:45:35.072567: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2024-02-21 15:49:12.814151: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
2024-02-21 15:49:12.885611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Running network: 100% (1009 of 1009) |###############################################################################################################################| Elapsed Time: 0:04:49 Time: 0:04:49
Parsing annotations: 100% (1009 of 1009) |###########################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1009/1009 [00:07<00:00, 137.57it/s]
/home/mona/anaconda3/envs/EfficientPose/lib/python3.7/site-packages/numpy/core/_methods.py:193: RuntimeWarning: invalid value encountered in subtract
x = asanyarray(arr - arrmean)
1009 instances of class object with average precision: 0.0000
1009 instances of class object with ADD accuracy: 0.0000
1009 instances of class object with ADD-S-Accuracy: 0.0000
1009 instances of class object with 5cm-5degree-Accuracy: 0.0000
class object with Translation Differences in mm: Mean: 3187475330973234436726203613184.0000 and Std: 7807679963186587234277502484480.0000
class object with Rotation Differences in degree: Mean: 144.6458 and Std: 17.1566
1009 instances of class object with 2d-projection-Accuracy: 0.0000
1009 instances of class object with ADD(-S)-Accuracy: 0.0000
class object with Transformed Point Distances in mm: Mean: 3187475330973234436726203613184.0000 and Std: 7807679963186587234277502484480.0000
class object with Transformed Symmetric Point Distances in mm: Mean: inf and Std: nan
class object with Mixed Transformed Point Distances in mm: Mean: 3187475330973234436726203613184.0000 and Std: 7807679963186587234277502484480.0000
mAP: 0.0000
ADD: 0.0000
ADD-S: 0.0000
5cm_5degree: 0.0000
TranslationErrorMean_in_mm: 3187475330973234436726203613184.0000
TranslationErrorStd_in_mm: 7807679963186587234277502484480.0000
RotationErrorMean_in_degree: 144.6458
RotationErrorStd_in_degree: 17.1566
2D-Projection: 0.0000
Summed_Translation_Rotation_Error: 10995155294159821671003706097664.0000
ADD(-S): 0.0000
AveragePointDistanceMean_in_mm: 3187475330973234436726203613184.0000
AveragePointDistanceStd_in_mm: 7807679963186587234277502484480.0000
AverageSymmetricPointDistanceMean_in_mm: inf
AverageSymmetricPointDistanceStd_in_mm: nan
MixedAveragePointDistanceMean_in_mm: 3187475330973234436726203613184.0000
MixedAveragePointDistanceStd_in_mm: 7807679963186587234277502484480.0000
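The summary above has translation errors on the order of 1e30 mm and symmetric distances of inf/nan, which suggests the network outputs themselves are not finite rather than the pose simply being inaccurate. A minimal sketch for flagging such values when comparing runs is below. The function name `flag_suspicious_metrics` and the `max_abs` threshold are hypothetical and not part of EfficientPose; it only assumes the `name: value` layout printed above.

```python
import math

def flag_suspicious_metrics(lines, max_abs=1e6):
    """Return names of metrics whose value is NaN, inf, or larger than max_abs."""
    flagged = []
    for line in lines:
        # Split on the last colon so metric names containing colons still work.
        name, sep, raw = line.rpartition(":")
        if not sep:
            continue  # not a "name: value" line
        try:
            value = float(raw.strip())
        except ValueError:
            continue  # right-hand side is not a number
        if not math.isfinite(value) or abs(value) > max_abs:
            flagged.append(name.strip())
    return flagged

sample = [
    "mAP: 0.0000",
    "TranslationErrorMean_in_mm: 3187475330973234436726203613184.0000",
    "RotationErrorMean_in_degree: 144.6458",
    "AverageSymmetricPointDistanceMean_in_mm: inf",
    "AverageSymmetricPointDistanceStd_in_mm: nan",
]
print(flag_suspicious_metrics(sample))
```

On the sample lines this flags the translation error and both symmetric-distance metrics, and leaves mAP and the rotation error alone.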
@madhanuman
Also, training using TensorFlow CPU yields a NaN loss:
(EfficientPose) mona@ada:~/effpose/EfficientPose$ python train.py --phi 0 --weights weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5 linemod data/Linemod_preprocessed/ --object-id 8
WARNING:tensorflow:From train.py:204: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From train.py:206: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2024-02-21 15:54:47.492224: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2024-02-21 15:54:47.498868: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3096000000 Hz
2024-02-21 15:54:47.500513: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2fd8c10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-02-21 15:54:47.500538: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-02-21 15:54:47.502165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2024-02-21 15:54:47.590605: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xc1e7f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-02-21 15:54:47.590664: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX 6000 Ada Generation, Compute Capability 8.9
2024-02-21 15:54:47.591259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA RTX 6000 Ada Generation major: 8 minor: 9 memoryClockRate(GHz): 2.505
pciBusID: 0000:52:00.0
2024-02-21 15:54:47.591777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:54:47.593401: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2024-02-21 15:54:47.594326: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2024-02-21 15:54:47.594543: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2024-02-21 15:54:47.595626: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2024-02-21 15:54:47.596433: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2024-02-21 15:54:47.599096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2024-02-21 15:54:47.599317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2024-02-21 15:54:47.599349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2024-02-21 15:54:47.599518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-02-21 15:54:47.599524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2024-02-21 15:54:47.599528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2024-02-21 15:54:47.599706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 27792 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX 6000 Ada Generation, pci bus id: 0000:52:00.0, compute capability: 8.9)
{'dataset_type': 'linemod', 'rotation_representation': 'axis_angle', 'weights': 'weights/Weights/Linemod/object_8/phi_0_linemod_best_ADD.h5', 'freeze_backbone': False, 'no_freeze_bn': False, 'batch_size': 1, 'lr': 0.0001, 'no_color_augmentation': False, 'no_6dof_augmentation': False, 'phi': 0, 'gpu': None, 'epochs': 500, 'steps': 1790, 'snapshot_path': 'checkpoints/21_02_2024_15_54_47', 'tensorboard_dir': 'logs/21_02_2024_15_54_47', 'snapshots': True, 'evaluation': True, 'compute_val_loss': False, 'score_threshold': 0.5, 'validation_image_save_path': None, 'multiprocessing': False, 'workers': 4, 'max_queue_size': 10, 'linemod_path': 'data/Linemod_preprocessed/', 'object_id': 8}
Creating the Generators...
Done!
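Both logs above load `libcudart.so.10.0` while reporting a GPU with compute capability 8.9. As far as I know, CUDA 10.0 only targets devices up to compute capability 7.5 natively, and the earlier `ptxas not found ... Relying on driver to perform ptx compilation` warning fits that picture. Whether this mismatch is what causes the NaN values is not confirmed in this thread. A minimal sketch for pulling the two numbers out of a log is below; `check_log` and the one-entry table are hypothetical helpers, and the table deliberately covers only the CUDA version seen here.

```python
import re

# Highest compute capability the CUDA toolkit targets natively.
# Assumption: only CUDA 10.0 is listed; other versions are reported as unknown.
MAX_NATIVE_CC = {(10, 0): (7, 5)}

def check_log(log_text):
    """Return (cuda_version, compute_capability, natively_supported) or None.

    natively_supported is None when the CUDA version is not in the table.
    """
    cuda = re.search(r"libcudart\.so\.(\d+)\.(\d+)", log_text)
    cc = re.search(r"compute capability: (\d+)\.(\d+)", log_text)
    if cuda is None or cc is None:
        return None
    cuda_ver = (int(cuda.group(1)), int(cuda.group(2)))
    gpu_cc = (int(cc.group(1)), int(cc.group(2)))
    limit = MAX_NATIVE_CC.get(cuda_ver)
    supported = None if limit is None else gpu_cc <= limit
    return cuda_ver, gpu_cc, supported

log = (
    "Successfully opened dynamic library libcudart.so.10.0\n"
    "physical GPU (device: 0, name: NVIDIA RTX 6000 Ada Generation, "
    "pci bus id: 0000:52:00.0, compute capability: 8.9)\n"
)
print(check_log(log))  # ((10, 0), (8, 9), False)
```

A `False` in the last position only means the GPU is newer than what the loaded CUDA runtime was built for; it does not by itself prove the results are wrong.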
@monajalal It seems like something is not loading correctly. The screenshot shows a CUPTI error at the bottom; I ran into something similar. If I remember correctly, you need to change something in the path environment variables.
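For the CUPTI part, a quick way to see whether the dynamic loader can find the library is to check each `LD_LIBRARY_PATH` entry for a `libcupti.so*` file. This is a minimal sketch under the assumption that the fix is indeed a missing `LD_LIBRARY_PATH` entry. The helper name `dirs_with_cupti` is hypothetical, and the typical location `/usr/local/cuda/extras/CUPTI/lib64` is an assumption about a default CUDA install that may not match this machine.

```python
import glob
import os

def dirs_with_cupti(ld_library_path):
    """Return the entries of an LD_LIBRARY_PATH string that contain libcupti.so*."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if directory and glob.glob(os.path.join(directory, "libcupti.so*")):
            hits.append(directory)
    return hits

if __name__ == "__main__":
    found = dirs_with_cupti(os.environ.get("LD_LIBRARY_PATH", ""))
    if found:
        print("CUPTI found in:", found)
    else:
        print("No libcupti.so* on LD_LIBRARY_PATH")
```

If nothing is found, adding the directory that holds `libcupti.so` to `LD_LIBRARY_PATH` before starting Python is the usual remedy.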
I am getting a NaN loss using your own processed data, conda env, and command. Is there a fix for it?