rpautrat / SuperPoint

Efficient neural feature detector and descriptor
MIT License

Error encountered while running step 1: loss nan, precision nan, recall 0.0000 #296

Closed: 1z2213 closed this issue 1 year ago

1z2213 commented 1 year ago

Dear author, hello! I followed your instructions in https://github.com/rpautrat/SuperPoint/issues/173 and am now trying to run the first step, python experiment.py train configs/magic-point_shapes.yaml magic-point_synth. After all the synthetic shapes are extracted, I run into the same problem: the loss becomes nan. I really need your help and look forward to your reply. Here are the commands I ran and everything that was printed:

sunlab@sunlab-ThinkStation-P520:~$ source ~/anaconda3/bin/activate
(base) sunlab@sunlab-ThinkStation-P520:~$ conda create -n linenv python=3.6.3
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <== current version: 4.10.1 latest version: 23.5.0

Please update conda by running

$ conda update -n base -c defaults conda

Package Plan
environment location: /home/sunlab/anaconda3/envs/linenv

added / updated specs:

The following packages will be downloaded:

    package               |  build
    pip-21.2.2            |  py36h06a4308_0     1.8 MB  defaults
    python-3.6.3          |  h6c0c0dc_5        25.5 MB  defaults
    setuptools-58.0.4     |  py36h06a4308_0     788 KB  defaults

                                       Total:  28.1 MB

The following NEW packages will be INSTALLED:

_libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
ca-certificates    pkgs/main/linux-64::ca-certificates-2023.05.30-h06a4308_0
certifi            pkgs/main/linux-64::certifi-2021.5.30-py36h06a4308_0
libedit            pkgs/main/linux-64::libedit-3.1.20221030-h5eee18b_0
libffi             pkgs/main/linux-64::libffi-3.2.1-hf484d3e_1007
libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
ncurses            pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl            pkgs/main/linux-64::openssl-1.0.2u-h7b6447c_0
pip                pkgs/main/linux-64::pip-21.2.2-py36h06a4308_0
python             pkgs/main/linux-64::python-3.6.3-h6c0c0dc_5
readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
setuptools         pkgs/main/linux-64::setuptools-58.0.4-py36h06a4308_0
sqlite             pkgs/main/linux-64::sqlite-3.33.0-h62c20be_0
tk                 pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0
wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
xz                 pkgs/main/linux-64::xz-5.4.2-h5eee18b_0
zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
python-3.6.3       | 25.5 MB | ##################################### | 100%
setuptools-58.0.4  | 788 KB  | ##################################### | 100%
pip-21.2.2         | 1.8 MB  | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

    $ conda activate linenv

To deactivate an active environment, use

    $ conda deactivate

(base) sunlab@sunlab-ThinkStation-P520:~$ conda activate linenv
(linenv) sunlab@sunlab-ThinkStation-P520:~$ conda install tensorflow-gpu=1.13
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <== current version: 4.10.1 latest version: 23.5.0

Please update conda by running

$ conda update -n base -c defaults conda

Package Plan
environment location: /home/sunlab/anaconda3/envs/linenv

added / updated specs:

The following NEW packages will be INSTALLED:

_tflow_select      pkgs/main/linux-64::_tflow_select-2.1.0-gpu
absl-py            pkgs/main/noarch::absl-py-0.15.0-pyhd3eb1b0_0
astor              pkgs/main/linux-64::astor-0.8.1-py36h06a4308_0
blas               pkgs/main/linux-64::blas-1.0-mkl
c-ares             pkgs/main/linux-64::c-ares-1.19.0-h5eee18b_0
cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.0.130-0
cudnn              pkgs/main/linux-64::cudnn-7.6.5-cuda10.0_0
cupti              pkgs/main/linux-64::cupti-10.0.130-0
dataclasses        pkgs/main/noarch::dataclasses-0.8-pyh4f3eec9_6
gast               pkgs/main/noarch::gast-0.5.3-pyhd3eb1b0_0
grpcio             pkgs/main/linux-64::grpcio-1.14.1-py36h9ba97e2_0
h5py               pkgs/main/linux-64::h5py-2.10.0-py36hd6299e0_1
hdf5               pkgs/main/linux-64::hdf5-1.10.6-hb1b8bf9_0
intel-openmp       pkgs/main/linux-64::intel-openmp-2022.1.0-h9e868ea_3769
keras-applications pkgs/main/noarch::keras-applications-1.0.8-py_1
keras-preprocessi~ pkgs/main/noarch::keras-preprocessing-1.1.2-pyhd3eb1b0_0
libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.5.0-ha8ba4b0_17
libgfortran4       pkgs/main/linux-64::libgfortran4-7.5.0-ha8ba4b0_17
libprotobuf        pkgs/main/linux-64::libprotobuf-3.17.2-h4ff587b_1
markdown           pkgs/main/linux-64::markdown-3.1.1-py36_0
mkl                pkgs/main/linux-64::mkl-2020.2-256
mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py36he8ac12f_0
mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.0-py36h54f3939_0
mkl_random         pkgs/main/linux-64::mkl_random-1.1.1-py36h0573a6f_0
mock               pkgs/main/noarch::mock-4.0.3-pyhd3eb1b0_0
numpy              pkgs/main/linux-64::numpy-1.19.2-py36h54aff64_0
numpy-base         pkgs/main/linux-64::numpy-base-1.19.2-py36hfa32c7d_0
protobuf           pkgs/main/linux-64::protobuf-3.17.2-py36h295c915_0
scipy              pkgs/main/linux-64::scipy-1.5.2-py36h0b6359f_0
six                pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_1
tensorboard        pkgs/main/linux-64::tensorboard-1.13.1-py36hf484d3e_0
tensorflow         pkgs/main/linux-64::tensorflow-1.13.1-gpu_py36h3991807_0
tensorflow-base    pkgs/main/linux-64::tensorflow-base-1.13.1-gpu_py36h8d69cac_0
tensorflow-estima~ pkgs/main/noarch::tensorflow-estimator-1.13.0-py_0
tensorflow-gpu     pkgs/main/linux-64::tensorflow-gpu-1.13.1-h0d30ee6_0
termcolor          pkgs/main/linux-64::termcolor-1.1.0-py36h06a4308_1
werkzeug           pkgs/main/noarch::werkzeug-2.0.3-pyhd3eb1b0_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(linenv) sunlab@sunlab-ThinkStation-P520:~$ cd superpoint/SuperPoint-master
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master$ make install
pip3 install -r requirements.txt
Requirement already satisfied: numpy in /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (1.19.2)
Requirement already satisfied: scipy in /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.5.2)
Collecting opencv-python==3.4.2.16
  Using cached opencv_python-3.4.2.16-cp36-cp36m-manylinux1_x86_64.whl (25.0 MB)
Collecting opencv-contrib-python==3.4.2.16
  Using cached opencv_contrib_python-3.4.2.16-cp36-cp36m-manylinux1_x86_64.whl (30.6 MB)
Collecting tqdm
  Using cached tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
Collecting pyyaml
  Using cached PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (603 kB)
Collecting flake8
  Using cached flake8-5.0.4-py2.py3-none-any.whl (61 kB)
Collecting jupyter
  Using cached jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting importlib-resources
  Using cached importlib_resources-5.4.0-py3-none-any.whl (28 kB)
Collecting mccabe<0.8.0,>=0.7.0
  Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting pycodestyle<2.10.0,>=2.9.0
  Using cached pycodestyle-2.9.1-py2.py3-none-any.whl (41 kB)
Collecting importlib-metadata<4.3,>=1.1.0
  Using cached importlib_metadata-4.2.0-py3-none-any.whl (16 kB)
Collecting pyflakes<2.6.0,>=2.5.0
  Using cached pyflakes-2.5.0-py2.py3-none-any.whl (66 kB)
Collecting ipykernel
  Using cached ipykernel-5.5.6-py3-none-any.whl (121 kB)
Collecting nbconvert
  Using cached nbconvert-6.0.7-py3-none-any.whl (552 kB)
Collecting ipywidgets
  Using cached ipywidgets-7.7.5-py2.py3-none-any.whl (123 kB)
Collecting notebook
  Using cached notebook-6.4.10-py3-none-any.whl (9.9 MB)
Collecting qtconsole
  Using cached qtconsole-5.2.2-py3-none-any.whl (120 kB)
Collecting jupyter-console
  Using cached jupyter_console-6.4.3-py3-none-any.whl (22 kB)
Collecting typing-extensions>=3.6.4
  Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Collecting zipp>=0.5
  Using cached zipp-3.6.0-py3-none-any.whl (5.3 kB)
Collecting jupyter-client
  Using cached jupyter_client-7.1.2-py3-none-any.whl (130 kB)
Collecting traitlets>=4.1.0
  Using cached traitlets-4.3.3-py2.py3-none-any.whl (75 kB)
Collecting ipython>=5.0.0
  Using cached ipython-7.16.3-py3-none-any.whl (783 kB)
Collecting tornado>=4.2
  Using cached tornado-6.1-cp36-cp36m-manylinux2010_x86_64.whl (427 kB)
Collecting ipython-genutils
  Using cached ipython_genutils-0.2.0-py2.py3-none-any.whl (26 kB)
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Using cached prompt_toolkit-3.0.36-py3-none-any.whl (386 kB)
Collecting pexpect
  Using cached pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
Collecting backcall
  Using cached backcall-0.2.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools>=18.5 in /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 8)) (58.0.4)
Collecting pygments
  Using cached Pygments-2.14.0-py3-none-any.whl (1.1 MB)
Collecting decorator
  Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting jedi<=0.17.2,>=0.10
  Using cached jedi-0.17.2-py2.py3-none-any.whl (1.4 MB)
Collecting pickleshare
  Using cached pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting parso<0.8.0,>=0.7.0
  Using cached parso-0.7.1-py2.py3-none-any.whl (109 kB)
Collecting wcwidth
  Using cached wcwidth-0.2.6-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: six in /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages (from traitlets>=4.1.0->ipykernel->jupyter->-r requirements.txt (line 8)) (1.16.0)
Collecting jupyterlab-widgets<3,>=1.0.0
  Using cached jupyterlab_widgets-1.1.4-py3-none-any.whl (246 kB)
Collecting widgetsnbextension~=3.6.4
  Using cached widgetsnbextension-3.6.4-py2.py3-none-any.whl (1.6 MB)
Collecting prometheus-client
  Using cached prometheus_client-0.17.0-py3-none-any.whl (60 kB)
Collecting Send2Trash>=1.8.0
  Using cached Send2Trash-1.8.2-py3-none-any.whl (18 kB)
Collecting jinja2
  Using cached Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting argon2-cffi
  Using cached argon2_cffi-21.3.0-py3-none-any.whl (14 kB)
Collecting nbformat
  Using cached nbformat-5.1.3-py3-none-any.whl (178 kB)
Collecting nest-asyncio>=1.5
  Using cached nest_asyncio-1.5.6-py3-none-any.whl (5.2 kB)
Collecting pyzmq>=17
  Using cached pyzmq-25.1.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
Collecting terminado>=0.8.3
  Using cached terminado-0.12.1-py3-none-any.whl (15 kB)
Collecting jupyter-core>=4.6.1
  Using cached jupyter_core-4.9.2-py3-none-any.whl (86 kB)
Collecting python-dateutil>=2.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting entrypoints
  Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB)
Collecting defusedxml
  Using cached defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Collecting pandocfilters>=1.4.1
  Using cached pandocfilters-1.5.0-py2.py3-none-any.whl (8.7 kB)
Collecting testpath
  Using cached testpath-0.6.0-py3-none-any.whl (83 kB)
Collecting mistune<2,>=0.8.1
  Using cached mistune-0.8.4-py2.py3-none-any.whl (16 kB)
Collecting bleach
  Using cached bleach-4.1.0-py2.py3-none-any.whl (157 kB)
Collecting nbclient<0.6.0,>=0.5.0
  Using cached nbclient-0.5.9-py3-none-any.whl (69 kB)
Collecting jupyterlab-pygments
  Using cached jupyterlab_pygments-0.1.2-py2.py3-none-any.whl (4.6 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-2.0.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (30 kB)
Collecting async-generator
  Using cached async_generator-1.10-py3-none-any.whl (18 kB)
Collecting jsonschema!=2.5.0,>=2.4
  Using cached jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)
Collecting attrs>=17.4.0
  Using cached attrs-22.2.0-py3-none-any.whl (60 kB)
Collecting pyrsistent>=0.14.0
  Using cached pyrsistent-0.18.0-cp36-cp36m-manylinux1_x86_64.whl (117 kB)
Collecting ptyprocess
  Using cached ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: dataclasses in /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages (from argon2-cffi->notebook->jupyter->-r requirements.txt (line 8)) (0.8)
Collecting argon2-cffi-bindings
  Using cached argon2_cffi_bindings-21.2.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (86 kB)
Collecting cffi>=1.0.1
  Using cached cffi-1.15.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (402 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Collecting webencodings
  Using cached webencodings-0.5.1-py2.py3-none-any.whl (11 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting pyparsing!=3.0.5,>=2.0.2
  Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Collecting qtpy
  Using cached QtPy-2.0.1-py3-none-any.whl (65 kB)
WARNING: The candidate selected for download or install is a yanked version: 'opencv-python' candidate (version 3.4.2.16 at https://files.pythonhosted.org/packages/fa/7d/5042b668a8ed41d2a80b8c172f5efcd572e3c046c75ae029407e19b7fc68/opencv_python-3.4.2.16-cp36-cp36m-manylinux1_x86_64.whl#sha256=d75f60baced5086300a19c8ba63e75d059e8dce333795ef02084b9be6ec61516 (from https://pypi.org/simple/opencv-python/))
Reason for being yanked: Release deprecated
WARNING: The candidate selected for download or install is a yanked version: 'opencv-contrib-python' candidate (version 3.4.2.16 at https://files.pythonhosted.org/packages/08/f1/66330f4042c4fb3b2d77a159db8e8916d9cdecc29bc8c1f56bc7f8a9bec9/opencv_contrib_python-3.4.2.16-cp36-cp36m-manylinux1_x86_64.whl#sha256=8de56394a9a3cf8788559032c2139c622ffdc7e37c32215ec865b4e1cd2ca70d (from https://pypi.org/simple/opencv-contrib-python/))
Reason for being yanked: Release deprecated
Installing collected packages: zipp, typing-extensions, ipython-genutils, decorator, traitlets, pyrsistent, importlib-metadata, attrs, wcwidth, tornado, pyzmq, python-dateutil, pyparsing, pycparser, ptyprocess, parso, nest-asyncio, jupyter-core, jsonschema, entrypoints, webencodings, pygments, prompt-toolkit, pickleshare, pexpect, packaging, nbformat, MarkupSafe, jupyter-client, jedi, cffi, backcall, async-generator, testpath, pandocfilters, nbclient, mistune, jupyterlab-pygments, jinja2, ipython, defusedxml, bleach, argon2-cffi-bindings, terminado, Send2Trash, prometheus-client, nbconvert, ipykernel, argon2-cffi, notebook, widgetsnbextension, qtpy, jupyterlab-widgets, qtconsole, pyflakes, pycodestyle, mccabe, jupyter-console, ipywidgets, importlib-resources, tqdm, pyyaml, opencv-python, opencv-contrib-python, jupyter, flake8
Successfully installed MarkupSafe-2.0.1 Send2Trash-1.8.2 argon2-cffi-21.3.0 argon2-cffi-bindings-21.2.0 async-generator-1.10 attrs-22.2.0 backcall-0.2.0 bleach-4.1.0 cffi-1.15.1 decorator-5.1.1 defusedxml-0.7.1 entrypoints-0.4 flake8-5.0.4 importlib-metadata-4.2.0 importlib-resources-5.4.0 ipykernel-5.5.6 ipython-7.16.3 ipython-genutils-0.2.0 ipywidgets-7.7.5 jedi-0.17.2 jinja2-3.0.3 jsonschema-3.2.0 jupyter-1.0.0 jupyter-client-7.1.2 jupyter-console-6.4.3 jupyter-core-4.9.2 jupyterlab-pygments-0.1.2 jupyterlab-widgets-1.1.4 mccabe-0.7.0 mistune-0.8.4 nbclient-0.5.9 nbconvert-6.0.7 nbformat-5.1.3 nest-asyncio-1.5.6 notebook-6.4.10 opencv-contrib-python-3.4.2.16 opencv-python-3.4.2.16 packaging-21.3 pandocfilters-1.5.0 parso-0.7.1 pexpect-4.8.0 pickleshare-0.7.5 prometheus-client-0.17.0 prompt-toolkit-3.0.36 ptyprocess-0.7.0 pycodestyle-2.9.1 pycparser-2.21 pyflakes-2.5.0 pygments-2.14.0 pyparsing-3.0.7 pyrsistent-0.18.0 python-dateutil-2.8.2 pyyaml-6.0 pyzmq-25.1.0 qtconsole-5.2.2 qtpy-2.0.1 terminado-0.12.1 testpath-0.6.0 tornado-6.1 tqdm-4.64.1 traitlets-4.3.3 typing-extensions-4.1.1 wcwidth-0.2.6 webencodings-0.5.1 widgetsnbextension-3.6.4 zipp-3.6.0
pip3 install -e .
Obtaining file:///home/sunlab/superpoint/SuperPoint-master
Installing collected packages: superpoint
  Running setup.py develop for superpoint
Successfully installed superpoint-0.0
sh setup.sh
Path of the directory where datasets are stored and read: /home/sunlab//superpoint/SuperPoint-master/DATA_DIR
Path of the directory where experiments data (logs, checkpoints, configs) are written: /home/sunlab//superpoint/SuperPoint-master/EXPER_DIR
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master$ cd ./superpoint
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ export TMPDIR=/tmp/
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ export TF_FORCE_GPU_ALLOW_GROWTH=true
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ python experiment.py train configs/magic-point_shapes.yaml magic-point_synth
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
[07/06/2023 23:37:58 INFO] Running command TRAIN
Traceback (most recent call last):
  File "experiment.py", line 160, in <module>
    args.func(config, output_dir, args)
  File "experiment.py", line 97, in _cli_train
    train(config, config['train_iter'], output_dir, pretrained_dir)
  File "experiment.py", line 22, in train
    with _init_graph(config) as net:
  File "/home/sunlab/anaconda3/envs/linenv/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "experiment.py", line 68, in _init_graph
    n_gpus = get_num_gpus()
  File "experiment.py", line 62, in get_num_gpus
    return len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))
  File "/home/sunlab/anaconda3/envs/linenv/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'CUDA_VISIBLE_DEVICES'
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ export CUDA_VISIBLE_DEVICES=0
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ python experiment.py train configs/magic-point_shapes.yaml magic-point_synth
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
[07/06/2023 23:38:09 INFO] Running command TRAIN
[07/06/2023 23:38:09 INFO] Number of GPUs detected: 1
[07/06/2023 23:38:12 INFO] Extracting archive for primitive draw_lines.
[07/06/2023 23:38:15 INFO] Extracting archive for primitive draw_polygon.
[07/06/2023 23:38:19 INFO] Extracting archive for primitive draw_multiple_polygons.
[07/06/2023 23:38:22 INFO] Extracting archive for primitive draw_ellipses.
[07/06/2023 23:38:26 INFO] Extracting archive for primitive draw_star.
[07/06/2023 23:38:30 INFO] Extracting archive for primitive draw_checkerboard.
[07/06/2023 23:38:33 INFO] Extracting archive for primitive draw_stripes.
[07/06/2023 23:38:36 INFO] Extracting archive for primitive draw_cube.
[07/06/2023 23:38:40 INFO] Extracting archive for primitive gaussian_noise.
[07/06/2023 23:38:45 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/datasets/synthetic_shapes.py:189: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.

[07/06/2023 23:38:45 INFO] Caching data, fist access will take some time.
[07/06/2023 23:38:45 WARNING] From /home/sunlab/anaconda3/envs/linenv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py:423: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[07/06/2023 23:38:45 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/models/homographies.py:218: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
[07/06/2023 23:38:45 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/models/homographies.py:277: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
[07/06/2023 23:38:46 INFO] Caching data, fist access will take some time.
[07/06/2023 23:38:46 INFO] Caching data, fist access will take some time.
2023-07-06 23:38:46.500189: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2023-07-06 23:38:46.688675: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2023-07-06 23:38:46.701881: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557760fccac0 executing computations on platform Host. Devices:
2023-07-06 23:38:46.702032: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): , 
2023-07-06 23:38:46.823689: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557760fce720 executing computations on platform CUDA. Devices:
2023-07-06 23:38:46.823750: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): NVIDIA GeForce RTX 3060, Compute Capability 8.6
2023-07-06 23:38:46.823998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:65:00.0
totalMemory: 11.76GiB freeMemory: 10.53GiB
2023-07-06 23:38:46.824031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2023-07-06 23:38:46.845678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-07-06 23:38:46.845739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2023-07-06 23:38:46.845756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2023-07-06 23:38:46.845924: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-07-06 23:38:46.845989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:65:00.0, compute capability: 8.6)
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/models/backbones/vgg.py:10: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
[07/06/2023 23:38:47 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/models/backbones/vgg.py:14: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 WARNING] From /home/sunlab/superpoint/SuperPoint-master/superpoint/models/backbones/vgg.py:28: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:47 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
[07/06/2023 23:38:48 INFO] Scale of 0 disables regularizer.
2023-07-06 23:38:48.493726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2023-07-06 23:38:48.493769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-07-06 23:38:48.493774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2023-07-06 23:38:48.493778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2023-07-06 23:38:48.493822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:65:00.0, compute capability: 8.6)
[07/06/2023 23:38:53 INFO] Start training
[07/07/2023 00:01:33 INFO] Iter 0: loss 4.1784, precision 0.0006, recall 0.0583
/home/sunlab/superpoint/SuperPoint-master/superpoint/models/base_model.py:387: RuntimeWarning: Mean of empty slice
  metrics = {m: np.nanmean(metrics[m], axis=0) for m in metrics}
[07/07/2023 00:01:45 INFO] Iter 1000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:01:56 INFO] Iter 2000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:02:08 INFO] Iter 3000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:02:20 INFO] Iter 4000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:02:32 INFO] Iter 5000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:02:44 INFO] Iter 6000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:02:56 INFO] Iter 7000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:03:08 INFO] Iter 8000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:03:20 INFO] Iter 9000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:03:32 INFO] Iter 10000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:03:44 INFO] Iter 11000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:03:56 INFO] Iter 12000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:04:08 INFO] Iter 13000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:04:20 INFO] Iter 14000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:04:32 INFO] Iter 15000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:04:44 INFO] Iter 16000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:04:55 INFO] Iter 17000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:05:07 INFO] Iter 18000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:05:19 INFO] Iter 19000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:05:31 INFO] Iter 20000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:05:42 INFO] Iter 21000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:05:54 INFO] Iter 22000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:06:06 INFO] Iter 23000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:06:17 INFO] Iter 24000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:06:29 INFO] Iter 25000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:06:41 INFO] Iter 26000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:06:52 INFO] Iter 27000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:07:04 INFO] Iter 28000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:07:15 INFO] Iter 29000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:07:27 INFO] Iter 30000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:07:39 INFO] Iter 31000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:07:54 INFO] Iter 32000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:08:10 INFO] Iter 33000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:08:41 INFO] Iter 34000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:09:14 INFO] Iter 35000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:09:47 INFO] Iter 36000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:10:15 INFO] Iter 37000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:10:32 INFO] Iter 38000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:10:50 INFO] Iter 39000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:11:06 INFO] Iter 40000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:11:23 INFO] Iter 41000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:11:39 INFO] Iter 42000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:11:57 INFO] Iter 43000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:12:14 INFO] Iter 44000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:12:31 INFO] Iter 45000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:12:49 INFO] Iter 46000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:13:06 INFO] Iter 47000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:13:23 INFO] Iter 48000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:13:41 INFO] Iter 49000: loss nan, precision nan, recall 0.0000
[07/07/2023 00:13:54 INFO] Training finished
[07/07/2023 00:13:55 INFO] Saving checkpoint for iteration #50000
2023-07-07 00:13:56.578198: W tensorflow/core/kernels/data/cache_dataset_ops.cc:810] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the datasetwill be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
(linenv) sunlab@sunlab-ThinkStation-P520:~/superpoint/SuperPoint-master/superpoint$ python experiment.py train configs/magic-point_shapes.yaml magic-point_synth

jehovahlbf commented 1 year ago

I have run into the same problem, and I may have found the reason. If you are using an RTX 30x0 series graphics card, its architecture (sm_86) is only supported by CUDA 11.1 and later, but CUDA 11 is tied to TensorFlow 2 (there is a build of TensorFlow 1.15 available for CUDA 11, you can give it a try). I also tried tensorflow-gpu 1.14: for some unknown reason I passed step 1 but failed at step 2, where TensorFlow found both CUDA and cuDNN but raised the GPU error CUBLAS_STATUS_EXECUTION_FAILED. In the end I switched to the PyTorch version of SuperPoint and hit the same CUDA version problem, since CUDA 10 only supports up to sm_75. That made me think this might be why I failed here. If you solve this problem, please let me know.
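The compatibility argument above can be made concrete with a small lookup. The sketch below is an illustration, not part of SuperPoint: it records the first CUDA toolkit release that can compile native code for a few GPU architectures and checks a given toolkit version against it. The table and the helper names `MIN_CUDA_FOR_SM`, `parse_version` and `toolkit_supports` are hypothetical. It only covers native kernels; precompiled libraries such as cuBLAS and cuDNN also need kernels for the card (one plausible source of errors like CUBLAS_STATUS_EXECUTION_FAILED), so a True result is necessary rather than sufficient.

```python
from typing import Optional, Tuple

# Hypothetical table: earliest CUDA toolkit able to emit native code for a
# given compute capability. Only a few common architectures are listed.
MIN_CUDA_FOR_SM = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 10x0
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 20x0
    (8, 0): (11, 0),  # Ampere data-center, e.g. A100
    (8, 6): (11, 1),  # Ampere consumer, e.g. RTX 30x0
}


def parse_version(text: str) -> Tuple[int, int]:
    """Turn "11", "11.2" or "11.2.1" into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def toolkit_supports(capability: Tuple[int, int], cuda_version: str) -> Optional[bool]:
    """True/False if the capability is in the table, None if it is unknown."""
    required = MIN_CUDA_FOR_SM.get(capability)
    if required is None:
        return None
    # Tuples compare element-wise, so (10, 0) < (11, 1) < (11, 2).
    return parse_version(cuda_version) >= required
```

For example, `toolkit_supports((8, 6), "10.0")` returns False for an RTX 30x0 card paired with CUDA 10.0, the toolkit the official tensorflow-gpu 1.14/1.15 wheels were built against, while `toolkit_supports((8, 6), "11.2")` returns True.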

1z2213 commented 1 year ago

Yes, I can confirm that the problem is the RTX 30 series cards. I tried all versions of TensorFlow on RTX 30 series cards and they all failed, and I also ran it on another computer with an RTX 30 series graphics card, where it failed as well. The PyTorch version of SuperPoint works, though; I got it running.

------------------ Original message ------------------ From: "rpautrat/SuperPoint" @.>; Sent: Thursday, September 14, 2023, 10:33 PM @.>; Cc: "李林 @.**@.>; Subject: Re: [rpautrat/SuperPoint] Error encountered while running step1:loss nan, precision nan, recall 0.0000 (Issue #296)

jehovahlbf commented 1 year ago

Which repository did you use? I found two PyTorch versions of SuperPoint, one by eric-yyjau and one by shaofengzeng. Could you also tell me your environment, e.g. the versions of Python, PyTorch, torchvision and CUDA? Thanks a lot.
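When comparing setups like this, it helps to report the same details in the same way. A minimal sketch, assuming a PyTorch-based port is installed; the function name `collect_env` and its key names are hypothetical. PyTorch itself also ships `python -m torch.utils.collect_env`, which prints a fuller report.

```python
import platform


def collect_env() -> dict:
    """Collect Python, PyTorch, torchvision, CUDA and GPU details.

    Packages that are not installed are reported as None.
    """
    env = {
        "python": platform.python_version(),
        "torch": None,
        "torchvision": None,
        "cuda_build": None,   # CUDA version PyTorch was compiled against
        "cudnn": None,
        "gpu": None,
        "capability": None,  # (major, minor), e.g. (8, 6) for RTX 30x0
    }
    try:
        import torch
    except ImportError:
        return env
    env["torch"] = torch.__version__
    env["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    env["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        env["gpu"] = torch.cuda.get_device_name(0)
        env["capability"] = torch.cuda.get_device_capability(0)
    try:
        import torchvision
        env["torchvision"] = torchvision.__version__
    except ImportError:
        pass
    return env


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```

Reporting the compute capability next to the CUDA version PyTorch was built with makes a mismatch like sm_86 on a CUDA 10 build visible at a glance.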

rpautrat commented 1 year ago

It seems that a solution can be found here: https://github.com/rpautrat/SuperPoint/issues/173#issuecomment-1772943447