train problem - Githubissues

I'm trying to train on Synthetic Shapes, but I get the errors below. Can you tell me how to fix them? Thanks!

INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. INFO:tensorflow:Scale of 0 disables regularizer. [10/30/2018 20:33:00 INFO] Scale of 0 disables regularizer. 2018-10-30 20:33:00.687044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:05:00.0, compute capability: 6.1) 2018-10-30 20:33:00.687071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: TITAN Xp, pci bus id: 0000:06:00.0, compute capability: 6.1) 2018-10-30 20:33:00.687076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: TITAN Xp, pci bus id: 0000:09:00.0, compute capability: 6.1) INFO:tensorflow:Start training [10/30/2018 20:33:05 INFO] Start training 2018-10-30 20:33:06.198379: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: assertion failed: [None of the conditions evaluated as True. 
Conditions: (photometric_augmentation/while/Equal:0, photometric_augmentation/while/Equal_1:0, photometric_augmentation/while/Equal_2:0, photometric_augmentation/while/Equal_3:0, photometric_augmentation/while/Equal_4:0, photometric_augmentation/while/Equal_5:0), Values:] [0 0 0 1 0 0] [[Node: photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert = Assert[T=[DT_STRING, DT_BOOL], summarize=6](photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/data_0, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch_1)]] Traceback (most recent call last): File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call return fn(*args) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn status, run_metadata) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [None of the conditions evaluated as True. 
Conditions: (photometric_augmentation/while/Equal:0, photometric_augmentation/while/Equal_1:0, photometric_augmentation/while/Equal_2:0, photometric_augmentation/while/Equal_3:0, photometric_augmentation/while/Equal_4:0, photometric_augmentation/while/Equal_5:0), Values:] [0 0 0 1 0 0] [[Node: photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert = Assert[T=[DT_STRING, DT_BOOL], summarize=6](photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/data_0, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch_1)]] [[Node: magicpoint/IteratorGetNext = IteratorGetNextoutput_shapes=[[?,?,?,1], [?,?,?], [?,?,2], [?,?,?]], output_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] [[Node: magicpoint/train_data_sharding/stack_3/_9 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_77_magicpoint/train_data_sharding/stack_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/omnisky/0_2018_LIANG/2_DEEPLEARNING/1_DEPTH/learn-superpoint/SuperPoint-master/superpoint/experiment.py", line 148, in args.func(config, output_dir, args) File "/home/omnisky/0_2018_LIANG/2_DEEPLEARNING/1_DEPTH/learn-superpoint/SuperPoint-master/superpoint/experiment.py", line 86, in _cli_train train(config, config['train_iter'], output_dir) File "/home/omnisky/0_2018_LIANG/2_DEEPLEARNING/1_DEPTH/learn-superpoint/SuperPoint-master/superpoint/experiment.py", line 27, in train keep_checkpoints=config.get('keep_checkpoints', 1)) File "/home/omnisky/0_2018_LIANG/2_DEEPLEARNING/1_DEPTH/learn-superpoint/SuperPoint-master/superpoint/superpoint/models/base_model.py", line 313, in train options=options, run_metadata=run_metadata) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run run_metadata_ptr) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run feed_dict_tensor, options, run_metadata) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run options, run_metadata) File "/home/omnisky/anaconda3/envs/test_py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [None of the conditions evaluated as True. 
Conditions: (photometric_augmentation/while/Equal:0, photometric_augmentation/while/Equal_1:0, photometric_augmentation/while/Equal_2:0, photometric_augmentation/while/Equal_3:0, photometric_augmentation/while/Equal_4:0, photometric_augmentation/while/Equal_5:0), Values:] [0 0 0 1 0 0] [[Node: photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert = Assert[T=[DT_STRING, DT_BOOL], summarize=6](photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/data_0, photometric_augmentation/while/case/If_0/Assert_1/AssertGuard/Assert/Switch_1)]] [[Node: magicpoint/IteratorGetNext = IteratorGetNextoutput_shapes=[[?,?,?,1], [?,?,?], [?,?,2], [?,?,?]], output_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] [[Node: magicpoint/train_data_sharding/stack_3/_9 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_77_magicpoint/train_data_sharding/stack_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Hi, I have never seen such errors, this is weird.

I see that you use 3 GPUs, but can you try with only one to see if you get the same error? It should work with multiple GPUs, but someone has already had a problem with more than 2 GPUs (with a different error though).
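One common way to restrict a CUDA program such as TensorFlow to a single GPU is the CUDA_VISIBLE_DEVICES environment variable. The following is only a minimal sketch: the helper name use_single_gpu is hypothetical (it is not part of SuperPoint), and it assumes the variable is set before TensorFlow is imported.

```python
import os


def use_single_gpu(index=0):
    """Expose only one GPU to CUDA. Must run before TensorFlow is imported."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = use_single_gpu(0)
print("GPUs visible to CUDA:", visible)
```

The same effect can usually be obtained from the shell by prefixing the training command with CUDA_VISIBLE_DEVICES=0.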

If it still happens, can you also try to disable the photometric augmentation (since it comes from here apparently) in your config file? Basically, just change the value of the field data->augmentation->photometric->enable to false in the file configs/magic-point_shapes.yaml. This should prevent the error from happening, but it would be interesting to see if the rest works or not.
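The data->augmentation->photometric->enable notation refers to nested keys in the YAML config. As an illustration only, here is a small sketch that flips such a field on an already-parsed config dictionary; set_nested is a hypothetical helper, and the dictionary is a trimmed-down stand-in for the real file, not its actual content.

```python
import copy


def set_nested(config, path, value):
    """Return a copy of config with the key at the '->'-separated path set to value."""
    updated = copy.deepcopy(config)
    node = updated
    keys = path.split("->")
    for key in keys[:-1]:
        # Create intermediate levels if they are missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return updated


# Stand-in for the parsed configs/magic-point_shapes.yaml (assumed structure).
config = {"data": {"augmentation": {"photometric": {"enable": True}}}}
patched = set_nested(config, "data->augmentation->photometric->enable", False)
print(patched["data"]["augmentation"]["photometric"]["enable"])
```

Editing the YAML file by hand achieves the same thing; the helper only makes the key path explicit.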

Lastly, what is your TensorFlow version?

@rpautrat Thank you for sharing! I am trying to run the code following the requirements, but the next step reports errors. I tried several things but could not solve it, so I hope you can give me some advice. Thanks a lot!

wu@dada:~/SuperPoint$ sudo make install [sudo] password for wu: pip3 install -r requirements.txt The directory '/home/wu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. The directory '/home/wu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. Requirement already satisfied: tensorflow-gpu==1.6 in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (1.6.0) Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.15.4) Requirement already satisfied: scipy in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 3)) (1.1.0) Requirement already satisfied: opencv-python in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 4)) (3.4.1.15) Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 5)) (3.4.4.19) Requirement already satisfied: tqdm in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 6)) (4.28.1) Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 7)) (3.13) Requirement already satisfied: flake8 in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 8)) (3.6.0) Requirement already satisfied: jupyter in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 9)) (1.0.0) Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.2.0) Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.6/site-packages (from 
tensorflow-gpu==1.6->-r requirements.txt (line 1)) (3.6.1) Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (1.16.1) Requirement already satisfied: tensorboard<1.7.0,>=1.6.0 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (1.6.0) Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.32.3) Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (1.1.0) Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (1.11.0) Requirement already satisfied: absl-py>=0.1.6 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.6.1) Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/site-packages (from tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.7.1) Requirement already satisfied: pyflakes<2.1.0,>=2.0.0 in /usr/local/lib/python3.6/site-packages (from flake8->-r requirements.txt (line 8)) (2.0.0) Requirement already satisfied: pycodestyle<2.5.0,>=2.4.0 in /usr/local/lib/python3.6/site-packages (from flake8->-r requirements.txt (line 8)) (2.4.0) Requirement already satisfied: mccabe<0.7.0,>=0.6.0 in /usr/local/lib/python3.6/site-packages (from flake8->-r requirements.txt (line 8)) (0.6.1) Requirement already satisfied: setuptools>=30 in /usr/local/lib/python3.6/site-packages (from flake8->-r requirements.txt (line 8)) (40.6.2) Requirement already satisfied: ipykernel in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) (5.1.0) Requirement already satisfied: jupyter-console in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) 
(6.0.0) Requirement already satisfied: notebook in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) (5.7.2) Requirement already satisfied: qtconsole in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) (4.4.3) Requirement already satisfied: ipywidgets in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) (7.4.2) Requirement already satisfied: nbconvert in /usr/local/lib/python3.6/site-packages (from jupyter->-r requirements.txt (line 9)) (5.4.0) Requirement already satisfied: html5lib==0.9999999 in /usr/local/lib/python3.6/site-packages (from tensorboard<1.7.0,>=1.6.0->tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.9999999) Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/site-packages (from tensorboard<1.7.0,>=1.6.0->tensorflow-gpu==1.6->-r requirements.txt (line 1)) (3.0.1) Requirement already satisfied: bleach==1.5.0 in /usr/local/lib/python3.6/site-packages (from tensorboard<1.7.0,>=1.6.0->tensorflow-gpu==1.6->-r requirements.txt (line 1)) (1.5.0) Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python3.6/site-packages (from tensorboard<1.7.0,>=1.6.0->tensorflow-gpu==1.6->-r requirements.txt (line 1)) (0.14.1) Requirement already satisfied: ipython>=5.0.0 in /usr/local/lib/python3.6/site-packages (from ipykernel->jupyter->-r requirements.txt (line 9)) (7.1.1) Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.6/site-packages (from ipykernel->jupyter->-r requirements.txt (line 9)) (5.1.1) Requirement already satisfied: traitlets>=4.1.0 in /usr/local/lib/python3.6/site-packages (from ipykernel->jupyter->-r requirements.txt (line 9)) (4.3.2) Requirement already satisfied: jupyter-client in /usr/local/lib/python3.6/site-packages (from ipykernel->jupyter->-r requirements.txt (line 9)) (5.2.3) Requirement already satisfied: pygments in /usr/local/lib/python3.6/site-packages (from 
jupyter-console->jupyter->-r requirements.txt (line 9)) (2.3.0) Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /usr/local/lib/python3.6/site-packages (from jupyter-console->jupyter->-r requirements.txt (line 9)) (2.0.7) Requirement already satisfied: pyzmq>=17 in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (17.1.2) Requirement already satisfied: nbformat in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (4.4.0) Requirement already satisfied: jupyter-core>=4.4.0 in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (4.4.0) Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (0.2.0) Requirement already satisfied: terminado>=0.8.1 in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (0.8.1) Requirement already satisfied: Send2Trash in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (1.5.0) Requirement already satisfied: jinja2 in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (2.10) Requirement already satisfied: prometheus-client in /usr/local/lib/python3.6/site-packages (from notebook->jupyter->-r requirements.txt (line 9)) (0.4.2) Requirement already satisfied: widgetsnbextension~=3.4.0 in /usr/local/lib/python3.6/site-packages (from ipywidgets->jupyter->-r requirements.txt (line 9)) (3.4.2) Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/site-packages (from nbconvert->jupyter->-r requirements.txt (line 9)) (0.5.0) Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/site-packages (from nbconvert->jupyter->-r requirements.txt (line 9)) (0.2.3) Requirement already satisfied: mistune>=0.8.1 in /usr/local/lib/python3.6/site-packages (from 
nbconvert->jupyter->-r requirements.txt (line 9)) (0.8.4) Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/site-packages (from nbconvert->jupyter->-r requirements.txt (line 9)) (1.4.2) Requirement already satisfied: testpath in /usr/local/lib/python3.6/site-packages (from nbconvert->jupyter->-r requirements.txt (line 9)) (0.4.2) Requirement already satisfied: decorator in /usr/local/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (4.3.0) Requirement already satisfied: pickleshare in /usr/local/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (0.7.5) Requirement already satisfied: jedi>=0.10 in /usr/local/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (0.13.1) Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (4.6.0) Requirement already satisfied: backcall in /usr/local/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (0.1.0) Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/site-packages (from jupyter-client->ipykernel->jupyter->-r requirements.txt (line 9)) (2.7.5) Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->jupyter-console->jupyter->-r requirements.txt (line 9)) (0.1.7) Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/site-packages (from nbformat->notebook->jupyter->-r requirements.txt (line 9)) (2.6.0) Requirement already satisfied: ptyprocess; os_name != "nt" in /usr/local/lib/python3.6/site-packages (from terminado>=0.8.1->notebook->jupyter->-r requirements.txt (line 9)) (0.6.0) Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/site-packages 
(from jinja2->notebook->jupyter->-r requirements.txt (line 9)) (1.1.0) Requirement already satisfied: parso>=0.3.0 in /usr/local/lib/python3.6/site-packages (from jedi>=0.10->ipython>=5.0.0->ipykernel->jupyter->-r requirements.txt (line 9)) (0.3.1) pip install -e . The directory '/home/wu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. The directory '/home/wu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. Obtaining file:///home/wu/SuperPoint Installing collected packages: superpoint Found existing installation: superpoint 0.0 Uninstalling superpoint-0.0: Successfully uninstalled superpoint-0.0 Running setup.py develop for superpoint Successfully installed superpoint sh setup.sh Path of the directory where datasets are stored and read: superpoint/DATA_DIR Path of the directory where experiments data (logs, checkpoints, configs) are written: superpoint/EXPER_DIR wu@dada:~/SuperPoint$ cd superpoint wu@dada:~/SuperPoint/superpoint$ python experiment.py train configs/magic-point_shapes.yaml EXPER_DIR/magic-point_synth Traceback (most recent call last): File "experiment.py", line 2, in import yaml ImportError: No module named yaml

Hi! It seems that the module pyyaml cannot be found by your Python interpreter, even though it should be installed after having installed the requirements (and according to your output, pyyaml is indeed installed on your computer).

This error is probably due to a mismatch between your pip and python installations. Apparently pip is installing files for python3.6. What do you get if you run which python in a terminal? If it is not /usr/bin/python3.6 or /usr/bin/python3, it would explain your error.

You can in that case use python3.6 to run the code, for example: python3.6 experiment.py train configs/magic-point_shapes.yaml EXPER_DIR/magic-point_synth. Or you may want to use a python environment for the whole project (https://docs.python.org/3/library/venv.html).
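To check which interpreter is actually running and whether it can see the yaml module, a short diagnostic along these lines can help. It is a sketch using only the standard library; describe_module is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the location `name` would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
print("yaml found at:", describe_module("yaml"))
```

Running it once with python and once with python3.6 shows whether the two commands resolve to different installations.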

Thank you for a quick reply!

I tried the command python3.6 experiment.py train configs/magic-point_shapes.yaml EXPER_DIR/magic-point_synth, but now I get a new error about 'TMPDIR' that I could not solve. Could you give me some advice?

wu@dada:~/SuperPoint/superpoint$ python3.6 experiment.py train configs/magic-point_shapes.yaml EXPER_DIR/magic-point_synth [12/03/2018 15:37:50 INFO] Running command TRAIN [12/03/2018 15:37:50 INFO] Number of GPUs detected: 1 Traceback (most recent call last): File "experiment.py", line 148, in args.func(config, output_dir, args) File "experiment.py", line 86, in _cli_train train(config, config['train_iter'], output_dir) File "experiment.py", line 21, in train with _init_graph(config) as net: File "/usr/local/lib/python3.6/contextlib.py", line 82, in enter return next(self.gen) File "experiment.py", line 68, in _init_graph dataset = get_dataset(config['data']['name'])(config['data']) File "/home/wu/SuperPoint/superpoint/datasets/base_dataset.py", line 102, in init self.dataset = self._init_dataset(self.config) File "/home/wu/SuperPoint/superpoint/datasets/synthetic_shapes.py", line 126, in _init_dataset self.dump_primitive_data(primitive, tar_path, config) File "/home/wu/SuperPoint/superpoint/datasets/synthetic_shapes.py", line 71, in dump_primitive_data temp_dir = Path(os.environ['TMPDIR'], primitive) File "/usr/local/lib/python3.6/os.py", line 669, in getitem raise KeyError(key) from None KeyError: 'TMPDIR'


Running, for example, export TMPDIR=/tmp/ in a terminal before executing the code should solve your issue. Apparently the environment variable indicating the temporary directory of your computer was not set.
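For reference, the KeyError comes from reading os.environ['TMPDIR'] directly. A more defensive lookup could fall back to the system default when the variable is unset. This is only a sketch of that idea, not the code in synthetic_shapes.py; temp_dir_for is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def temp_dir_for(primitive):
    """Build the temp path for a primitive, falling back when TMPDIR is unset."""
    # tempfile.gettempdir() returns the platform default (often /tmp on Linux).
    base = os.environ.get("TMPDIR", tempfile.gettempdir())
    return Path(base, primitive)


print(temp_dir_for("draw_stripes"))
```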

Hi @rpautrat! I am trying to run the code, but the results are not satisfactory. The result I get from running detector_evaluation_magic_point for mp_synth-v6_no-aug_synth-noise is different from the one on GitHub. Could you give me some advice? This is the config file used for training and detection:

    data:
      add_augmentation_to_test_set: false
      augmentation:
        homographic:
          enable: false
          params:
            allow_artifacts: true
            max_angle: 1.57
            patch_ratio: 0.8
            perspective: true
            perspective_amplitude_x: 0.2
            perspective_amplitude_y: 0.2
            rotation: true
            scaling: true
            scaling_amplitude: 0.2
            translation: true
            translation_overflow: 0.05
            valid_border_margin: 2
        photometric:
          enable: false
          params:
            additive_gaussian_noise:
              stddev_range:
              - 0
              - 15
            additive_shade:
              kernel_size_range:
              - 50
              - 100
              transparency_range:
              - -0.5
              - 0.8
            additive_speckle_noise:
              prob_range:
              - 0
              - 0.0035
            motion_blur:
              max_kernel_size: 7
            random_brightness:
              max_abs_change: 75
            random_contrast:
              strength_range:
              - 0.3
              - 1.8
          primitives:
          - random_brightness
          - random_contrast
          - additive_speckle_noise
          - additive_gaussian_noise
          - additive_shade
          - motion_blur
      cache_in_memory: true
      name: synthetic_shapes
      preprocessing:
        blur_size: 21
        resize:
        - 120
        - 160
      primitives: all
      suffix: v6
      truncate:
        draw_ellipses: 0.3
        draw_stripes: 0.2
        gaussian_noise: 0.1
      validation_size: 500
    eval_iter: 200
    model:
      batch_size: 64
      detection_threshold: 0.001
      eval_batch_size: 50
      kernel_reg: 0.0
      learning_rate: 0.001
      name: magic_point
      nms: 4
    seed: 0
    train_iter: 16000
    validation_interval: 1000

Hi @Cloud555, In order to evaluate with noise, you need to set the parameter add_augmentation_to_test_set to True in the config file. This might explain the difference that you observed for mp_synth-v6_no-aug_synth-noise.

@rpautrat OK Thanks a lot.

Hi @rpautrat, I don't know exactly how to use the configuration files superpoint_coco.yaml and superpoint_hpatches.yaml. Could you please provide some hints? I used the command python experiment.py train configs/superpoint_coco.yaml superpoint_coco to train the descriptors, and this error occurred:

2018-12-14 15:24:35.115078: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****____ 2018-12-14 15:24:35.115153: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[3,30,40,30,40,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc Traceback (most recent call last): File "/home/hui/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call return fn(*args) File "/home/hui/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn target_list, status, run_metadata) File "/home/hui/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in exit c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,30,40,30,40,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: superpoint/train_tower0/gradients/superpoint/train_tower0/mul_3_grad/mul_1 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](superpoint/train_tower0/Reshape_6, superpoint/train_tower0/gradients/superpoint/train_tower0/Sum_grad/Tile)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: superpoint/train_tower0/gradients/AddN_17/_427 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2157_superpoint/train_tower0/gradients/AddN_17", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Does training depend on the GPU type? I use a single GTX 1070 Ti. Could you give me some advice?

The OOM (Out Of Memory) error probably happens because the memory of your GPU is exceeded. I would suggest launching only one training at a time (to be sure that your GPU is busy with only one job) and maybe reducing the batch size (in the config file, in model->batch_size and eval_batch_size).
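To get a feel for the numbers, the tensor shape printed in the OOM message can be converted into a memory footprint. The sketch below assumes 4 bytes per element (the log reports type float, i.e. float32) and that the leading dimension of 3 is the batch size; tensor_bytes is a hypothetical helper.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor; 4 bytes per element for float32."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape reported in the OOM message above.
oom_shape = (3, 30, 40, 30, 40, 256)
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {size / 2**30:.2f} GiB")

# Same tensor if the leading dimension were 1 instead of 3.
single = tensor_bytes((1,) + oom_shape[1:])
print(f"{single} bytes = {single / 2**30:.2f} GiB")
```

Under these assumptions this single tensor already needs about 4.1 GiB, and its size grows linearly with the leading dimension, which is why a smaller batch size helps.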

Otherwise the command you used should be fine.

I used the following commands to train and then export the descriptors. For training (with batch_size=2 and eval_batch_size=2): python experiment.py train configs/superpoint_coco.yaml superpoint_coco

[12/16/2018 22:56:47 INFO] Start training
2018-12-16 22:56:51.348616: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x55b491354bb0
[12/16/2018 22:57:26 INFO] Iter 0: loss 9.4027, precision 0.0030, recall 0.0088
[12/16/2018 23:02:01 INFO] Iter 1000: loss 4.1702, precision 0.0338, recall 0.0988
[12/16/2018 23:06:11 INFO] Iter 2000: loss 2.5584, precision 0.0547, recall 0.1434
[12/16/2018 23:10:21 INFO] Iter 3000: loss 1.6024, precision 0.0761, recall 0.1639
[12/16/2018 23:14:30 INFO] Iter 4000: loss 2.0186, precision 0.0975, recall 0.1864
[12/16/2018 23:18:40 INFO] Iter 5000: loss 1.8968, precision 0.1169, recall 0.1902
[12/16/2018 23:22:49 INFO] Iter 6000: loss 1.7606, precision 0.1199, recall 0.1993
[12/16/2018 23:26:59 INFO] Iter 7000: loss 1.7886, precision 0.1253, recall 0.2061
[12/16/2018 23:31:09 INFO] Iter 8000: loss 1.5508, precision 0.1239, recall 0.1855
[12/16/2018 23:35:19 INFO] Iter 9000: loss 1.7049, precision 0.1242, recall 0.2085
[12/16/2018 23:39:43 INFO] Iter 10000: loss 1.4050, precision 0.1456, recall 0.2158
[12/16/2018 23:44:12 INFO] Iter 11000: loss 1.1111, precision 0.1244, recall 0.2336

For the export: python export_descriptors.py configs/superpoint_hpatches.yaml superpoint_coco --export_name=sp_v6_hp-v

The output is very large: each ZIP file in the output directory ($EXPER_DIR/outputs/sp_v6_hp-v/) is about 600M. Is this expected?

Looking at your training, I would advise you to let it train a little longer; it doesn't seem to have fully converged.

Regarding the size of the export, it is unfortunately expected. The reason is that a descriptor output is of size 480x640x256 for each image (with the default parameters), which is quite big. If you are running out of memory, you can try to reduce either the resolution of the images (data->preprocessing->resize in the config file) or the size of the descriptors (model->descriptor_size in the config file). In the first case, you should also change the value of NMS accordingly (if you divide the size of the image by 2, also divide the NMS by 2).
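As a rough back-of-the-envelope check, the size of one dense descriptor map can be computed from its dimensions. This sketch assumes float32 storage (4 bytes per element) and no compression; both helper names are hypothetical, and the NMS value of 4 is only an example.

```python
def descriptor_bytes(height, width, descriptor_size, bytes_per_element=4):
    """Size of one dense descriptor map, assuming float32 storage."""
    return height * width * descriptor_size * bytes_per_element


def scaled_nms(nms, scale):
    """Scale the NMS radius together with the image size, keeping it at least 1."""
    return max(1, round(nms * scale))


full = descriptor_bytes(480, 640, 256)
half = descriptor_bytes(240, 320, 256)
print(full / 2**20, "MiB at 480x640")
print(half / 2**20, "MiB at 240x320")
print("NMS 4 at half resolution ->", scaled_nms(4, 0.5))
```

Under these assumptions one 480x640x256 map takes 300 MiB, so a file of about 600M would be consistent with two such maps per file, although that interpretation is a guess. Halving both image dimensions divides the size by four.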

Thank you very much！

@wuyuzaizai Hi, I ran into the same bug as you. Have you solved it? Could you give me some help? Thanks!

Hi! I ran python experiment.py train configs/superpoint_coco.yaml superpoint_coco and got an OOM error. I reduced the batch size (model->batch_size and eval_batch_size in configs/superpoint_coco.yaml) to 2 and then to 1, but in both cases the process was killed during training. I don't know how to solve it. Could you give me some advice? Thanks a lot! Note: the problem didn't occur in step 3 (training MagicPoint on MS-COCO). My machine has a 1080 GPU and an i7-8700 CPU (8G).

If you still get an OOM with a batch size of 1, it means that your GPU memory is unfortunately not enough... The last thing that you can do is to reduce the resolution of your train images. You can for example use a size of 320x240 by adding the parameter data->preprocessing->resize: [320, 240] to the config file.

It makes sense that you were able to train MagicPoint but not SuperPoint on the same GPU: computing the SuperPoint loss is very demanding in GPU memory.


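Regarding the KeyError: 'TMPDIR': it is raised because TMPDIR is not set in your environment. You can change "default=environ['TMPDIR']" to "os.environ.get("TMPDIR")" in that add_argument line. Note that os.environ.get returns None when the variable is unset, so a fallback value such as /tmp is still needed.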

rpautrat / SuperPoint

train problem #21