Segmentation fault when testing TANDEM on a custom synthetic dataset

pankhurivanjani commented 1 year ago

Hello,

I am trying to test Tandem on a custom dataset.

When it loads the model, the program crashes with a segmentation fault:

(tandem) pankhuri@pankhuri-GE76-Raider-11UH:~/tandem/tandem$ ./build/bin/tandem_dataset preset=gui result_folder=/home/pankhuri/tandem-dataset/results files=/home/pankhuri/tandem-dataset/images calib=/home/pankhuri/tandem-dataset/camera.txt mvsnet_folder=/home/pankhuri/tandem/tandem/exported/tandem mode=1
loading data from /home/pankhuri/tandem-dataset/images!
loading calibration from /home/pankhuri/tandem-dataset/camera.txt!
Loading MVSNet from /home/pankhuri/tandem/tandem/exported/tandem/!
PHOTOMETRIC MODE WITHOUT CALIBRATION!

=============== TANDEM Settings: ===============
    Setting 'gui':
    - no real-time enforcing
    - 2000 active points
    - 5-7 active frames
    - 1-6 LM iteration each KF
    - TSDF fusion: yes
    - dense tracking on cpu (step=1)
    - Pangolin
      - Fullscreen: 0
      - Mesh: 1
      - Smaller Images: 1

Reading Calibration from file /home/pankhuri/tandem-dataset/camera.txt ... found!
Input resolution: 640 480
In: 300.000000 300.000000 480.000000 270.000000 0.000000
Out: Rectify Crop
Output resolution: 640 480
finding CROP optimal new model!
initial range: x: -1.6159 - 0.5352; y: -0.9090 - 0.7036!
iteration 00001: range: x: -1.6078 - 0.5325; y: -0.9090 - 0.7036!
iteration 00002: range: x: -1.5998 - 0.5299; y: -0.9090 - 0.7036!
iteration 00003: range: x: -1.5998 - 0.5299; y: -0.9045 - 0.7000!
iteration 00004: range: x: -1.5998 - 0.5299; y: -0.8999 - 0.6965!
iteration 00005: range: x: -1.5998 - 0.5299; y: -0.8999 - 0.6965!

Rectified Kamera Matrix:
300.051       0 480.015
      0 300.035 270.011
      0       0       1

NO PHOTOMETRIC Calibration!
Reading Photometric Calibration from file 
PhotometricUndistorter: Could not open file!
got 0 images and 0 timestamps and 0 exposures.!
ImageFolderReader: got 0 files in /home/pankhuri/tandem-dataset/images!
using pyramid levels 0 to 3. coarsest resolution: 80 x 60!
START PANGOLIN!

----DRMVSNET Initalizing fusion----

----DRMVSNET Initalizing fusion done----
DrMvsnet torch::cuda::is_vailable == true --> seems good
View Num: 7, ref index: 5
[W BinaryOps.cpp:467] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
Correctness:
    Depth correct     : 0, error: 0.0106286
    Confidence correct: 1, error: 0.00296127
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.0029548
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Performance:
    CallAsync     : 6.832 ms
    Ready         : 0 ms
    GetResult     : 116.539 ms
There has been an error. Do not use the model.
Segmentation fault (core dumped)
(tandem) pankhuri@pankhuri-GE76-Raider-11UH:~/tandem/tandem$

What could be the reason for this segmentation fault? Does it depend on the geometric calibration?
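
One detail visible in the log above is `got 0 images and 0 timestamps and 0 exposures` together with `ImageFolderReader: got 0 files`, so it may be worth ruling out an empty or unreadable image folder before looking at the calibration. The following is a minimal Python sketch of such a check. `count_images` and `IMAGE_EXTENSIONS` are hypothetical helpers, not part of TANDEM, and the set of extensions the reader actually accepts is an assumption.

```python
from pathlib import Path

# Assumed set of extensions; the reader used by tandem_dataset may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images(folder, extensions=IMAGE_EXTENSIONS):
    """Count regular files directly inside `folder` whose suffix is in `extensions`.

    Subdirectories are not searched. A missing folder counts as 0.
    """
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(
        1
        for entry in root.iterdir()
        if entry.is_file() and entry.suffix.lower() in extensions
    )
```

Calling `count_images("/home/pankhuri/tandem-dataset/images")` before launching would show whether the folder passed as `files=` holds any images at its top level. A result of 0 would match the log, but it would not by itself prove that this is what triggers the crash.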

Note:

It works fine with other datasets, such as the TUM RGB-D freiburg2_xyz sequence, although a warning appears that MVSNet could not be loaded successfully:

(tandem) pankhuri@pankhuri-GE76-Raider-11UH:~/tandem/tandem$ ./build/bin/tandem_dataset preset=gui result_folder=/home/pankhuri/tandem-dataset/results files=/home/pankhuri/rgbd_dataset_freiburg2_xyz/rgb calib=/home/pankhuri/rgbd_dataset_freiburg2_xyz/camera.txt mvsnet_folder=/home/pankhuri/tandem/tandem/exported/tandem mode=1
loading data from /home/pankhuri/rgbd_dataset_freiburg2_xyz/rgb!
loading calibration from /home/pankhuri/rgbd_dataset_freiburg2_xyz/camera.txt!
Loading MVSNet from /home/pankhuri/tandem/tandem/exported/tandem/!
PHOTOMETRIC MODE WITHOUT CALIBRATION!

=============== TANDEM Settings: ===============
    Setting 'gui':
    - no real-time enforcing
    - 2000 active points
    - 5-7 active frames
    - 1-6 LM iteration each KF
    - TSDF fusion: yes
    - dense tracking on cpu (step=1)
    - Pangolin
      - Fullscreen: 0
      - Mesh: 1
      - Smaller Images: 1

Reading Calibration from file /home/pankhuri/rgbd_dataset_freiburg2_xyz/camera.txt ... found!
Input resolution: 640 480
In: 517.300000 516.500000 318.600000 255.300000 0.000000
Out: Rectify Crop
Output resolution: 640 480
finding CROP optimal new model!
initial range: x: -0.6220 - 0.6255; y: -0.4991 - 0.4374!
iteration 00001: range: x: -0.6188 - 0.6224; y: -0.4991 - 0.4374!
iteration 00002: range: x: -0.6158 - 0.6193; y: -0.4991 - 0.4374!
iteration 00003: range: x: -0.6158 - 0.6193; y: -0.4966 - 0.4352!
iteration 00004: range: x: -0.6158 - 0.6193; y: -0.4942 - 0.4331!
iteration 00005: range: x: -0.6158 - 0.6193; y: -0.4942 - 0.4331!

Rectified Kamera Matrix:
517.406       0 318.595
      0 516.592 255.281
      0       0       1

NO PHOTOMETRIC Calibration!
Reading Photometric Calibration from file 
PhotometricUndistorter: Could not open file!
set EXPOSURES to zero!
got 3669 images and 3669 timestamps and 0 exposures.!
ImageFolderReader: got 3669 files in /home/pankhuri/rgbd_dataset_freiburg2_xyz/rgb!
using pyramid levels 0 to 3. coarsest resolution: 80 x 60!
START PANGOLIN!

----DRMVSNET Initalizing fusion----

----DRMVSNET Initalizing fusion done----
DrMvsnet torch::cuda::is_vailable == true --> seems good
View Num: 7, ref index: 5
[W BinaryOps.cpp:467] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
Correctness:
    Depth correct     : 0, error: 0.0106286
    Confidence correct: 1, error: 0.00296125
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.0029548
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Performance:
    CallAsync     : 6.80075 ms
    Ready         : 0 ms
    GetResult     : 116.749 ms
There has been an error. Do not use the model.
Couldn't load MVSNet successfully.INITIALIZE FROM INITIALIZER (2048 pts)!

TANDEM TIMING: ==================
3668 Frames (32.8 fps)
30.48ms per frame; 
111.79s total time; 
======================

DrFusion::SaveMeshToFile volume_->ExtractMesh (485 ms)
DrFusion::SaveMeshToFile mesh.SaveToFile (3587 ms)
Mesh Saving done!
^CCaught signal 2
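
Comparing the two logs, the custom run reports `In: 300.000000 300.000000 480.000000 270.000000` for a 640 x 480 input, while the freiburg2_xyz run reports 318.6 and 255.3 in the same positions. The rectified matrices print these values in the principal-point column, so they appear to be cx and cy in pixels. The sketch below, in Python, measures how far such a principal point sits from the image centre. The function names and the 0.15 tolerance are arbitrary choices for illustration, and nothing in this thread establishes that the offset is related to the crash.

```python
def principal_point_offset(cx, cy, width, height):
    """Return the principal point's offset from the image centre,
    expressed as a fraction of the image size along each axis."""
    return ((cx - width / 2.0) / width, (cy - height / 2.0) / height)


def looks_off_centre(cx, cy, width, height, tolerance=0.15):
    """True if either axis offset exceeds `tolerance`.

    The default tolerance is an illustrative threshold, not a TANDEM setting.
    """
    dx, dy = principal_point_offset(cx, cy, width, height)
    return abs(dx) > tolerance or abs(dy) > tolerance


# Values copied from the two logs in this thread.
custom = looks_off_centre(480.0, 270.0, 640, 480)
tum = looks_off_centre(318.6, 255.3, 640, 480)
print("custom dataset off-centre:", custom)
print("freiburg2_xyz off-centre:", tum)
```

For the custom calibration the horizontal offset is (480 - 320) / 640 = 0.25 of the image width. As a side note, 480 and 270 would be the exact centre of a 960 x 540 image. Whether the synthetic images were rendered at that size is not stated in the thread.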

pankhurivanjani commented 1 year ago

I ran the program under gdb to learn more about the segmentation fault. Here is the output:

(tandem) pankhuri@pankhuri-GE76-Raider-11UH:~/tandem/tandem$ gdb --args ./build/bin/tandem_dataset preset=gui result_folder=/home/pankhuri/tandem-dataset/results files=/home/pankhuri/tandem-dataset/images calib=/home/pankhuri/tandem-dataset/camera.txt mvsnet_folder=/home/pankhuri/tandem/tandem/exported/tandem mode=1
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/bin/tandem_dataset...
(gdb) run 
Starting program: /home/pankhuri/tandem/tandem/build/bin/tandem_dataset preset=gui result_folder=/home/pankhuri/tandem-dataset/results files=/home/pankhuri/tandem-dataset/images calib=/home/pankhuri/tandem-dataset/camera.txt mvsnet_folder=/home/pankhuri/tandem/tandem/exported/tandem mode=1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
loading data from /home/pankhuri/tandem-dataset/images!
loading calibration from /home/pankhuri/tandem-dataset/camera.txt!
Loading MVSNet from /home/pankhuri/tandem/tandem/exported/tandem/!
PHOTOMETRIC MODE WITHOUT CALIBRATION!

=============== TANDEM Settings: ===============
    Setting 'gui':
    - no real-time enforcing
    - 2000 active points
    - 5-7 active frames
    - 1-6 LM iteration each KF
    - TSDF fusion: yes
    - dense tracking on cpu (step=1)
    - Pangolin
      - Fullscreen: 0
      - Mesh: 1
      - Smaller Images: 1

[New Thread 0x7ffef59ad000 (LWP 116410)]
Reading Calibration from file /home/pankhuri/tandem-dataset/camera.txt ... found!
Input resolution: 640 480
In: 300.000000 300.000000 480.000000 270.000000 0.000000
Out: Rectify Crop
Output resolution: 640 480
finding CROP optimal new model!
initial range: x: -1.6159 - 0.5352; y: -0.9090 - 0.7036!
iteration 00001: range: x: -1.6078 - 0.5325; y: -0.9090 - 0.7036!
iteration 00002: range: x: -1.5998 - 0.5299; y: -0.9090 - 0.7036!
iteration 00003: range: x: -1.5998 - 0.5299; y: -0.9045 - 0.7000!
iteration 00004: range: x: -1.5998 - 0.5299; y: -0.8999 - 0.6965!
iteration 00005: range: x: -1.5998 - 0.5299; y: -0.8999 - 0.6965!

Rectified Kamera Matrix:
300.051       0 480.015
      0 300.035 270.011
      0       0       1

NO PHOTOMETRIC Calibration!
Reading Photometric Calibration from file 
PhotometricUndistorter: Could not open file!
got 0 images and 0 timestamps and 0 exposures.!
ImageFolderReader: got 0 files in /home/pankhuri/tandem-dataset/images!
using pyramid levels 0 to 3. coarsest resolution: 80 x 60!
[New Thread 0x7ffef4e25000 (LWP 116411)]
[New Thread 0x7ffef4624000 (LWP 116412)]
[New Thread 0x7ffef3e23000 (LWP 116413)]
[New Thread 0x7ffef3622000 (LWP 116414)]
[New Thread 0x7ffef2e21000 (LWP 116415)]
[New Thread 0x7ffef2620000 (LWP 116416)]
[New Thread 0x7ffee6f3e000 (LWP 116417)]
[New Thread 0x7ffe902e1000 (LWP 116418)]

----DRMVSNET Initalizing fusion----
START PANGOLIN!
[New Thread 0x7ffe82849000 (LWP 116419)]
[New Thread 0x7ffe82048000 (LWP 116420)]
[New Thread 0x7ffe7b575000 (LWP 116424)]
[New Thread 0x7ffe7ad74000 (LWP 116425)]
[New Thread 0x7ffe7a573000 (LWP 116426)]

----DRMVSNET Initalizing fusion done----
DrMvsnet torch::cuda::is_vailable == true --> seems good
[New Thread 0x7ffe79b72000 (LWP 116427)]
View Num: 7, ref index: 5
[W BinaryOps.cpp:467] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
[New Thread 0x7ffe79371000 (LWP 116428)]
[New Thread 0x7ffe78b70000 (LWP 116429)]
[New Thread 0x7ffe6c9fd000 (LWP 116430)]
[New Thread 0x7ffe64994000 (LWP 116431)]
[New Thread 0x7ffe29fff000 (LWP 116432)]
[New Thread 0x7ffe297fe000 (LWP 116433)]
[New Thread 0x7ffe28ffd000 (LWP 116434)]
Correctness:
    Depth correct     : 0, error: 0.0106286
    Confidence correct: 1, error: 0.00296127
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295482
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Correctness:
    Depth correct     : 0, error: 0.0106182
    Confidence correct: 1, error: 0.00295481
Performance:
    CallAsync     : 10.3755 ms
    Ready         : 0 ms
    GetResult     : 276.044 ms
There has been an error. Do not use the model.
[New Thread 0x7ffe81747000 (LWP 116437)]
[Thread 0x7ffee6f3e000 (LWP 116417) exited]

Thread 10 "tandem_dataset" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe902e1000 (LWP 116418)]
0x0000555555585cd3 in <lambda()>::operator()(void) const (__closure=0x555558c6b498) at /usr/include/c++/9/bits/stl_iterator.h:864
864       operator-(difference_type __n) const _GLIBCXX_NOEXCEPT
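
The output above stops at a frame inside `stl_iterator.h`, which is standard library code, so a backtrace would be needed to see which caller reached it. A minimal Python sketch for collecting one non-interactively follows. `gdb_backtrace_command` is a hypothetical helper that only builds the command line, using gdb's `-batch`, `-ex` and `--args` options. The program arguments are copied from the command in this thread.

```python
import shlex


def gdb_backtrace_command(program_args):
    """Build a gdb command line that runs the program and, once it stops,
    prints a backtrace for the current thread and for all threads."""
    return [
        "gdb",
        "-batch",                       # exit after the listed commands finish
        "-ex", "run",                   # start the program
        "-ex", "bt",                    # backtrace of the thread that stopped
        "-ex", "thread apply all bt",   # backtraces of every thread
        "--args",
        *program_args,
    ]


# Arguments copied from the command line shown in this thread.
cmd = gdb_backtrace_command([
    "./build/bin/tandem_dataset",
    "preset=gui",
    "result_folder=/home/pankhuri/tandem-dataset/results",
    "files=/home/pankhuri/tandem-dataset/images",
    "calib=/home/pankhuri/tandem-dataset/camera.txt",
    "mvsnet_folder=/home/pankhuri/tandem/tandem/exported/tandem",
    "mode=1",
])
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run`. In an interactive session, typing `bt` at the `(gdb)` prompt after the SIGSEGV would give the same first backtrace.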

pytholic commented 1 year ago

Hi, I am facing the same issue with a custom dataset. Did you figure it out? Is it because of a large input size? It also reports "Couldn't load MVSNet successfully."

tum-vision / tandem

Segmentation fault when testing tandem on a custom synthetic data #39