strawlab / MultiCamSelfCal

multiple camera self calibration Toolbox
http://cmp.felk.cvut.cz/~svoboda/SelfCal/

Segmentation faults with MCSC for specific use-nth-observation values #11

Closed elhananby closed 2 years ago

elhananby commented 2 years ago

Hi everyone, I did a quick recalibration today and ran into an issue where either MCSC or Octave isn't working as intended.

Specifically, when running

flydra_analysis_generate_recalibration --2d-data $DATAFILE --disable-kalman-objs $DATAFILE --undistort-intrinsics-yaml=$HOME/.config/strand-cam/camera_info --run-mcsc --use-nth-observation=4

the program just hangs on "RANSAC validation step running with tolerance threshold: 10.00 ...", with octave-gui CPU usage at ~50%.

The same also happens when trying to run octave gocal.m --config=../strawlab/test-data/DATA20100906_134124/no-global-iterations.cfg or nosetests.

Weirdly enough, when changing --use-nth-observation to values between 9 and 14, I get this error:

fatal: caught signal fatal: caught signal Segmentation fault -- stopping myself... Segmentation fault -- stopping myself...
/bin/bash: line 1: 26524 Segmentation fault (core dumped) /usr/bin/octave gocal.m --config=/media/benyishay_la/Data/Experiments/Calibration/20220908_112705.braidz.h5.recal/result/multicamselfcal.cfg > >(tee /media/benyishay_la/Data/Experiments/Calibration/20220908_112705.braidz.h5.recal/result/STDOUT) 2> >(tee /media/benyishay_la/Data/Experiments/Calibration/20220908_112705.braidz.h5.recal/result/STDERR >&2)

but for 15 and above it suddenly works again.

I guess that for very low values it might just be running very slow, but I'm not sure where the segmentation fault error is coming from.

Edit Sep 8, 12:36: short update - the Segmentation fault issue happens whenever there are more than 300 points/frames that survive RANSAC validation.

I am using Ubuntu 20.04.5 with Octave 7.2.0.

Thanks, Elhanan

astraw commented 2 years ago

Can you reproduce the segmentation fault with the following test:

octave gocal.m --config=../strawlab/test-data/DATA20100906_134124/no-global-iterations.cfg

You may also want to look at the test-python stanza in the .github/workflows/test.yml file for further tests you can run.

We should try to identify which line of code triggers the seg fault. "Print debugging" (adding small debug statements before and after potential crash sites) is the easiest way to do this. Alternatively, you could try to get a traceback and inspect it.
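One way to tell the two failure modes in this thread apart (a hang versus a crash) without reading the logs is to run the command under a timeout and look at how the child process ended. The sketch below is a generic Python helper, not part of MultiCamSelfCal: `classify_run` and its `timeout_s` default are hypothetical names chosen for illustration, and it assumes a POSIX system, where a negative return code means the process was killed by a signal.

```python
import signal
import subprocess


def classify_run(cmd, timeout_s=600, cwd=None):
    """Run cmd; return ("exit", code), ("signal", name) or ("hang", None).

    Hypothetical helper for illustration; not part of MultiCamSelfCal.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return ("hang", None)
    if proc.returncode < 0:
        # POSIX convention: return code -N means the child was killed by signal N.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = str(-proc.returncode)
        return ("signal", name)
    return ("exit", proc.returncode)
```

Called with `["octave", "gocal.m", "--config=../strawlab/test-data/DATA20100906_134124/no-global-iterations.cfg"]` and `cwd` set to the directory containing gocal.m, a crash like the one reported should come back as `("signal", "SIGSEGV")`, and a run stuck in RANSAC validation as `("hang", None)` once the timeout expires.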

elhananby commented 2 years ago

Hi Andrew, running octave gocal.m --config=../strawlab/test-data/DATA20100906_134124/no-global-iterations.cfg also gives an immediate segmentation fault, and nosetests gives this error:


ERROR: test_mcsc.test_mcsc('DATA20100906_134124',)

Traceback (most recent call last):
  File "/home/benyishay_la/miniconda3/envs/flydra/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/benyishay_la/src/MultiCamSelfCal/python/test/test_mcsc.py", line 30, in check_mcsc
    caldir = mcsc.execute(silent=True)
  File "/home/benyishay_la/src/MultiCamSelfCal/python/multicamselfcal/execute.py", line 249, in execute
    raise RuntimeError('MCSC failed')
RuntimeError: MCSC failed
-------------------- >> begin captured logging << --------------------
mcsc: INFO: running mcsc (result dir: /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/result)
mcsc: WARNING: Could not find /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/original_cam_centers.dat
mcsc: WARNING: Could not find /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/basename1.rad
mcsc: WARNING: Could not find /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/basename2.rad
mcsc: WARNING: Could not find /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/basename3.rad
mcsc: WARNING: Could not find /home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/basename4.rad
mcsc.cmd: DEBUG: running cmd ['/usr/bin/octave', 'gocal.m', '--config=/home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/result/multicamselfcal.cfg'] kwargs: {'stdin': None, 'stdout': <_io.TextIOWrapper name='/home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/result/STDOUT' mode='w' encoding='UTF-8'>, 'stderr': <_io.TextIOWrapper name='/home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/result/STDERR' mode='w' encoding='UTF-8'>, 'shell': False, 'executable': None, 'cwd': '/home/benyishay_la/src/MultiCamSelfCal/MultiCamSelfCal'}
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 3 tests in 10.149s

FAILED (errors=1)

When I try to run the test octave gocal.m --config=/home/benyishay_la/src/MultiCamSelfCal/strawlab/test-data/DATA20100906_134124/result/multicamselfcal.cfg directly, octave-gui jumps to 50% CPU usage and gets stuck on RANSAC validation step running with tolerance threshold: 10.00 ...

I will go over the octave code and try to see where the issue might be, as you suggested. So far I can still do calibration, but only for n<300 points, which is usually enough for a good result.
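Given that ceiling, a conservative way to pick --use-nth-observation is to choose the smallest stride that keeps the number of input points at or below 300, since the validation step presumably only discards points. The following is a rough Python sketch under two assumptions that this thread does not confirm: that the flag keeps every N-th observation, and that 300 is the relevant limit for other datasets too. `min_stride` is a hypothetical helper, not a flydra or MCSC function.

```python
import math


def min_stride(total_observations, max_points=300):
    """Smallest N such that keeping every N-th of total_observations rows
    leaves at most max_points rows.

    Assumes plain every-N-th subsampling (an assumption about
    --use-nth-observation, not verified against flydra).
    """
    if total_observations <= 0:
        return 1
    return max(1, math.ceil(total_observations / max_points))


# Example with a made-up observation count: every-N-th sampling of T rows
# keeps len(range(0, T, N)) of them.
total = 2450
stride = min_stride(total)
kept = len(range(0, total, stride))
print(stride, kept)
```

Because the crash depends on how many points survive validation rather than how many go in, this bound may be stricter than necessary; it only guarantees that the surviving count cannot exceed the limit.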

astraw commented 2 years ago

As another idea: this code should also still run in MATLAB. I haven't tried that in years, but it did work. I ported it to Octave and for a while ensured that it ran correctly in both. There are nice plots you can enable in MATLAB but I never got those working in Octave.

elhananby commented 2 years ago

Ok, I was able to track the issue to this line [F, inls] = rEG(Wspair,tol,tol,0.99); in the function findinl.m in MultiCamSelfCal/CoreFunctions. It seems like this function is not even getting called, because none of the debug lines are printing. It either gives an immediate segmentation fault or octave gets stuck at 50% CPU usage and I have to manually kill it.

When trying to run rEG.m manually using the variables saved in a .mat file, it gets stuck trying to call lin_fm from fu2E7. I tried recompiling lin_fm with mkoctfile, but I get an error: ‘Array<double>::ArrayRep* Array<double>::rep’ is protected within this context.

At this point, I am not sure what the issue could be. I may try to install Matlab and see how it behaves there.

astraw commented 2 years ago

You could also try using a different version of Octave. For example, you could install Ubuntu version X into a virtual machine, install octave, and run the example there. That would let you test if the issue is octave-version specific.

elhananby commented 2 years ago

Short update - I tested different versions of Octave + Ubuntu, and the only version where I don't have any issues is Octave 6.4.0 (regardless of the Ubuntu version). An interesting thing is that I am getting completely different results when running 6.4.0 vs 7.2.0 (when it doesn't crash) on the same file:

Octave 6.4.0

flydra_analysis_generate_recalibration --2d-data $DATAFILE --disable-kalman-objs $DATAFILE --undistort-intrinsics-yaml=$HOME/.config/strand-cam/camera_info  --run-mcsc --use-nth-observation=10

245 points
by camera id:
 Basler_23088879: 179
 Basler_23088882: 174
 Basler_40080153: 235
 Basler_40080159: 234
 Basler_40150423: 240
 Basler_40150424: 230
by n points:
 4: 17
 5: 30
 6: 160
 3: 38

WARNING:mcsc:Could not find /mnt/Experiments/Calibration/20221013_124823.braidz.h5.recal/original_cam_centers.dat
warning: function /opt/multicamselfcal/CalTechCal/quiver.m shadows a core library function
warning: called from
    gocal at line 23 column 1

arg = --config=/mnt/Experiments/Calibration/20221013_124823.braidz.h5.recal/result/multicamselfcal.cfg
config_dir = /mnt/Experiments/Calibration/20221013_124823.braidz.h5.recal/result/
Multi-Camera Self-Calibration, Tomas Svoboda et al., 07/2003
************************************************************
Experiment name: strawlab_test

********** After 0 iteration *******************************************
RANSAC validation step running with tolerance threshold: 10.00 ...
RANSAC: 1 samples, 230 inliers out of 230 points
RANSAC: 1 samples, 230 inliers out of 230 points
RANSAC: 3 samples, 216 inliers out of 219 points
RANSAC: 1 samples, 215 inliers out of 215 points
RANSAC: 2 samples, 176 inliers out of 178 points
RANSAC: 2 samples, 174 inliers out of 174 points
227 points/frames have survived validations so far
Filling of missing points is running ...
Repr. error in proj. space (no fact./fact.) is ...  0.837817 0.803486
************************************************************
Number of detected outliers:   0
About cameras (Id, 2D reprojection error, #inliers):
CamId    std       mean  #inliers 
  1      0.66      0.80    176 
  2      0.53      0.72    174 
  3      0.70      0.78    230 
  4      0.65      0.90    216 
  5      0.56      0.72    230 
  6      0.89      0.88    215 
***************************************************************
**************************************************************
Refinement by using Bundle Adjustment
Repr. error in proj. space (no fact./fact./BA) is ...  0.844118 0.810360 0.746028
2D reprojection error
All points: mean  0.75 pixels, std is 0.58

finished: result in  /mnt/Experiments/Calibration/20221013_124823.braidz.h5.recal/result

Octave 7.2.0

flydra_analysis_generate_recalibration --2d-data $DATAFILE --disable-kalman-objs $DATAFILE --undistort-intrinsics-yaml=$HOME/.config/strand-cam/camera_info  --run-mcsc --use-nth-observation=10
245 points
by camera id:
 Basler_23088879: 179
 Basler_23088882: 174
 Basler_40080153: 235
 Basler_40080159: 234
 Basler_40150423: 240
 Basler_40150424: 230
by n points:
 4: 17
 5: 30
 6: 160
 3: 38

WARNING:mcsc:Could not find /media/benyishay_la/Data/Experiments/Calibration/20221013_124823.braidz.h5.recal/original_cam_centers.dat
warning: function /opt/multicamselfcal/CalTechCal/quiver.m shadows a core library function
warning: called from
    gocal at line 23 column 1

arg = --config=/media/benyishay_la/Data/Experiments/Calibration/20221013_124823.braidz.h5.recal/result/multicamselfcal.cfg
config_dir = /media/benyishay_la/Data/Experiments/Calibration/20221013_124823.braidz.h5.recal/result/
Multi-Camera Self-Calibration, Tomas Svoboda et al., 07/2003
************************************************************
Experiment name: strawlab_test

********** After 0 iteration *******************************************
RANSAC validation step running with tolerance threshold: 10.00 ...
RANSAC: 1 samples, 230 inliers out of 230 points
RANSAC: 1 samples, 230 inliers out of 230 points
RANSAC: 1 samples, 219 inliers out of 219 points
RANSAC: 2 samples, 215 inliers out of 215 points
RANSAC: 1 samples, 178 inliers out of 178 points
RANSAC: 1 samples, 174 inliers out of 174 points
230 points/frames have survived validations so far
Filling of missing points is running ...
Repr. error in proj. space (no fact./fact.) is ...  29.658066 11.637252
************************************************************
Number of detected outliers:   0
About cameras (Id, 2D reprojection error, #inliers):
CamId    std       mean  #inliers 
  1      8.97     13.85    178 
  2      6.46      8.81    174 
  3     12.18     14.16    230 
  4      8.01      8.37    219 
  5     14.59     12.50    230 
  6     12.40     11.79    215 
***************************************************************
**************************************************************
Refinement by using Bundle Adjustment
Repr. error in proj. space (no fact./fact./BA) is ...  18.370124 4.216238 0.751143
2D reprojection error
All points: mean  0.75 pixels, std is 0.59

finished: result in  /media/benyishay_la/Data/Experiments/Calibration/20221013_124823.braidz.h5.recal/result

With 7.2.0, any n that yields more than 300 points still causes a segmentation fault.

Anyway, for now, it seems like this is solved just by sticking with Octave 6.4.0. The only catch is that I had to build it from source, as I could not find it in any available repository for Ubuntu 20.04: it's either 5.4.0 or 7.2.0, from what I could find.
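To compare runs like the two above without eyeballing the logs, the "Repr. error" lines can be pulled out of the captured STDOUT. This is a minimal Python sketch that assumes the log format shown in this thread; `reprojection_errors` is a hypothetical helper, and the sample strings are copied from the two logs above.

```python
import re

# Matches lines such as:
#   Repr. error in proj. space (no fact./fact./BA) is ...  0.844118 0.810360 0.746028
_REPR_RE = re.compile(r"Repr\. error in proj\. space \(([^)]*)\) is \.\.\.\s*(.*)")


def reprojection_errors(log_text):
    """Return one {stage: error} dict per 'Repr. error' line, in log order."""
    results = []
    for line in log_text.splitlines():
        m = _REPR_RE.search(line)
        if m is None:
            continue
        labels = m.group(1).split("/")  # e.g. ["no fact.", "fact.", "BA"]
        values = [float(v) for v in m.group(2).split()]
        results.append(dict(zip(labels, values)))
    return results


LOG_640 = """\
Repr. error in proj. space (no fact./fact.) is ...  0.837817 0.803486
Repr. error in proj. space (no fact./fact./BA) is ...  0.844118 0.810360 0.746028
"""
LOG_720 = """\
Repr. error in proj. space (no fact./fact.) is ...  29.658066 11.637252
Repr. error in proj. space (no fact./fact./BA) is ...  18.370124 4.216238 0.751143
"""

for version, log in (("6.4.0", LOG_640), ("7.2.0", LOG_720)):
    print(version, reprojection_errors(log))
```

In these two runs the errors before bundle adjustment differ by more than an order of magnitude (0.84 vs. 29.66 without factorization), while the values after bundle adjustment are nearly identical (0.746 vs. 0.751).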

astraw commented 2 years ago

Thanks for the update. This seems like an Octave bug or an Octave packaging bug to me and not something we can fix within MultiCamSelfCal itself. It would be great if you could file an Octave bug report. Their reporting system seems to be at https://savannah.gnu.org/bugs/?group=octave
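Until the upstream issue is resolved, a calling script could at least check which Octave it is about to launch. The sketch below is not existing MultiCamSelfCal code: `parse_octave_version`, `installed_octave_version` and `KNOWN_GOOD` are hypothetical names, and it assumes `octave --version` prints a line of the form "GNU Octave, version X.Y.Z".

```python
import re
import subprocess

# The only version reported as problem-free in this thread.
KNOWN_GOOD = (6, 4, 0)


def parse_octave_version(text):
    """Extract (major, minor, patch) from `octave --version` output, or None."""
    m = re.search(r"GNU Octave, version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in m.groups()) if m else None


def installed_octave_version(octave="octave"):
    """Ask the given Octave executable for its version (needs Octave on PATH)."""
    out = subprocess.run(
        [octave, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return parse_octave_version(out)


def version_warning(version):
    """Return a warning string if version differs from KNOWN_GOOD, else None."""
    if version == KNOWN_GOOD:
        return None
    return "Octave %s is not the version reported to work (%s)" % (
        ".".join(map(str, version)) if version else "unknown",
        ".".join(map(str, KNOWN_GOOD)),
    )
```

A wrapper could log `version_warning(installed_octave_version())` before running gocal.m; treating every version other than 6.4.0 as suspect is deliberately cautious, since the thread does not say which other versions were tested.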

elhananby commented 2 years ago

No problem, I'll post something on their board. Thanks for the help.