mapillary / OpenSfM

Open source Structure-from-Motion pipeline
https://www.opensfm.org/
BSD 2-Clause "Simplified" License

Large memory allocation - job quits #177

Closed hblanken closed 3 years ago

hblanken commented 7 years ago

Hi team - I have been doing some testing with drone imagery (219 images). The images usually stitched really well, until ODM installed the latest OpenSfM. Now it seems that a huge amount of memory is being allocated during the depthmap computation and cleaning steps, and then the job quits.

The maintainers of the OpenDroneMap repo suggested I open an issue here. Hope this helps with bug fixing. All details are attached in https://github.com/OpenDroneMap/OpenDroneMap/issues/562 Would anyone be able to help?

paulinus commented 7 years ago

Thanks for reporting here. We need to investigate, since the cleaning step should not use much memory. The merging step should, though.

The main change in the last ODM update is that multiple depthmaps are computed in parallel. If that is what is causing the problem, you can try running with only one process.
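For reference, the serial fallback that running with one process triggers is visible in the tracebacks in this thread (`dense.py`'s `parallel_run` either loops in-process or fans out over a `multiprocessing.Pool`). Here is a minimal sketch of that pattern; the function name mirrors the one in `dense.py`, but the body is a simplified assumption, not the project's exact code:

```python
from multiprocessing import Pool


def parallel_run(function, arguments, processes):
    """Apply `function` to every item in `arguments`.

    With processes == 1, run a plain serial loop (no worker pool,
    no extra per-process memory); otherwise fan out over a Pool.
    """
    if processes == 1:
        return [function(arg) for arg in arguments]
    with Pool(processes) as pool:
        return pool.map(function, arguments)


if __name__ == '__main__':
    # Serial path: no worker processes are created at all.
    print(parallel_run(abs, [-3, 1, -2], 1))  # [3, 1, 2]
```

With `processes=1` the pool (and its per-worker memory) is never created, which is why the single-process run below peaks at a few GB instead of exhausting RAM.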

hblanken commented 7 years ago

@paulinus I really do not know whether this is an ODM, OpenSfM, or Ubuntu issue. I have now set the OpenSfM processes to 1 and the depthmaps are being created. However, the project exits at the cleaning depthmap stage. Here is the log:

```
2017-05-15 14:21:10,116 Cleaning depthmap for image DJI_0829.JPG
Traceback (most recent call last):
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/bin/opensfm", line 34, in <module>
    command.run(args)
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/commands/compute_depthmaps.py", line 25, in run
    dense.compute_depthmaps(data, graph, reconstructions[0])
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 38, in compute_depthmaps
    parallel_run(clean_depthmap, arguments, processes)
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 47, in parallel_run
    return [function(arg) for arg in arguments]
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 109, in clean_depthmap
    add_views_to_depth_cleaner(data, reconstruction, neighbors[shot.id], dc)
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 189, in add_views_to_depth_cleaner
    depth, plane, score = data.load_raw_depthmap(shot.id)
  File "/home/ubuntu/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dataset.py", line 99, in load_raw_depthmap
    o = np.load(self._depthmap_file(image, 'raw.npz'))
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 392, in load
    fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "run.py", line 46, in <module>
    plasm.execute(niter=1)
  File "/home/ubuntu/OpenDroneMap/scripts/opensfm.py", line 128, in process
    (context.pyopencv_path, context.opensfm_path, tree.opensfm))
  File "/home/ubuntu/OpenDroneMap/opendm/system.py", line 28, in run
    raise Exception("Child returned {}".format(retcode))
Exception: Child returned 1
```

paulinus commented 7 years ago

This particular error seems to be about a corrupted depthmap file. Maybe it was erroneously stored while the app was crashing. Can you try deleting the opensfm/depthmaps folder and re-running?

As for the original memory problem, I'll need some time to investigate.

fredlllll commented 7 years ago

I also experienced some extreme RAM usage. It looks like one process is started for every task, and they all wait for their turn to run. The problem is that the processes that have finished just sit there with full RAM usage until all the others are done, after finishing their job: http://i.imgur.com/Ulh6dcP.png

hblanken commented 7 years ago

@paulinus thanks - I deleted the entire project and started again, running only ONE OpenSfM process. It took quite some time, but this worked. Maximum memory used was about 6-8 GB. There are still a lot of orphan processes, as @fredlllll pointed out, but the job completed.

xialang2012 commented 7 years ago

I also experienced this problem:

```
2017-05-18 21:11:26,140 Computing depthmap for image DJI_0253.JPG
Traceback (most recent call last):
  File "/home/xl/odm-new/OpenDroneMap/SuperBuild/src/opensfm/bin/opensfm", line 34, in <module>
    command.run(args)
  File "/home/xl/odm-new/OpenDroneMap/SuperBuild/src/opensfm/opensfm/commands/compute_depthmaps.py", line 25, in run
    dense.compute_depthmaps(data, graph, reconstructions[0])
  File "/home/xl/odm-new/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 31, in compute_depthmaps
    parallel_run(compute_depthmap, arguments, processes)
  File "/home/xl/odm-new/OpenDroneMap/SuperBuild/src/opensfm/opensfm/dense.py", line 50, in parallel_run
    return p.map(function, arguments)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
MemoryError
Process PoolWorker-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
  File "/home/xl/odm-new/OpenDroneMap/run.py", line 55, in <module>
    plasm.execute(niter=1)
  File "/home/xl/odm-new/OpenDroneMap/scripts/opensfm.py", line 85, in process
    (context.pyopencv_path, context.opensfm_path, tree.opensfm))
  File "/home/xl/odm-new/OpenDroneMap/opendm/system.py", line 28, in run
    raise Exception("Child returned {}".format(retcode))
Exception: Child returned 1
```

The number of input images is 263, and the Ubuntu machine has 16 GB of RAM.

paulinus commented 7 years ago

This seems to be related to a problem with OpenCV combined with multiprocessing. For some reason, worker processes hang after finishing their task, and the accumulation of processes ends up consuming all memory.

Here is the related OpenCV issue where some workarounds are proposed: https://github.com/opencv/opencv/issues/5150
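One workaround discussed in that OpenCV issue is to start workers with the `spawn` start method instead of the default `fork`, so each worker is a fresh interpreter that does not inherit the parent's OpenCV thread state. A minimal sketch of that approach (the `work` function is a placeholder, not the actual depthmap code):

```python
import multiprocessing as mp


def work(x):
    # Placeholder for a per-image depthmap computation; in the real
    # pipeline this would call into OpenCV.
    return x * x


if __name__ == '__main__':
    # 'spawn' starts each worker as a fresh interpreter instead of
    # fork()ing the parent, which avoids inheriting OpenCV's thread
    # state -- one of the workarounds proposed in opencv/opencv#5150.
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=2) as pool:
        results = pool.map(work, range(4))
    print(results)  # [0, 1, 4, 9]
```

Note that `spawn` requires the worker function to be importable (defined at module top level) and the entry point to be guarded by `if __name__ == '__main__'`, since each worker re-imports the main module.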

tteke commented 7 years ago

I also keep encountering this error in OpenDroneMap. The number of input images is 371; I have 64 GB of RAM and 128 GB of swap storage, and the OpenSfM process manages to fill it all and eventually gets stuck. However, what I realised is that this problem only occurs when compute_depthmaps is used with multiprocessing. Finding features and matching features work without a hitch with multiprocessing.

I also tried the workarounds proposed in opencv/opencv#5150. They did not work, as I believe this is a problem specific to compute_depthmaps.

fredlllll commented 7 years ago

I have a proposal for a fix.

Looking at the line where OpenSfM creates the pool:

https://github.com/mapillary/OpenSfM/blob/0a5a6f07b66088e9b63bddaf29bb8f49dd39f9e4/opensfm/dense.py#L68

we see that currently only the number of processes is passed. But there is another parameter that could help, `maxtasksperchild`:

https://github.com/python/cpython/blob/5084ff7ddfe68969d95af12075016f309718aea8/Lib/multiprocessing/pool.py#L138

Looking at the point where it is used:

https://github.com/python/cpython/blob/5084ff7ddfe68969d95af12075016f309718aea8/Lib/multiprocessing/pool.py#L100

it means that a worker only executes x tasks before exiting. Investigating the rest of the Pool class, we can see that it will keep restarting workers until the whole job is completed.

So if we set that value to 1, every worker exits after one task, which could potentially save us from the hanging processes.
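To illustrate the proposal, here is a small self-contained sketch of a pool with `maxtasksperchild=1`; the `compute` function is a stand-in for the depthmap computation, not OpenSfM's code:

```python
from multiprocessing import Pool


def compute(task):
    # Stand-in for compute_depthmap/clean_depthmap: allocate a large
    # buffer that a hung worker would otherwise keep holding.
    data = bytearray(10 * 1024 * 1024)  # ~10 MB held until the worker exits
    return task * 2


if __name__ == '__main__':
    # maxtasksperchild=1: each worker exits after a single task and is
    # replaced by a fresh process, so memory held by finished tasks is
    # returned to the OS instead of accumulating across the whole run.
    with Pool(processes=2, maxtasksperchild=1) as pool:
        results = pool.map(compute, range(6))
    print(results)  # [0, 2, 4, 6, 8, 10]
```

The trade-off is process startup cost on every task, which is usually negligible next to a depthmap computation.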

Alternatively, we could force-exit the process in the function that computes the depthmap (at the end of this method):

https://github.com/mapillary/OpenSfM/blob/0a5a6f07b66088e9b63bddaf29bb8f49dd39f9e4/opensfm/dense.py#L75

I don't know what implications a custom exit has; let's hope the hack with one task per worker works.

Feedback please.

pierotofy commented 7 years ago

I don't think this will work. See #190.

fredlllll commented 7 years ago

That only means that a custom exit will not work; but shouldn't using `maxtasksperchild=1` work?

YanNoun commented 3 years ago

OpenSfM now uses multithreading instead of multiprocessing, which saves quite some RAM.
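The difference can be sketched with Python's thread pool; this is an illustrative example with a placeholder `clean` step, not OpenSfM's actual code. Threads share one address space (and numerical code in C/C++ extensions typically releases the GIL during heavy work), so there is no per-worker copy of the reconstruction data:

```python
from concurrent.futures import ThreadPoolExecutor


def clean(depthmap):
    # Stand-in for a per-image cleaning step; all threads read the same
    # shared data structures instead of each process duplicating them.
    return [max(0, d) for d in depthmap]


depthmaps = [[-1, 2, 3], [4, -5, 6]]
with ThreadPoolExecutor(max_workers=2) as executor:
    cleaned = list(executor.map(clean, depthmaps))
print(cleaned)  # [[0, 2, 3], [4, 0, 6]]
```

Threads also cannot hang around as orphan OS processes the way pool workers did, which is why this sidesteps the accumulation problem reported above.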