hydradarpa / kalman-hydra

Python and C++ code for Hydra optical flow, behavior analysis with extended Kalman filter tracker

CUDA memory allocation causes crashes? #31

Closed benlansdell closed 7 years ago

benlansdell commented 7 years ago

Not sure what the exact cause is, but the program currently uses a lot of virtual memory. It seems that each import of a CUDA library reserves around 66GB of VM.

Trying other CUDA programs shows similar behaviour, both with and without OpenGL interop, and in both Python and C++.

OpenCV was compiled with CUDA support, so importing it creates one 66GB VM reservation. My own CUDA import seems to create another 66GB reservation -- so the program currently allocates ~128GB of VM.
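A minimal sketch of how the per-import growth can be checked on Linux, assuming pycuda as the CUDA binding (the actual imports in this code may differ, and the reservation may only appear once a CUDA context is created):

```python
# Watch virtual memory (VmSize) grow as CUDA-enabled libraries are imported.
# Linux-only. pycuda.autoinit is an assumption standing in for whatever CUDA
# binding this project actually uses.
def vm_size_gb():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) / 1024.0**2  # value is in kB
    return 0.0

print("baseline:     %.1f GB" % vm_size_gb())

import cv2                    # OpenCV built with CUDA support
print("after cv2:    %.1f GB" % vm_size_gb())

import pycuda.autoinit        # creates a CUDA context
print("after pycuda: %.1f GB" % vm_size_gb())
```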

This seems to be related to the fact that CUDA's unified addressing sets up a common address space between host and device, one that is at least the size of the host and device memories combined -- RAM + swap + GPU memory ~66GB (see for example: http://stackoverflow.com/questions/11631191/why-does-the-cuda-runtime-reserve-80-gib-virtual-memory-upon-initialization).
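As a hypothetical worked example of that sum (the sizes below are illustrative, not measurements from the actual machine):

```python
# Illustrative sizes only -- not measured on the actual machine.
ram_gb, swap_gb, gpu_gb = 32, 30, 4
reserved_vm_gb = ram_gb + swap_gb + gpu_gb  # ~66 GB of virtual address space
print(reserved_vm_gb)                       # 66, reserved per CUDA context
```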

While this is really just a trick to allow a common address space and shouldn't count against any actual memory limits (let alone resident memory, which would affect performance), allocating 128GB of VM does appear to affect the stability of the program when hard disk space is low. The program sometimes crashes after a varying number of frames when 128GB of VM is allocated and free hard disk space is below ~128GB.

The best solution may be to change the system configuration so that cases like this are avoided -- separating home directories and swap space -- though this needs experimentation.
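Until then, a possible stop-gap is a pre-flight check that refuses to start when free disk space looks too small relative to the expected VM reservation. This is only a sketch; the 128GB threshold comes from the observation above, and the mount point is an assumption:

```python
import os
import sys

REQUIRED_FREE_GB = 128   # roughly the VM the program is expected to reserve
MOUNT_POINT = "/"        # assumption: the relevant filesystem may differ

st = os.statvfs(MOUNT_POINT)
free_gb = st.f_bavail * st.f_frsize / 1024.0**3
if free_gb < REQUIRED_FREE_GB:
    sys.exit("Only %.0f GB free on %s; crashes have been seen below ~%d GB."
             % (free_gb, MOUNT_POINT, REQUIRED_FREE_GB))
```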

benlansdell commented 7 years ago

Commits 5d4a8415a625dce054fe1a39f4b6703355ae1dfd and 91420459ed876e9d77a378edb738d21e13b437b0 are attempts at avoiding the 66GB allocations, made before I knew what was going on. They try to avoid the allocation by confining OpenCV usage to a separate process spawned just for OpenCV's use; the process is killed once the function has run, and the VM is freed (a sketch of this approach appears after the list below). This seemed workable because OpenCV calls are relatively rare per frame, so performance wouldn't be affected too much by the process creation/destruction. However, there are two issues with this:

  1. It doesn't help with the CUDA-related code. Since the CUDA code runs continually, thousands of times per frame, creating a separate process for each CUDA call would be prohibitively expensive -- if it would even work at all, as OpenGL doesn't play well with multiple threads.
  2. Processes spawned in Python inherit the virtual address space of their parent, which really just exacerbates the problem: any large allocations are doubled when a new process is created.
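For reference, a minimal sketch of the containment idea from those commits; `detect_features` and the specific OpenCV call are placeholders rather than this repo's actual functions. If Python 3's 'spawn' start method is available, the child gets a fresh interpreter instead of a copy of the parent's address space, which at least avoids the doubling in point 2 (it does nothing for point 1):

```python
import multiprocessing as mp

def _opencv_worker(frame, queue):
    # Import OpenCV only inside the short-lived child, so its ~66GB VM
    # reservation never lands in the parent process.
    import cv2
    queue.put(cv2.goodFeaturesToTrack(frame, 100, 0.01, 10))

def detect_features(frame):
    # Placeholder wrapper: run one OpenCV call in a throwaway process.
    # 'frame' is assumed to be a single-channel 8-bit numpy array.
    ctx = mp.get_context("spawn")   # fresh interpreter, no inherited mappings
    queue = ctx.Queue()
    proc = ctx.Process(target=_opencv_worker, args=(frame, queue))
    proc.start()
    result = queue.get()            # read before join to avoid queue deadlock
    proc.join()                     # child exits and its VM is released
    return result
```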

benlansdell commented 7 years ago

It's still confusing how CUDA's virtual address space trick would affect stability when hard disk space is low. It seems like an awful solution to the common-address-space problem if it is indeed the cause of the crashes. More experimentation with the program will hopefully yield some insight.

benlansdell commented 7 years ago

The above-referenced commits will be reverted.

benlansdell commented 7 years ago

After freeing up enough hard drive space for the above to not be an issue, the crashes remain. Will have to add logging to better investigate where the crashes occur...
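Something like the following is the idea; `process_frame`, the loop, and the log filename are placeholders for the actual per-frame code. Even if the crash is a hard one (e.g. a segfault inside CUDA) that the except clause can never see, the last "start" entry in the log still marks the frame where the process died:

```python
import logging

logging.basicConfig(filename="kalman_hydra_debug.log", level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")

def run(frames):
    for i, frame in enumerate(frames):
        logging.debug("frame %d: start", i)
        try:
            process_frame(frame)   # placeholder for the real per-frame work
        except Exception:
            logging.exception("frame %d: python-level failure", i)
            raise
        logging.debug("frame %d: done", i)
```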

benlansdell commented 7 years ago

Continuing in #32