benlansdell closed 8 years ago
Commits 5d4a8415a625dce054fe1a39f4b6703355ae1dfd and 91420459ed876e9d77a378edb738d21e13b437b0 are attempts at avoiding the 66GB allocations, made before I knew what was going on. They try to avoid the allocation by confining OpenCV usage to a separate process spawned just for OpenCV's use. This process is killed once the function has run, and its VM is freed. This might have worked because OpenCV calls are relatively rare per frame, so performance wouldn't be affected too much by the process creation/destruction. However, there are two issues with this:
It's still unclear how CUDA's shared memory VM trick would affect CUDA stability when hard disk space is low. If it is indeed the cause of the crashes, it seems like an awful solution to the shared memory problem. More experimentation with the program will hopefully yield some insight.
The commits referenced above will be reverted.
After freeing up enough hard drive space for the above to not be an issue, the crashes remain. Will have to add logging to better investigate where the crashes occur...
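A minimal sketch of the kind of logging that could help here, assuming the main program is Python. The standard-library `faulthandler` module dumps a Python traceback on fatal signals such as SIGSEGV, and a per-frame log line narrows down how far the program got. `setup_crash_logging`, `log_frame` and the file names are hypothetical, not part of the project.

```python
import faulthandler
import logging


def setup_crash_logging(log_path="crash_debug.log"):
    """Send log messages to log_path and fatal-signal tracebacks to
    log_path + ".faults". Returns the open fault file, which must stay
    open for as long as faulthandler is enabled."""
    logging.basicConfig(
        filename=log_path,
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # replace any existing root handlers (Python 3.8+)
    )
    fault_file = open(log_path + ".faults", "w")
    faulthandler.enable(file=fault_file)
    return fault_file


def log_frame(index):
    # Call once per processed frame so the last line written before a
    # crash shows which frame was in progress.
    logging.debug("processing frame %d", index)


if __name__ == "__main__":
    setup_crash_logging()
    for i in range(3):
        log_frame(i)
```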
Continuing in #32
Not sure what the exact cause is, but the program currently uses a lot of virtual memory. It seems that each import of a CUDA library requires around 66GB of VM.
Other CUDA programs show similar behaviour, with and without OpenGL interop, and in both Python and C++.
OpenCV was compiled with CUDA support, so that creates one instance of 66GB of VM. My own import for CUDA use seems to create another instance of 66GB of VM -- so the program currently allocates ~128GB of VM.
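One way to confirm these numbers on Linux is to read `VmSize` (virtual) and `VmRSS` (resident) from `/proc/self/status` before and after an import. The sketch below is only illustrative: the helper names are made up, and `json` is a placeholder for whichever CUDA-enabled module is being measured.

```python
import importlib


def vm_usage_kb(pid="self"):
    """Return VmSize and VmRSS in kB from /proc/<pid>/status (Linux only)."""
    usage = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                usage[key] = int(value.split()[0])  # value looks like "1234 kB"
    return usage


def vm_growth_on_import(module_name):
    """Import module_name and return the change in VmSize/VmRSS in kB."""
    before = vm_usage_kb()
    importlib.import_module(module_name)
    after = vm_usage_kb()
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    # Placeholder module; substitute the CUDA-enabled import to measure.
    print(vm_growth_on_import("json"))
```

If the explanation in this issue is right, a CUDA import should show a large jump in `VmSize` with only a small change in `VmRSS`.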
This seems to be related to the fact that CUDA shared memory involves allocation of a common address space between host and device, one that is at least the size of the host and device memories combined -- RAM + swap + GPU memory ≈ 66GB (see for example: http://stackoverflow.com/questions/11631191/why-does-the-cuda-runtime-reserve-80-gib-virtual-memory-upon-initialization).
This is really just a trick to allow a common address space, and it shouldn't eat into any actual memory limits, let alone resident memory, which would affect performance. Even so, allocating 128GB of VM does appear to affect the stability of the program when hard disk space is low. The program sometimes crashes after a varying number of frames when 128GB is allocated and free hard disk space is below ≈ 128GB.
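To tell whether a given run falls into this low-disk regime, the free space can be checked at startup. A small sketch using `shutil.disk_usage` from the standard library; the helper names and the 128 threshold are illustrative assumptions, and the value is reported in GiB, which may differ slightly from what other tools show as GB.

```python
import shutil

GIB = 1024 ** 3


def free_disk_gib(path="/"):
    """Free space, in GiB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / GIB


def disk_headroom_ok(required_gib, path="/"):
    """True if at least required_gib GiB are free on path's filesystem."""
    return free_disk_gib(path) >= required_gib


if __name__ == "__main__":
    # 128 mirrors the approximate threshold observed above; it is not a
    # documented CUDA requirement.
    if not disk_headroom_ok(128):
        print(f"warning: only {free_disk_gib():.1f} GiB free")
```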
The best solution may be to change the system configuration so that cases like this are avoided, for example by separating home directories from swap space, though this needs experimentation.