simonzhang00 / ripser-plusplus

Ripser++: GPU-accelerated computation of Vietoris–Rips persistence barcodes
MIT License

Batching and memory issues. #5

Closed · danchern97 closed this 3 years ago

danchern97 commented 3 years ago

Hi!

I want to compute persistence barcodes for a large number of matrices, but the only way I can do this right now is to send them one by one. Is there a way to do batch processing in the library?

The second issue concerns GPU memory usage: as I send many matrices one by one, the process seems to keep allocating GPU memory without ever freeing it, until all of the memory is exhausted.
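
For concreteness, the pattern I'm stuck with looks roughly like the sketch below. The matrices here are random stand-ins for my real data, and I'm assuming the Python binding accepts a full square distance matrix for "--format distance":

import numpy as np
import ripser_plusplus_python as rpp_py

# Stand-in batch: symmetric matrices with zero diagonal.
matrices = []
for _ in range(1000):
    a = np.random.rand(100, 100)
    matrices.append((a + a.T) / 2 * (1 - np.eye(100)))

barcodes = []
for m in matrices:
    # One call per matrix; the GPU memory allocated here does not appear to be
    # released between calls, so free memory keeps shrinking.
    barcodes.append(rpp_py.run("--format distance --dim 1", m))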

nihargupte-ph commented 3 years ago

What a coincidence! I was just about to open a similar issue, and I think I've made some progress on logging why it happens and potentially how to fix it. I am using the Python binding. If I log the memory usage across multiple runs of ripser++ (using the approach from https://stackoverflow.com/questions/48317212/pycuda-clean-up-error-cuda-launch-timed-out-error-on-some-machines-only/66147437#66147437), I find that with each run of ripser++ the amount of free GPU memory decreases and is never freed automatically. I.e., in code I get something like this:

import subprocess as sp

import numpy as np
import ripser_plusplus_python as rpp_py


def get_gpu_memory():
    # Free GPU memory (in MiB) per GPU, as reported by nvidia-smi.
    command = "nvidia-smi --query-gpu=memory.free --format=csv"
    memory_free_info = sp.check_output(command.split()).decode("ascii").split("\n")[1:-1]
    memory_free_values = [int(line.split()[0]) for line in memory_free_info]
    print(memory_free_values)
    return memory_free_values


point_cloud = np.random.rand(100, 3)  # stand-in for the real point cloud

for i in range(10):
    persistence_dict = rpp_py.run("--format point-cloud --dim 1", point_cloud)
    get_gpu_memory()

Outputs something like this:

<RIPSER++ MESSAGE>
[3147]
<RIPSER++ MESSAGE>
[2800]
<RIPSER++ MESSAGE>
[2400]
...

It seems this continues until there is not enough memory left for the program to start. I thought of two ways to fix this. One is to open a subprocess and use the nvidia-smi command line to clear the memory, but I think that requires sudo privileges and is a rather inelegant solution. Instead, I think it would be better to run rpp_py.run in a child process, so that the GPU memory is released when the child process exits. In code, it looks something like this:

import concurrent.futures

import numpy as np
import ripser_plusplus_python as rpp_py

def func(point_cloud):
    persistence_dict = rpp_py.run("--format point-cloud --dim 1", point_cloud)
    return persistence_dict

point_cloud_lst = [np.random.rand(100, 3) for _ in range(10)]  # stand-in data

for point_cloud in point_cloud_lst:
    # A fresh single-worker pool per point cloud: the worker process exits when
    # the with-block closes, releasing the GPU memory it allocated.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        result = executor.submit(func, point_cloud).result()

When I use this and track the free GPU memory with the function above, the free memory no longer decreases with each call, so I think this is a good workaround. If @simonzhang00 thinks this is OK, I could open a PR that simply wraps rpp_py.run in a child process? Unless it would be better to spawn the child process on the C side.

danchern97 commented 3 years ago

Thanks for the workaround! It works and the memory no longer increases, but it makes the computations ~44x slower, so the issue still stands.

simonzhang00 commented 3 years ago

Check out the new commit d518cfbc7bbb5f7270afc07570c591b0a32b65c9 that I pushed. It includes a script called testmemory.py in the working_directory, contributed by @kauii8school. The GPU memory is now freed after every barcode computation.
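
If you want a quick sanity check of your own, something like the sketch below should do; this is not testmemory.py itself, just an assumed verification loop with a made-up helper name and stand-in data:

import subprocess as sp

import numpy as np
import ripser_plusplus_python as rpp_py


def free_gpu_mib():
    # Free memory of the first GPU in MiB, as reported by nvidia-smi.
    out = sp.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]
    ).decode("ascii").splitlines()
    return int(out[0])


point_cloud = np.random.rand(200, 3)  # stand-in data

baseline = free_gpu_mib()
for _ in range(10):
    rpp_py.run("--format point-cloud --dim 1", point_cloud)
    # With the fix, free memory should return close to the baseline after
    # every call instead of shrinking with each iteration.
    print(free_gpu_mib(), "MiB free; baseline was", baseline, "MiB")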

danchern97 commented 3 years ago

Thanks, everything works!

davebulaval commented 9 months ago

@simonzhang00 I have the same issue with the "distance" format. It works with the proposed workaround, but it is much slower. ripser++ seems to hold on to the GPU memory, and since I call it many times, after a while I get an out-of-memory error.
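
For reference, the pattern I'm running looks roughly like the sketch below; the data and helper name are made up, and I'm assuming "--format distance" takes a full square distance matrix:

import concurrent.futures

import numpy as np
import ripser_plusplus_python as rpp_py


def barcodes_from_distance_matrix(dist_matrix):
    return rpp_py.run("--format distance --dim 1", dist_matrix)


# Stand-in data: Euclidean distance matrices of small random point clouds.
points = [np.random.rand(100, 3) for _ in range(20)]
dist_matrices = [np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1) for p in points]

for dist_matrix in dist_matrices:
    # The proposed workaround: a short-lived worker per matrix frees the GPU
    # memory on exit, but pays the process/CUDA start-up cost every time.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        barcodes = executor.submit(barcodes_from_distance_matrix, dist_matrix).result()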