lxxue / FRNN

Fixed Radius Nearest Neighbor Search on GPU

GPU memory cache #10

Open hugobl1 opened 2 years ago

hugobl1 commented 2 years ago

Hello lxxue, thank you very much for this work! I have a question about your code. When I call it in a for loop, the allocated GPU memory keeps increasing until it hits an OOM error. Have you noticed this too? Do you know where it could come from?

Example of code:


dists, idxs, nn, grid = frnn.frnn_grid_points(
    points1, points2, lengths1, lengths2, K, r, grid=None, return_nn=False, return_sorted=True
)

for i in range(n):
    ## Some operations that do not allocate GPU memory
    dists, idxs, nn, grid = frnn.frnn_grid_points(
        points_i, points2, lengths_i, lengths2, K, r, grid=grid, return_nn=False, return_sorted=True
    )
    ## Some operations that do not allocate GPU memory

where points_i is a new point cloud at each iteration.

lxxue commented 2 years ago

Sorry for the late reply. I just got an idle GPU now.

I did a local test like this (pts.pth is located under FRNN/tests/):

import torch
import frnn

points1 = torch.load("pts.pth")[None, ...].float().cuda()
points2 = torch.load("pts.pth")[None, ...].float().cuda()
print(points1)
n = 1000
points_list = []
lengths_list = []
for i in range(n):
    points_list.append(torch.load("pts.pth")[None, ...].float().cuda())
    lengths_list.append(points1.shape[1] * torch.ones((1,), dtype=torch.long).cuda())

K = 5
r = (points1.amax(dim=1) - points1.amin(dim=1)) / 10
r = r.amax()[None]

lengths_1 = points1.shape[1] * torch.ones((1,), dtype=torch.long).cuda()
lengths_2 = points2.shape[1] * torch.ones((1,), dtype=torch.long).cuda()

dists, idxs, nn, grid = frnn.frnn_grid_points(
    points1, points2, lengths_1, lengths_2, K, r, grid=None, return_nn=False, return_sorted=True
)

for i in range(n):
    ## Some operations that do not allocate GPU memory
    points_i = points_list[i]
    lengths_i = lengths_list[i]
    dists, idxs, nn, grid = frnn.frnn_grid_points(
        points_i, points2, lengths_i, lengths_2, K, r, grid=grid, return_nn=False, return_sorted=True
    )
    ## Some operations that do not allocate GPU memory

print("done")

With this script, my GPU memory usage stays stable at 2413 MiB. Can you share an example and data that reproduce the error?

If you do not store any results, the GPU memory should be released and reallocated in every iteration. So I wonder whether you are saving the results (e.g. dists) for later use, or whether one iteration has a very large point cloud that does not fit on your GPU.
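
To tell those two cases apart, it may help to log the allocated memory after every iteration and check whether it climbs steadily (results being kept alive) or jumps once (a single oversized point cloud). Below is a minimal sketch of such a check. The helper names `track_memory` and `grows_steadily` are made up for illustration and are not part of FRNN; the memory reader is passed in as a callable so the helper itself does not need a GPU.

```python
from typing import Callable, List


def track_memory(step: Callable[[int], object], n: int,
                 read_bytes: Callable[[], int]) -> List[int]:
    """Run step(i) n times and record the memory reading after each call.

    The return value of step is discarded on purpose, so this helper
    itself keeps nothing alive across iterations.
    """
    readings = []
    for i in range(n):
        step(i)
        readings.append(read_bytes())
    return readings


def grows_steadily(readings: List[int], min_fraction: float = 0.9) -> bool:
    """Return True if the reading increased in most consecutive steps.

    A steady climb suggests something is accumulating between iterations;
    a flat trace or a single jump does not trigger this check.
    """
    if len(readings) < 2:
        return False
    increases = sum(later > earlier
                    for earlier, later in zip(readings, readings[1:]))
    return increases / (len(readings) - 1) >= min_fraction


# Assumed usage with PyTorch (not executed here): read_bytes can be
# torch.cuda.memory_allocated, and step can wrap one call to
# frnn.frnn_grid_points for the i-th point cloud.
```

If the trace grows only when the results are appended to a list, the stored tensors are the likely cause; moving them to the CPU or keeping only the values you need before storing them should keep the GPU footprint flat.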