Open · hugobl1 opened this issue 2 years ago
Sorry for the late reply; I only just got access to an idle GPU. I ran a local test like this (pts.pth is located under FRNN/tests/):
import torch
import frnn

points1 = torch.load("pts.pth")[None, ...].float().cuda()
points2 = torch.load("pts.pth")[None, ...].float().cuda()
print(points1)

n = 1000
points_list = []
lengths_list = []
for i in range(n):
    points_list.append(torch.load("pts.pth")[None, ...].float().cuda())
    lengths_list.append(points1.shape[1] * torch.ones((1,), dtype=torch.long).cuda())

K = 5
r = (points1.amax(dim=1) - points1.amin(dim=1)) / 10
r = r.amax()[None]
lengths_1 = points1.shape[1] * torch.ones((1,), dtype=torch.long).cuda()
lengths_2 = points1.shape[1] * torch.ones((1,), dtype=torch.long).cuda()

dists, idxs, nn, grid = frnn.frnn_grid_points(
    points1, points2, lengths_1, lengths_2, K, r, grid=None, return_nn=False, return_sorted=True
)

for i in range(n):
    ## Some operations that do not allocate GPU memory
    points_i = points_list[i]
    lengths_i = lengths_list[i]
    dists, idxs, nn, grid = frnn.frnn_grid_points(
        points_i, points2, lengths_i, lengths_2, K, r, grid=grid, return_nn=False, return_sorted=True
    )
    ## Some operations that do not allocate GPU memory

print("done")
and my GPU memory usage stays stable at 2413 MiB. Can you share an example and data that can reproduce the error?
The GPU memory should be released and reallocated in every iteration if you do not store any results, so I am wondering whether you are saving the results (e.g. dists) for later use, or whether there is an iteration with a very large point cloud that cannot fit on your GPU?
I met a similar issue, and here is my solution. However, I'm not sure whether this modification alters the original logic or might lead to other errors. @lxxue Would you mind taking a look? Thanks a lot.
Hi, thank you for pointing this out. Could you provide a minimal example that reproduces this memory leak so I can check whether your modification fixes it?
Here is the test code. I have tested it on both an RTX 4090 and an RTX 3070 Laptop GPU and observed similar out-of-memory (OOM) behavior on both. I hope you are able to reproduce the issue as well.
'''
A toy example to demonstrate OOM
'''
import numpy as np
import os
import torch
import torch.nn as nn
import frnn
from tqdm import tqdm
def sample_unitsphere(P):
    ans = np.empty([P, 3], dtype='float32')
    rnd_ = np.random.uniform(size=[P, 2])
    phi = 2 * np.pi * rnd_[:, 0]
    costheta = rnd_[:, 1] * 2. - 1.
    sintheta = np.clip(1 - costheta * costheta, 0., None)
    sintheta = np.sqrt(sintheta)
    ans[:, 2] = costheta
    ans[:, 0] = sintheta * np.cos(phi)
    ans[:, 1] = sintheta * np.sin(phi)
    return ans

class Nearest_Search:
    def __init__(self, K=4, r=0.05, device="cuda:0"):
        self.K = K
        self.r = r
        self.grid = None
        self.device = device

    def build(self):
        opt_points = np.random.rand(70_000, 3).astype(np.float32)
        opt_points = opt_points * 2. - 1.  # map to [-1, 1]
        opt_points = torch.from_numpy(opt_points).cuda()
        self.opt_points = nn.Parameter(opt_points)
        l = [
            {'params': [self.opt_points], 'lr': 0.01}
        ]
        self.optimizer = torch.optim.Adam(l)
        ref_points = sample_unitsphere(10_000).astype(np.float32)
        ref_points = torch.from_numpy(ref_points).cuda()
        self.ref_points = ref_points

    def run(self):
        self.optimizer.zero_grad()
        # In my case, I don't need dists
        # I only use idxs
        dists, idxs, nn, grid = frnn.frnn_grid_points(
            self.opt_points.unsqueeze(0),  # 1 x P x 3
            self.ref_points.unsqueeze(0),  # 1 x N_vertex x 3
            None, None,
            self.K, self.r, grid=self.grid,
            return_nn=False, return_sorted=True
        )
        idxs_ = idxs[0]
        ref_points = self.ref_points[idxs_].mean(dim=1)  # (Px4)x3 -> Px3
        diff = (self.opt_points - ref_points)
        loss = (diff * diff).mean()
        loss.backward()
        self.dists = dists
        self.idxs = idxs
        # keep grid to avoid unnecessary computing
        self.grid = grid
        self.optimizer.step()

def train():
    nearest_search = Nearest_Search()
    nearest_search.build()
    n_iters = 40_000
    for ii in tqdm(range(n_iters)):
        nearest_search.run()

if __name__ == "__main__":
    train()
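One way to watch the growth while this runs is to log the allocated memory every few hundred iterations (a small illustrative helper, assuming the standard torch.cuda memory counters and the torch import from the script above; it is not part of the test code itself):

# Hypothetical helper: call log_memory(ii) inside the loop in train()
# to watch the allocated GPU memory grow over the iterations.
def log_memory(step, every=500):
    if step % every == 0:
        mib = torch.cuda.memory_allocated() / 2**20
        print(f"iter {step}: {mib:.0f} MiB allocated")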
Hello lxxue, thank you very much for this work! I just have a question about your code. I noticed that when I use your code in a for loop, the allocated GPU memory keeps increasing until it hits an OOM error. Have you noticed this too? Do you know where this could come from?
Example of code:
where points_i is a new point cloud at each iteration.
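A minimal sketch of the kind of loop being described (illustrative only, with made-up point-cloud sizes, the K and r values from the example above, and lengths passed as None):

import torch
import frnn

# Hypothetical reconstruction of the reported pattern, not the original snippet:
# a new point cloud is queried against a fixed reference cloud every iteration.
ref_points = torch.rand(1, 10_000, 3, device="cuda")
for step in range(10_000):
    points_i = torch.rand(1, 70_000, 3, device="cuda")  # new point cloud each iteration
    dists, idxs, nn, grid = frnn.frnn_grid_points(
        points_i, ref_points, None, None, 4, 0.05,
        grid=None, return_nn=False, return_sorted=True
    )
    # allocated GPU memory reportedly keeps increasing across these calls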