Closed jxl152 closed 1 year ago
Not very efficient yet. When using the S3DIS dataset, the KNN operator easily makes the GPU run out of memory.
Thanks for your reply, @EricLina! So it is better to use the corresponding CUDA implementations of operations such as FPS and KNN, isn't it, @qq456cvb?
In my experiment, the program runs out of GPU memory when running torch.einsum(...) on 2048 points, so I had to decrease the batch size to 8. That is not the best solution, though. We need to use the corresponding CUDA implementation.
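Short of a custom CUDA kernel, one common way to bound memory in a KNN search is to process the query points in chunks, so the full pairwise distance matrix is never held at once. Below is a minimal NumPy sketch of that idea, not code from this repo. The names `knn_chunked` and `chunk_size` are hypothetical, and the same pattern would presumably carry over to torch tensors.

```python
import numpy as np

def knn_chunked(query, ref, k, chunk_size=512):
    """Return (M, k) indices of the k nearest ref points for each query point.

    query: (M, D) array, ref: (N, D) array, with k <= N.
    Only a (chunk_size, N) distance block exists at any time,
    instead of the full (M, N) matrix.
    """
    ref_sq = (ref ** 2).sum(axis=1)                       # (N,)
    out = np.empty((query.shape[0], k), dtype=np.int64)
    for start in range(0, query.shape[0], chunk_size):
        q = query[start:start + chunk_size]               # (C, D)
        # Squared distances via |q|^2 - 2 q.r + |r|^2, shape (C, N)
        d2 = (q ** 2).sum(axis=1, keepdims=True) - 2.0 * (q @ ref.T) + ref_sq
        # argpartition picks the k smallest per row without a full sort
        part = np.argpartition(d2, k - 1, axis=1)[:, :k]
        # Order those k candidates by distance
        order = np.argsort(np.take_along_axis(d2, part, axis=1), axis=1)
        out[start:start + chunk_size] = np.take_along_axis(part, order, axis=1)
    return out
```

A smaller `chunk_size` trades speed for a lower peak memory footprint, which is a different lever from shrinking the batch size.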
Yes, I think it would be best to have them implemented with custom CUDA kernels. Maybe you can have a look at https://github.com/erikwijmans/Pointnet2_PyTorch.git, which implements grouping/interpolation with custom CUDA kernels.
@EricLina did you work on S3DIS data with this repo?
Right
Could you please elaborate on a few things:
Thanks!!
For those who would like to use FPS but are not able to compile the custom CUDA kernel for FPS, you could try the FPS implementation from Deep Graph Library (DGL) here
It is much faster than the pure Python FPS implementation from https://github.com/yanx27/Pointnet_Pointnet2_pytorch (~18x faster in my case), and it is still pure Python code.
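For readers unfamiliar with what FPS actually computes, here is a minimal NumPy sketch of the greedy algorithm for a single point cloud. It is neither DGL's implementation nor the one in this repo; `fps_numpy` and its `start` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def fps_numpy(points, n_samples, start=0):
    """Greedy farthest point sampling on an (N, D) array.

    Returns n_samples indices; each new index is the point farthest
    from all points selected so far.
    """
    n = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    # Squared distance from every point to its nearest selected point
    min_d2 = np.full(n, np.inf)
    current = start
    for i in range(n_samples):
        selected[i] = current
        d2 = ((points - points[current]) ** 2).sum(axis=1)
        np.minimum(min_d2, d2, out=min_d2)
        # The next sample is the point farthest from the selected set
        current = int(np.argmax(min_d2))
    return selected
```

The loop over `n_samples` is inherently sequential, so most of the speed difference between implementations comes from how cheaply each iteration's distance update runs.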
Description: in pointnet_util.py, functions such as query_ball_point and farthest_point_sample are implemented entirely in Python, without CUDA.
My question: are these functions as efficient as CUDA implementations, such as those in the pointops library?
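To make the cost concrete, here is a minimal NumPy sketch of a ball query written with array operations only. It is an illustration under stated assumptions, not the code in pointnet_util.py: `ball_query` is a hypothetical name, and the sketch assumes every query point has at least one reference point within the radius (true when the queries are a subset of the reference points).

```python
import numpy as np

def ball_query(query, ref, radius, n_sample):
    """For each query point, return up to n_sample indices of ref points
    within radius, in index order.

    Rows with fewer hits than n_sample are padded with the first hit.
    """
    n = ref.shape[0]
    # Full pairwise squared distances, shape (M, N): the memory-heavy step
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(-1)
    idx = np.broadcast_to(np.arange(n), d2.shape).copy()
    idx[d2 > radius ** 2] = n              # sentinel marks out-of-range points
    idx = np.sort(idx, axis=1)[:, :n_sample]
    first = idx[:, :1]                     # first in-range index per row
    return np.where(idx == n, first, idx)
```

The (M, N) distance matrix and the per-row sort are the parts a dedicated kernel can avoid, since it can scan the points and stop once `n_sample` hits are found.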