qq456cvb / Point-Transformers

Point Transformers
MIT License

Are the functions in pointnet_util.py implemented entirely in Python, without CUDA? #36

Closed: jxl152 closed this issue 1 year ago

jxl152 commented 1 year ago

Description: in pointnet_util.py, functions such as query_ball_point and farthest_point_sample are implemented entirely in Python, without custom CUDA kernels.

My question: are these functions as efficient as CUDA implementations such as those in the pointops library?
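To make the question concrete, here is a minimal sketch of farthest point sampling written only with PyTorch tensor ops, in the same spirit as a pure-Python FPS. It is not the repo's code: the function body and the deterministic start at index 0 are assumptions for illustration. The key point is the Python `for` loop, which runs `npoint` sequential steps, each launching several small GPU kernels.

```python
import torch


def farthest_point_sample(xyz: torch.Tensor, npoint: int) -> torch.Tensor:
    """Iterative farthest point sampling in plain PyTorch (no custom kernel).

    xyz: (B, N, 3) coordinates; returns (B, npoint) long indices into N.
    """
    B, N, _ = xyz.shape
    device = xyz.device
    centroids = torch.zeros(B, npoint, dtype=torch.long, device=device)
    # Squared distance from each point to its nearest already-selected point.
    min_dist = torch.full((B, N), float("inf"), device=device)
    # Assumption: start from index 0 (random start indices are also common).
    farthest = torch.zeros(B, dtype=torch.long, device=device)
    batch_idx = torch.arange(B, device=device)
    # Sequential Python loop: one iteration per sampled point. A fused CUDA
    # kernel avoids this per-step launch overhead.
    for i in range(npoint):
        centroids[:, i] = farthest
        centroid = xyz[batch_idx, farthest, :].unsqueeze(1)  # (B, 1, 3)
        dist = ((xyz - centroid) ** 2).sum(-1)                # (B, N)
        min_dist = torch.minimum(min_dist, dist)
        farthest = torch.argmax(min_dist, dim=-1)             # next farthest point
    return centroids
```

For example, `farthest_point_sample(torch.rand(2, 1024, 3), 128)` returns a `(2, 128)` index tensor. The math is cheap; the cost comes from `npoint` round trips through Python.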

EricLina commented 1 year ago

Not very efficient yet. On the S3DIS dataset, the kNN operator easily runs the GPU out of memory.
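A rough way to see why kNN runs out of memory: a brute-force kNN built on a full pairwise-distance matrix allocates a tensor whose size is the product of the query and reference point counts. The arithmetic below is a back-of-the-envelope sketch. The batch size and point count are hypothetical, not measured S3DIS settings, and real peak usage is higher because of intermediate tensors and autograd buffers.

```python
def dense_pairwise_bytes(batch: int, n_query: int, n_ref: int, bytes_per_elem: int = 4) -> int:
    """Bytes for one dense (batch, n_query, n_ref) distance tensor (float32 by default)."""
    return batch * n_query * n_ref * bytes_per_elem


# Hypothetical setting: 16 clouds of 4096 points each, float32 distances.
gib = dense_pairwise_bytes(16, 4096, 4096) / 2**30
print(f"{gib:.2f} GiB")  # 1.00 GiB
```

Because the size is quadratic in the point count, doubling the points per cloud quadruples this one tensor.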

jxl152 commented 1 year ago

Thanks for your reply, @EricLina! So it would be better to use the corresponding CUDA implementations of operations such as FPS and kNN, right? @qq456cvb

jxl152 commented 1 year ago

In my experiments, the program runs out of GPU memory at torch.einsum(...) with 2048 points, so I had to reduce the batch size to 8. That is a workaround, not a real solution; we need the corresponding CUDA implementations.
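One workaround short of writing a CUDA kernel is to compute kNN in query chunks, so that the dense distance tensor is never larger than (B, chunk, N). This is a swapped-in technique, not something from the repo. The sketch uses `torch.cdist` and `Tensor.topk`, while the function name `knn_chunked` and the default chunk size are hypothetical. It only reduces the peak memory of the kNN distance matrix and does nothing for the attention `torch.einsum` mentioned above.

```python
import torch


@torch.no_grad()  # neighbor indices need no gradient; avoids storing distance chunks for autograd
def knn_chunked(query: torch.Tensor, ref: torch.Tensor, k: int, chunk: int = 1024) -> torch.Tensor:
    """Brute-force kNN computed `chunk` queries at a time.

    query: (B, M, 3), ref: (B, N, 3); returns (B, M, k) indices into ref,
    ordered from nearest to farthest.
    """
    out = []
    for start in range(0, query.shape[1], chunk):
        q = query[:, start:start + chunk]                       # (B, c, 3)
        d = torch.cdist(q, ref)                                 # (B, c, N) Euclidean distances
        out.append(d.topk(k, dim=-1, largest=False).indices)   # k smallest per query
    return torch.cat(out, dim=1)
```

Smaller chunks lower peak memory at the cost of more kernel launches, so the chunk size trades speed for headroom.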

qq456cvb commented 1 year ago

Yes, I think it would be best to implement them as custom CUDA kernels. You could take a look at https://github.com/erikwijmans/Pointnet2_PyTorch.git, which implements grouping/interpolation with custom CUDA kernels.
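Before swapping in a custom kernel, it helps to have a plain PyTorch reference for what the grouping step computes, so the kernel's output can be checked against it. The sketch below is a hypothetical helper, not code from either repository. It gathers the features of each query point's K neighbors with advanced indexing. Layouts may differ between implementations (for example, channels-first), so permute dimensions before comparing results.

```python
import torch


def group_points(points: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Gather neighbor features: points (B, N, C), idx (B, S, K) -> (B, S, K, C)."""
    B = points.shape[0]
    # A (B, 1, 1) batch index broadcasts against idx, so each batch
    # element gathers only from its own point cloud.
    batch = torch.arange(B, device=points.device).view(B, 1, 1)
    return points[batch, idx]
```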

aldinorizaldy commented 11 months ago

@EricLina did you work on S3DIS data with this repo?

EricLina commented 11 months ago

Right

aldinorizaldy commented 11 months ago

Could you please elaborate on a few things:

  1. Did you use the code here directly for S3DIS, or did you make some changes?
  2. Which model did you use? Hengshuang's?
  3. Did you get accuracies similar to those reported in the paper?

Thanks!!

EricLina commented 10 months ago

  1. I did not change the code. If you can't get it running, you should check your dataset preparation.
  2. Yes, there are two models with the same name, Hengshuang's and Tsinghua's. I used Hengshuang's model.
  3. Using qq456cvb's code, I got 69.84 mIoU (trained for 48 h on a single A30 with batch_size 2).

aldinorizaldy commented 8 months ago

For those who would like to use FPS but cannot compile the custom CUDA kernel for it, you could try the FPS implementation from the Deep Graph Library (DGL).

It is much faster than the pure-Python FPS implementation from https://github.com/yanx27/Pointnet_Pointnet2_pytorch (~18x faster in my case), while still being pure Python code.
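Speedups like the ~18x above depend on the GPU, the point count, and `npoint`, so it is worth measuring on your own data. Below is a minimal timing helper. The name `median_runtime` is hypothetical, and it assumes both implementations can be called with the same arguments. The `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously. Without them, the timer mostly measures launch overhead rather than execution.

```python
import statistics
import time

import torch


def median_runtime(fn, *args, repeats: int = 10, warmup: int = 2) -> float:
    """Median wall-clock seconds of fn(*args), synchronizing CUDA around each call."""
    def sync():
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for queued GPU work to finish

    for _ in range(warmup):  # warm-up runs absorb one-time costs (allocation, caching)
        fn(*args)
    times = []
    for _ in range(repeats):
        sync()
        t0 = time.perf_counter()
        fn(*args)
        sync()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)
```

Compare two FPS functions by calling `median_runtime(fps_a, xyz, npoint)` and `median_runtime(fps_b, xyz, npoint)` on the same tensor. The median is less sensitive than the mean to occasional outliers.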
