haoyu94 / Coarse-to-fine-correspondences

PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

How to increase batch_size? #1

Closed Ruye-aa closed 2 years ago

Ruye-aa commented 2 years ago

Thank you for your outstanding work. I have some questions about running the code.

When I successfully ran the program, I found that one epoch takes a long time. I tried to increase the batch size by changing it from 1 to 2, but I ran into this error:

    File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    AssertionError: Caught AssertionError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
        return self.collate_fn(data)
      File "/home/aiyang/Coarse-to-fine-correspondences/model/KPConv/preprocessing.py", line 72, in collate_fn_descriptor
        assert len(list_data) == 1
    AssertionError

I want to know if there is any way to solve this problem, or if you have any way to increase the training speed. Thanks.

haoyu94 commented 2 years ago

Hi there,

Thanks for your interest in our work! Regarding the batch size: the original KPConv implementation supports batch_size > 1. However, since we apply attention modules at the bottleneck and the number of patches there differs between frame pairs, implementing that part with a batch size > 1 could be difficult. To reduce training time, I would suggest the following:

  1. Use the same number of patches for every frame pair, so that you can use a batch_size > 1 (see the padding sketch after this list);
  2. Use more GPUs, e.g., 4 GPUs, each with a batch size of 1;
  3. The calculation of the ground-truth overlap ratio between patches can be further optimized (a GPU-based sketch also follows the list);
  4. Since the data preprocessing in KPConv relies heavily on the CPU, it becomes a bottleneck if you train the model on a server where CPU resources are sliced. You can try https://github.com/qinzheng93/Easy-KPConv, which moves the CPU-based operations onto the GPU.
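To illustrate suggestion 1, here is a minimal sketch, not code from this repository, of how per-pair patch features could be padded to a common length and masked so that the bottleneck attention can run with a batch size > 1. The function name `pad_patch_batch` and the tensor shapes are assumptions for illustration only:

```python
import torch
import torch.nn.functional as F

def pad_patch_batch(feat_list):
    """Pad a list of (N_i, C) patch-feature tensors to a (B, N_max, C) batch.

    Returns the padded features and a (B, N_max) boolean mask that is
    True at padded (invalid) patch slots.
    """
    n_max = max(f.shape[0] for f in feat_list)
    feats, masks = [], []
    for f in feat_list:
        pad = n_max - f.shape[0]
        # zero-pad along the patch dimension
        feats.append(F.pad(f, (0, 0, 0, pad)))
        masks.append(torch.arange(n_max, device=f.device) >= f.shape[0])
    return torch.stack(feats, dim=0), torch.stack(masks, dim=0)

# Usage idea: pass the mask to the attention layer so padded patches are
# ignored, e.g. torch.nn.MultiheadAttention(embed_dim, num_heads,
# batch_first=True)(query, key, value, key_padding_mask=mask)
```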
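For suggestion 3, here is a hedged sketch of computing a ground-truth patch-overlap matrix with a batched GPU distance computation instead of per-patch CPU loops. It assumes each patch stores the same number K of grouped points; the function name, shapes, and overlap definition are illustrative assumptions, not the repository's actual implementation:

```python
import torch

@torch.no_grad()
def patch_overlap_gpu(src_patches, tgt_patches, rot, trans, radius):
    """Approximate ground-truth overlap ratios between patches on the GPU.

    src_patches: (M, K, 3) source points grouped per patch
    tgt_patches: (N, K, 3) target points grouped per patch
    rot: (3, 3), trans: (3,) ground-truth transform from source to target
    radius: matching radius (dataset-dependent)
    Returns an (M, N) overlap-ratio matrix.
    """
    M, K, _ = src_patches.shape
    N = tgt_patches.shape[0]
    src = src_patches @ rot.T + trans                 # align source patches
    # pairwise distances between all source-patch and target-patch points;
    # for large M, N this should be chunked to bound memory
    d = torch.cdist(src.reshape(M * K, 3), tgt_patches.reshape(N * K, 3))
    d = d.view(M, K, N, K)
    # a source point counts as overlapping a target patch if its nearest
    # point in that patch lies within the matching radius
    matched = d.min(dim=-1).values < radius           # (M, K, N)
    return matched.float().mean(dim=1)                # (M, N)
```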

Best,

Hao