Hi, your observation may well be correct. I don't think this is a bug or a problem, for the following reasons:
Thank you for your reply; it helped me confirm that the code was officially tested and verified on multiple GPUs.
Hi @paul007pl. I tested the official DCP code on 4×2080Ti GPUs with batch_size=32 and PyTorch 1.5.0; it takes about 9.5 h to run 100 epochs.
As a comparison, I tested the DCP code in the benchmark repo on the same 4×2080Ti setup with batch_size=16 (due to memory overflow), and it takes about 1 h to run a single epoch. The difference between the two is huge, and I don't think the difference in datasets should cause such a gap. I was eager to share these results with you, so I only ran the training for a short while, but I think that is enough to extrapolate the total training time. (The time per iteration is not stable; the fastest epoch still took about 1 h.)
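(For reference, here is a minimal sketch of how per-iteration time can be measured fairly on GPU. All names below are placeholders, not the benchmark repo's actual code; the model/criterion call signature is a hypothetical example:)

```python
import time
import torch

def time_iterations(model, loader, optimizer, criterion, device, n_iters=50):
    """Rough per-iteration timing; synchronizes CUDA so queued GPU work is counted."""
    model.train()
    times = []
    it = iter(loader)
    for _ in range(n_iters):
        batch = next(it)
        if device.type == 'cuda':
            torch.cuda.synchronize(device)
        start = time.perf_counter()
        src, tgt, target = (t.to(device) for t in batch)  # adapt to the actual batch layout
        optimizer.zero_grad()
        loss = criterion(model(src, tgt), target)  # hypothetical signatures
        loss.backward()
        optimizer.step()
        if device.type == 'cuda':
            torch.cuda.synchronize(device)  # wait for kernels before stopping the clock
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```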
Thanks for your report. I have improved its training efficiency now; please try again~
For DeepGMR, you can further improve its efficiency by using this file: https://github.com/wentaoyuan/deepgmr/blob/master/rri.cu.
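(In case it helps others: rri.cu is a standalone CUDA source file, so one way to call it from Python is to JIT-compile it with PyCUDA. The kernel name, output shape, and launch configuration below are assumptions; check rri.cu for the actual `__global__` function and its argument list:)

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
from pycuda import gpuarray
from pycuda.compiler import SourceModule

# Compile the kernel source at runtime. 'rri_kernel' is a hypothetical name;
# use the actual __global__ function defined in rri.cu.
module = SourceModule(open('rri.cu').read())
rri_kernel = module.get_function('rri_kernel')

points = gpuarray.to_gpu(np.random.rand(1024, 3).astype(np.float32))
features = gpuarray.zeros((1024, 4), dtype=np.float32)  # assumed output shape

# Launch configuration is illustrative; tune block/grid to the kernel.
rri_kernel(points, features, np.int32(points.shape[0]),
           block=(256, 1, 1), grid=(4, 1))
```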
Once I have time, I will improve it further.
Wow, your improvements are very effective. The DCP training now runs at roughly the same speed as the official code in my environment. Thanks a lot!
But after comparing the dcp.py file, it seems the change is mostly in the code structure? Could you briefly describe why this improves efficiency, to help me understand where the key to the speedup lies?
Thanks again.
Unfortunately, I do not know the exact reason... I suspect it may be caused by the "clone" operation... You are encouraged to investigate further, and maybe you can tell us later :)
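(If anyone wants to test the "clone" hypothesis, here is a small sketch of how the extra copy can be micro-benchmarked. The shapes and ops are arbitrary; this only illustrates the measurement, not the actual dcp.py code path:)

```python
import time
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(32, 512, 1024, device=device)

def bench(fn, iters=200):
    # Warm up, then time with CUDA synchronization so kernel time is counted.
    for _ in range(10):
        fn(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

t_view = bench(lambda t: t.transpose(1, 2))           # returns a view, no copy
t_clone = bench(lambda t: t.transpose(1, 2).clone())  # materializes a full copy
print(f'view only: {t_view * 1e3:.3f} ms, with clone: {t_clone * 1e3:.3f} ms')
```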
🤔 Philosophy of life, hahaha. Anyway, thanks again! 🍻
Hello, I tried to test the point cloud registration code in the benchmark, but I found that the GPU utilization for these methods is very low (jumping between 0% and 20%), and the training process is also very slow. I used multiple GPUs; do you use a single GPU for training, or several? Could you share your hardware environment and the approximate time required for a full training run or a single epoch? I am not sure whether the problem is my environment or the code, and I would appreciate your help.
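(GPU utilization bouncing between 0% and 20% often means the GPU is starved for data. A quick way to check is to separate the time spent waiting on the DataLoader from the rest of the iteration; the sketch below is generic and not the benchmark repo's code:)

```python
import time
import torch

def profile_loader(loader, run_step, device, n_batches=20):
    """Split wall time into 'waiting for data' vs. the full iteration."""
    data_time, total_time = 0.0, 0.0
    end = time.perf_counter()
    for i, batch in enumerate(loader):
        data_time += time.perf_counter() - end  # time blocked on the DataLoader
        run_step(batch)  # caller-supplied: move batch to device, forward/backward
        if device.type == 'cuda':
            torch.cuda.synchronize()
        total_time += time.perf_counter() - end
        end = time.perf_counter()
        if i + 1 >= n_batches:
            break
    print(f'data loading: {data_time:.2f}s of {total_time:.2f}s '
          f'({100 * data_time / max(total_time, 1e-9):.0f}%)')
```

If most of the time turns out to be data loading, raising `num_workers` (and setting `pin_memory=True`) on the DataLoader usually helps more than any model-side change.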