zju3dv / NeuralRecon

Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral
https://zju3dv.github.io/neuralrecon/
Apache License 2.0

About the training config. #111

[Open] SwingWillwow opened this issue 2 years ago

SwingWillwow commented 2 years ago

First of all, thanks for your excellent work! However, I cannot reproduce the reported performance. I followed the training config in the official training script and trained the model on two RTX 3090 GPUs. The result is:

| AbsRel | AbsDiff | SqRel | LogRMSE | r1 | r2 | r3 | complete | dist1 | dist2 | prec | recal | fscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.075 | 0.131 | 0.037 | 0.121 | 0.926 | 0.961 | 0.974 | 0.864 | 0.075 | 0.189 | 0.482 | 0.297 | 0.365 |

The prec, recal, and fscore show a large gap compared to your released pretrained model. I think this might be due to differences in hardware and training settings. Could you provide the detailed training settings (e.g., batch_size, learning rate, number of epochs) and the hardware used (e.g., RTX 2080 or Tesla V100)? It would be a great help to my research. Looking forward to your reply!
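For context, the settings being asked about boil down to a handful of config values. Below is a minimal sketch of what such a config might look like, assuming a yacs-style `CfgNode`; the key names (`BATCH_SIZE`, `TRAIN.LR`, `TRAIN.EPOCHS`) and the values are illustrative guesses, not the repo's actual defaults:

```python
# Sketch of the training settings under discussion, expressed as a yacs config.
# Key names and values are assumptions for illustration; check the repo's
# config/train.yaml and config defaults for the real ones.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.BATCH_SIZE = 1          # per-GPU batch size (the default, per the discussion below)
cfg.TRAIN = CN()
cfg.TRAIN.LR = 1e-3         # base learning rate (hypothetical value)
cfg.TRAIN.EPOCHS = 30       # number of training epochs (hypothetical value)

print(cfg.dump())           # serialize back to YAML for comparison/logging
```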

SwingWillwow commented 2 years ago

I conducted another experiment with 8 A6000 GPUs and batch_size set to 2. The result is much closer to the reported one this time (though a gap remains):

| AbsRel | AbsDiff | SqRel | RMSE | LogRMSE | r1 | r2 | r3 | complete | dist1 | dist2 | prec | recal | fscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.067 | 0.103 | 0.038 | 0.199 | 0.113 | 0.935 | 0.962 | 0.974 | 0.901 | 0.061 | 0.135 | 0.635 | 0.431 | 0.512 |

So a potential way to further improve performance is to use a larger batch_size and more GPUs. If I get any further improvement, I will report it in this issue. I am still looking forward to an official training config.
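For reference, under standard data-parallel training the effective (global) batch size is the per-GPU batch size times the number of GPUs, which is why the setups above behave so differently. A minimal sketch of that arithmetic, together with the common "linear scaling rule" for the learning rate; the rule is a general large-batch heuristic, not the authors' confirmed recipe, and the base LR below is purely illustrative:

```python
# Sketch: effective (global) batch size under DDP, and the linear LR scaling
# heuristic. Nothing here is specific to NeuralRecon; the base LR of 1e-3 and
# the assumption that the 2x RTX 3090 run used the default batch_size=1 are
# guesses for illustration.

def effective_batch_size(per_gpu_batch: int, n_gpus: int) -> int:
    """Global batch size seen by one optimizer step under standard DDP."""
    return per_gpu_batch * n_gpus

def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: grow the LR in proportion to the effective batch."""
    return base_lr * new_batch / base_batch

# Setups discussed in this thread (per-GPU batch size, GPU count):
print(effective_batch_size(1, 2))   # 2  -> 2x RTX 3090, presumably default batch_size=1
print(effective_batch_size(2, 8))   # 16 -> 8x A6000, batch_size=2
print(effective_batch_size(4, 8))   # 32 -> 8x A6000, batch_size=4

# If a base LR were tuned for an effective batch of 2 (hypothetical), the
# 8x A6000 / batch_size=4 run would use 16x that LR under this rule.
print(scaled_lr(1e-3, 2, 32))       # 0.016
```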

HLinChen commented 2 years ago

Did you try training with batch_size=1? I see the default setting is batch_size=1.

SwingWillwow commented 2 years ago

> Did you try training with batch_size=1? I see the default setting is batch_size=1.

Previous issues have noted that training with batch_size=1 on 8 RTX 2080 Ti GPUs gives worse performance. I set batch_size=4 and trained the model on 8 A6000 GPUs. The result is even closer to the reported one this time. Specifically:

| AbsRel | AbsDiff | SqRel | RMSE | LogRMSE | r1 | r2 | r3 | complete | dist1 | dist2 | prec | recal | fscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.064 | 0.099 | 0.037 | 0.195 | 0.112 | 0.935 | 0.963 | 0.976 | 0.897 | 0.057 | 0.131 | 0.668 | 0.460 | 0.542 |
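As a quick consistency check on the numbers in this thread: fscore is the harmonic mean of prec and recal. A minimal sketch of that check; the small mismatches are presumably because the evaluation averages per-scene scores rather than recomputing the F-score from the averaged prec/recal, which is an assumption on my part:

```python
# Sketch: recompute fscore as the harmonic mean (F1) of prec and recal,
# using the numbers reported in this thread.

def fscore(prec: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 score)."""
    return 2 * prec * recall / (prec + recall)

print(round(fscore(0.482, 0.297), 3))  # ~0.368 vs reported 0.365 (2x RTX 3090)
print(round(fscore(0.635, 0.431), 3))  # ~0.513 vs reported 0.512 (8x A6000, batch_size=2)
print(round(fscore(0.668, 0.460), 3))  # ~0.545 vs reported 0.542 (8x A6000, batch_size=4)
```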