weihaosky opened this issue 2 years ago (status: Open)
Hey, may I ask how long the training took with the 8 GPUs?
@MaximilianKummeth About 2 days
Hi, did you try the phase 1 as mentioned in the readme? How will it affect the training performance?
Yes, I trained phase 1 exactly as instructed in the README. I think phase 1 is important.
Thank you for your kind reply. Just wondering, did you run into the errors below during phase 1?
| WARNING | models.neucon_network:compute_loss:242 - target: no valid voxel when computing loss
...
/python3.7/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32])
I am using an RTX 3090, and it seems I can only get training running for phase 2, not phase 1...
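The ValueError in the traceback above is PyTorch's generic batch-norm check, not something specific to this repository: a BatchNorm layer in training mode cannot compute per-channel statistics from a single sample. A minimal sketch reproducing it (the layer width of 32 matches the torch.Size([1, 32]) in the log; the GroupNorm workaround is a general suggestion, not the repo's actual fix):

```python
import torch
import torch.nn as nn

# A BatchNorm1d layer in training mode raises the exact ValueError from
# the log when the batch contains a single sample, because per-channel
# batch statistics cannot be computed from one value.
bn = nn.BatchNorm1d(32)
bn.train()
try:
    bn(torch.randn(1, 32))  # batch of 1 -> torch.Size([1, 32]) as in the log
except ValueError as e:
    print(e)

# Common workarounds: increase the per-GPU batch size, skip such batches,
# or use a normalization that is independent of batch size, e.g. GroupNorm:
gn = nn.GroupNorm(num_groups=4, num_channels=32)
out = gn(torch.randn(1, 32))  # works fine with a single sample
print(out.shape)  # torch.Size([1, 32])
```

Which workaround is appropriate here depends on why phase 1 ends up with degenerate batches (the "no valid voxel" warning suggests some samples contribute no supervised voxels at all).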
I forgot. But it looks familiar.
I cannot get the reported results either. @weihaosky my results are similar to yours. Has anyone figured out the reason?
Hi, I tried to train the model from scratch, but I cannot reproduce the results in the paper. I trained the network on 8 GTX 2080 Ti GPUs with a batch size of 1 per GPU, then tested the models after 25, 30, 35, 40, 45, 50, 60, and 70 epochs. The best model only reaches abs_rel=0.068 and fscore=0.482, which is far from the results in your paper.
Also, the results from your released model (47 epochs) are:

AbsRel 0.065 | AbsDiff 0.099 | SqRel 0.038 | RMSE 0.197 | LogRMSE 0.113
r1 0.932 | r2 0.961 | r3 0.975
complete 0.892 | dist1 0.053 | dist2 0.135 | prec 0.687 | recall 0.471 | fscore 0.557

which matches what @ZuoJiaxing reported in #53, but differs from the paper.
May I ask about your training setup, e.g. the batch size, the number of GPUs, and the learning rate? Many thanks!
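One setup detail worth checking when comparing runs like the ones above: with a batch size of 1 per GPU, plain BatchNorm statistics are computed per device and are very noisy (or, for BatchNorm1d, fail outright as in the earlier traceback). A standard PyTorch remedy in multi-GPU DDP training is SyncBatchNorm, which pools statistics across all processes. This is a general sketch with a hypothetical toy model, not this repository's actual training code:

```python
import torch.nn as nn

# Hypothetical toy model standing in for the real network; the
# conversion call below is standard PyTorch.
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)

# convert_sync_batchnorm replaces every BatchNorm layer in the module
# tree with SyncBatchNorm. Under DistributedDataParallel, SyncBatchNorm
# aggregates batch statistics across all processes, so 8 GPUs with a
# per-GPU batch size of 1 normalize with an effective batch of 8.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(model[1]).__name__)  # SyncBatchNorm
```

Whether the released model was trained with synchronized batch norm (and with what effective batch size and learning rate) is exactly the kind of detail that could explain the gap, so it would be useful if the authors confirmed it.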