ztsrxh / RoadBEV

Codes for RoadBEV: road surface reconstruction in Bird's Eye View
MIT License
131 stars, 12 forks

test: abs_err:nan, rmse:nan, >0.5cm:nan #3

Closed: DGCHAO closed this issue 2 months ago

DGCHAO commented 2 months ago

Both the evaluation during training and the test results are nan. I have already generated the GT data.

ztsrxh commented 2 months ago

Thanks for your interest. We have provided the GT labels generated in our environment. You can train with them to check whether the problem still exists. NaN often occurs with a large LR: in our settings, the LR is 8e-4 for RoadBEV-mono and 5e-4 for RoadBEV-stereo. You can also disable AMP and train in FP32.
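Why both a large LR and reduced precision make NaN more likely can be shown without RoadBEV at all. The toy below runs plain gradient descent on f(x) = c·x², a hypothetical stand-in for a training loss (not RoadBEV's objective), and uses NumPy's float16 and float32 scalars to mimic AMP-style half precision versus FP32. It is a sketch of the numerical mechanism only, not of PyTorch AMP itself; `run_gd` and its parameters are made up for illustration.

```python
import numpy as np


def run_gd(lr, steps=50, curvature=10.0, dtype=np.float32):
    """Toy gradient descent on f(x) = curvature * x**2; returns the final iterate.

    All arithmetic stays in `dtype`, so float16 overflows much earlier than float32.
    """
    x = dtype(1.0)
    c = dtype(curvature)
    step = dtype(lr)
    # Silence overflow/invalid warnings; we inspect the result instead.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            grad = dtype(2) * c * x   # df/dx = 2 * c * x
            x = x - step * grad       # update factor is (1 - 2 * c * lr)
    return x


for dtype in (np.float16, np.float32):
    for lr in (0.01, 0.2):
        print(dtype.__name__, lr, run_gd(lr, dtype=dtype))
```

With c = 10, any step size above 1/c = 0.1 makes each update multiply x by about -3. In float16 (largest finite value 65504) the gradient overflows to inf within about ten steps, and the next update computes inf - inf, which is nan; in float32 the same run is still finite (though diverging) after 50 steps. A smaller LR converges in either precision.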

DGCHAO commented 2 months ago


Thank you for your patience, but I still can't solve the problem by reducing the learning rate.

I set it up as follows:

```
python train.py --lr 8e-5
trining RoadBEV-mono!
dataset size - train:1210, test371
num params: 26646334
logging dir: ./checkpoints/20240418024711
0%| | 0/20 [00:00<?, ?it/s]
train--> epoch 1, lr:0.000017, loss:4.9908
train--> epoch 1, lr:0.000029, loss:4.7834
train--> epoch 1, lr:0.000042, loss:4.6474
RoadBEV/utils/metric.py:43: RuntimeWarning: invalid value encountered in true_divide
  metric_wise = self.metric_wise / self.count_wise.reshape(-1, 1)
test: abs_err:nan, rmse:nan, >0.5cm:nan
train--> epoch 1, lr:0.000055, loss:4.6511
train--> epoch 1, lr:0.000067, loss:4.4105
train--> epoch 1, lr:0.000080, loss:4.3248
test: abs_err:nan, rmse:nan, >0.5cm:nan
train--> epoch 1, lr:0.000080, loss:4.3769
train--> epoch 1, lr:0.000079, loss:3.9525
```
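The RuntimeWarning in this log points at `metric.py` line 43, `self.metric_wise / self.count_wise.reshape(-1, 1)`. In NumPy, that "invalid value encountered" warning is most commonly raised by 0/0, which yields nan, so one plausible reading is that some bins never received a valid GT sample. The sketch below reproduces this with made-up arrays; the shapes and the helper names `per_bin_mean` and `empty_bins` are hypothetical, not RoadBEV's code.

```python
import numpy as np


def per_bin_mean(metric_wise, count_wise):
    """Divide accumulated per-bin metric sums by per-bin counts.

    Bins with a zero count are left as NaN explicitly instead of
    triggering a 0/0 'invalid value' RuntimeWarning.
    """
    metric_wise = np.asarray(metric_wise, dtype=float)
    counts = np.asarray(count_wise, dtype=float).reshape(-1, 1)
    out = np.full_like(metric_wise, np.nan)
    np.divide(metric_wise, counts, out=out, where=counts > 0)
    return out


def empty_bins(count_wise):
    """Indices of bins that never received a valid sample."""
    return np.flatnonzero(np.asarray(count_wise) == 0)


# Hypothetical accumulators: bin 1 saw no valid samples, so its sums and count are 0.
metric_wise = np.array([[2.0, 4.0], [0.0, 0.0]])
count_wise = np.array([2, 0])

with np.errstate(invalid="ignore"):
    naive = metric_wise / count_wise.reshape(-1, 1)  # row 1 becomes [nan, nan]

print(naive)
print(empty_bins(count_wise))  # [1]
print(per_bin_mean(metric_wise, count_wise))
```

If `empty_bins` reports every bin as empty, every reported metric ends up nan, which would match `abs_err:nan, rmse:nan` even while the training loss looks normal.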

DGCHAO commented 2 months ago

@ztsrxh I found that the problem was caused by the generated GT data. Testing with the data generated by proprecess_gt.py gives nan, but using the data you sent works normally.
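Since the provided GT works and the locally generated GT does not, a quick check is to compare the two label sets array by array. This sketch assumes each label has already been loaded into a NumPy height array, optionally with a boolean validity mask; the file format, any key names, and the `gt_report` helper are assumptions, not part of RoadBEV.

```python
import numpy as np


def gt_report(height, mask=None):
    """Summarize one GT height array: total, NaN, finite, and valid cell counts.

    `mask` is an optional validity mask (hypothetical); a cell counts as valid
    only if its height is finite and the mask, when given, is true.
    """
    height = np.asarray(height, dtype=float)
    finite = np.isfinite(height)
    valid = finite if mask is None else finite & np.asarray(mask, dtype=bool)
    return {
        "cells": int(height.size),
        "nan": int(np.isnan(height).sum()),
        "finite": int(finite.sum()),
        "valid": int(valid.sum()),
    }


# Example with a made-up 2x2 label: one NaN cell and one masked-out cell.
print(gt_report(np.array([[1.0, np.nan], [0.5, 2.0]]), mask=[[1, 1], [0, 1]]))
```

Running it on a generated label and the matching provided label side by side should make differences obvious: a nonzero `nan` count, or a `valid` count of 0 where the provided label has valid cells, would point at the generation step.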

ztsrxh commented 2 months ago

It is great that you found the problem. The nan may be caused by a different open3d version. We are also reviewing the code to locate potential bugs.

DGCHAO commented 2 months ago

My version is open3d==0.17.0; I will try 0.16 later. Thanks.
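When comparing environments, it helps to record the exact open3d version next to each set of generated GT files. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print this alongside any GT regeneration run so results can be traced to a version.
    print("open3d", installed_version("open3d"))
```

Switching versions for a side-by-side comparison is then a matter of `pip install open3d==0.16.0` in a separate environment and regenerating the GT there.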