tarashakhurana / 4d-occ-forecasting

CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"
https://www.cs.cmu.edu/~tkhurana/ff4d/index.html
MIT License

Having trouble reproducing paper results with provided checkpoints #8

Closed: LunjunZhang closed this issue 1 year ago

LunjunZhang commented 1 year ago

Hi there,

I downloaded the nuScenes checkpoints (for both 1s and 3s prediction) and ran the evaluation code on them, but I am having trouble reproducing the results reported in the paper. More specifically, the reported / reproduced results are:

Thanks for your help!

tarashakhurana commented 1 year ago

Hi, the results we report are on the validation set of nuScenes. Is that what you tested the checkpoints on?

LunjunZhang commented 1 year ago

Thanks a lot for the reply!

We tested the checkpoints on the test set of nuScenes, which I believe is the default behaviour of test.py in the repo. We will try switching the evaluation to the validation set instead (a sketch of selecting the val split follows below).

Are the numbers for the other datasets also computed on the validation set (rather than the test set)?
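For anyone making the same switch, here is a minimal sketch of selecting the nuScenes validation split with the official nuscenes-devkit. The dataroot path is a placeholder, and the repo's own test.py may select the split differently; this only illustrates the devkit side.

```python
# Minimal sketch: enumerate the nuScenes validation scenes with the
# official nuscenes-devkit (pip install nuscenes-devkit).
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.splits import create_splits_scenes

# "/data/nuscenes" is a placeholder dataroot; adjust to your setup.
nusc = NuScenes(version="v1.0-trainval", dataroot="/data/nuscenes", verbose=False)

# create_splits_scenes() maps split names ("train", "val", "test", ...)
# to lists of scene names like "scene-0001".
val_scene_names = set(create_splits_scenes()["val"])

# Keep only the scene records that belong to the validation split.
val_scenes = [s for s in nusc.scene if s["name"] in val_scene_names]
print(f"{len(val_scenes)} validation scenes")  # 150 scenes for v1.0-trainval
```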

tarashakhurana commented 1 year ago

Sorry for the confusion! For nuScenes, the only baselines we had access to (SPF and S2Net) were evaluated on the validation set. Their papers say test set, but we confirmed with the authors that the validation set was used. For all the other datasets, the numbers should be on the test sets.

Edit: Another reason for using the validation set for nuScenes is that the lidarseg labels are not available for the test set, and we needed them to compute the foreground/background metrics.

Edit 2: Just to be clear, test.py does not have this default behavior: test.sh points at the test set for KITTI, and test_fgbg.sh points at the val set for nuScenes. Let me know if the metrics still don't match.
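For readers landing here later, here is a minimal sketch of what a foreground/background split over per-point lidarseg labels could look like. The grouping assumes the 16-class nuScenes-lidarseg challenge taxonomy, where indices 1-10 are the "thing" classes; treating those as foreground is an assumption here, not necessarily the paper's exact grouping.

```python
import numpy as np

# Assumption: labels use the 16-class nuScenes-lidarseg challenge taxonomy,
# where indices 1-10 (barrier, bicycle, bus, car, construction_vehicle,
# motorcycle, pedestrian, traffic_cone, trailer, truck) are the "thing"
# classes. We treat those as foreground; the paper's grouping may differ.
FOREGROUND_CLASSES = np.arange(1, 11)

def split_fg_bg(points: np.ndarray, labels: np.ndarray):
    """Split an (N, 3) point cloud by its (N,) lidarseg class indices."""
    fg_mask = np.isin(labels, FOREGROUND_CLASSES)
    return points[fg_mask], points[~fg_mask]

# Usage: evaluate the forecasting metric separately on each subset.
pts = np.random.rand(1000, 3)
lbl = np.random.randint(0, 17, size=1000)
fg_pts, bg_pts = split_fg_bg(pts, lbl)
```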

LunjunZhang commented 1 year ago

Thanks so much for the clarification! The results on nuScenes match now.