tarashakhurana / 4d-occ-forecasting

CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"
https://www.cs.cmu.edu/~tkhurana/ff4d/index.html
MIT License

loss drops to 0 #11

Closed Tkk0124 closed 9 months ago

Tkk0124 commented 9 months ago

Hello, I noticed a strange thing: when training on stationary scenes (such as waiting at a traffic light), the loss quickly drops to 0 and the model stops learning.

placeforyiming commented 9 months ago

> Hello, I noticed a strange thing: when training on stationary scenes (such as waiting at a traffic light), the loss quickly drops to 0 and the model stops learning.

https://github.com/tarashakhurana/4d-occ-forecasting/blob/ff986082cd6ea10e67ab7839bf0e654736b3f4e2/model.py#L251

Check out this line. After going through the code quickly, I find the residual added to the output here a bit odd. For stationary scenarios, it would be better for the skip connection to directly learn the identity mapping of the input.
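
To make the suggestion concrete, here is a minimal sketch (not the repo's actual `model.py`; the class, tensor names, and shapes are hypothetical) of a head that adds the last observed occupancy as a skip connection, so the decoder only has to learn a residual that is identically zero for a stationary scene:

```python
import torch
import torch.nn as nn

class ResidualOccupancyHead(nn.Module):
    """Hypothetical decoder head that predicts future occupancy as a residual
    over the most recently observed occupancy (skip connection = identity)."""

    def __init__(self, channels: int, future_steps: int):
        super().__init__()
        # toy decoder; the real model is a much deeper network
        self.decoder = nn.Conv3d(channels, future_steps, kernel_size=3, padding=1)

    def forward(self, features: torch.Tensor, last_occupancy: torch.Tensor) -> torch.Tensor:
        # features:       (B, C, Z, H, W) encoding of the past sweeps
        # last_occupancy: (B, 1, Z, H, W) most recent observed occupancy grid
        residual = self.decoder(features)  # (B, T_future, Z, H, W)
        # skip connection: carry the last frame forward and predict only changes,
        # so a perfectly stationary scene needs residual == 0
        return last_occupancy + residual
```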

tarashakhurana commented 9 months ago

Thanks for the insight! Although this sounds like a reasonable thing to do at first (e.g., when the world and the ego-agent are stationary, you cannot infer anything about the environment beyond what has already been seen), it may be sub-optimal in other cases where the ego-agent or the environment is dynamic. Have you tried playing with the skip connection?

placeforyiming commented 9 months ago

> Thanks for the insight! Although this sounds like a reasonable thing to do at first (e.g., when the world and the ego-agent are stationary, you cannot infer anything about the environment beyond what has already been seen), it may be sub-optimal in other cases where the ego-agent or the environment is dynamic. Have you tried playing with the skip connection?

No, I haven't tried it in the code yet. I will leave a message here if I try something new.

Could you help me figure out another question? I think I may have missed something in the paper. It's about the result comparison in Table 1: the performance of S2Net is quite different from the original paper. The original paper reports an L1 of 0.545 for 1s and 0.560 for 3s on nuScenes, but Table 1 reports 3.49 for 1s and 4.78 for 3s. What is the difference in the evaluation settings? Or could you point me to where I can find this information?

tarashakhurana commented 9 months ago

Good catch! The evaluation of both S2Net and SPFNet is quite different from the benchmarking protocol we propose.

  1. Both methods output a confidence score for each predicted future point. Upon discussion with the authors, we found that this confidence score is used to weight the per-point metrics (L1, L2, Chamfer distance, etc.) before taking their average. We proposed a modified version of this evaluation with no confidence weighting (note how you could predict n bad points with 0 confidence and your metrics could still go to 0); see the sketch below.
  2. Additionally, our metrics, excluding vanilla Chamfer distance, are computed in the near-field bounded volume.
  3. Specifically for S2Net, the numbers in their paper are Top-5 numbers.

Those are all the differences.
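
For intuition, here is a toy comparison of confidence-weighted vs. unweighted per-point error. It is a minimal sketch, not the actual evaluation code, and it assumes the weighted metric is simply the mean of confidence times per-point error, which may differ from what S2Net/SPFNet compute exactly:

```python
# Toy illustration of why confidence weighting can hide bad predictions.
# Assumption: the weighted metric is mean(confidence_i * error_i).
import numpy as np

def per_point_l1(pred, gt):
    # L1 distance between matched predicted and ground-truth points
    return np.abs(pred - gt).sum(axis=1)

gt = np.zeros((100, 3))          # ground-truth points at the origin
pred = gt + 5.0                  # every predicted point is off by 5 m per axis
conf = np.zeros(100)             # ...but reported with zero confidence

err = per_point_l1(pred, gt)     # each point carries 15 m of L1 error
weighted = (conf * err).mean()   # 0.0 -- the bad points vanish from the metric
unweighted = err.mean()          # 15.0 -- what the unweighted protocol reports

print(weighted, unweighted)
```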

Tkk0124 commented 9 months ago

Thanks for your reply. I have now divided the data into many dynamic (moving) scenarios and started training.