uber-research / LaneGCN

[ECCV2020 Oral] Learning Lane Graph Representations for Motion Forecasting
https://arxiv.org/abs/2007.13732

Training details #3

Closed os1a closed 3 years ago

os1a commented 3 years ago

Hi, according to the paper, Section 4.1 (implementation details), you use a batch size of 128 and train for 36 epochs with a learning rate of 0.001, decayed to 0.0001 at epoch 32.

According to the provided code, the batch size is 32: https://github.com/uber-research/LaneGCN/blob/7e9b51d18a62f5dafb67a2215eba9053a64aff16/lanegcn.py#L50

Does it give the same performance?

Also, one more question about the loss function: can you give more insight into the classification loss? Why do you need it, and have you tried training without it?

Thanks a lot for the great work.

chenyuntc commented 3 years ago
  1. For the batch size: since we use Horovod (distributed training) with 4 GPUs, the effective batch size is 32*4 = 128. I remember that training on a single GPU with batch size 32, the performance was a bit lower, but only by a very small margin.

  2. The classification branch is used to rank the predicted trajectories: e.g., when calculating ADE1, we choose the trajectory with the highest score (see the sketch below). Besides, we use a max-margin loss to encourage multi-modal prediction.
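For concreteness, here is a minimal sketch of how classification scores can rank modes for a single-mode metric. The names (`ade1`, `trajs`, `scores`) and shapes are illustrative assumptions, not the repository's API:

```python
import torch

def ade1(trajs, scores, gt):
    """trajs: [K, T, 2] predicted modes, scores: [K] classification scores,
    gt: [T, 2] ground-truth trajectory."""
    best = trajs[scores.argmax()]            # rank by score, keep the top mode
    return (best - gt).norm(dim=-1).mean()   # average displacement error of that mode
```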

os1a commented 3 years ago

Thanks a lot for your answers.

Could you please elaborate on why the max-margin loss encourages multi-modal prediction? I could not find more details about that in the paper.

chenyuntc commented 3 years ago

If we used a binary cross-entropy loss, a trajectory far away from the ground truth would be treated as a negative and suppressed toward zero likelihood. With the max-margin loss, we only ask it to have a score at least ϵ smaller than that of the trajectory closest to the ground truth.

But we don't have an ablation of BCE loss vs. max-margin loss; this was a design choice. You could also refer to Section 3.3 of this motion planning paper for more info about the max-margin loss.
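A generic sketch of such a max-margin (hinge) ranking loss, assuming the positive mode is the one whose endpoint lands closest to the ground-truth endpoint and an illustrative margin value; this is not necessarily the exact rule used in the repository:

```python
import torch
import torch.nn.functional as F

def max_margin_loss(scores, trajs, gt, margin=0.2):
    """scores: [K] mode scores, trajs: [K, T, 2], gt: [T, 2].
    The positive mode is the one whose endpoint is closest to the
    ground-truth endpoint; every other mode is only pushed to score at
    least `margin` below it, not toward zero likelihood."""
    end_dist = (trajs[:, -1] - gt[-1]).norm(dim=-1)   # [K] endpoint distances
    pos = end_dist.argmin()
    neg = torch.ones_like(scores, dtype=torch.bool)
    neg[pos] = False                                  # exclude the positive itself
    # hinge: penalize negatives whose score comes within `margin` of the positive
    return F.relu(scores[neg] - scores[pos] + margin).sum()
```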

os1a commented 3 years ago

Thanks a lot for your explanation.

So the regression loss you are using is the WTA (winner-takes-all) loss, which penalizes only the best hypothesis. I am wondering whether you have tried training your approach with only the regression loss, since it already ensures diversity among the hypotheses.

chenyuntc commented 3 years ago

Sorry, I missed your last questions. Hope it's not too late.

> So the regression loss you are using is the WTA (winner-takes-all) loss, which penalizes only the best hypothesis.

Right
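To make the WTA setup concrete, here is a minimal PyTorch sketch; it assumes the winner is the mode with the lowest regression loss, whereas the actual code may select it differently (e.g., by endpoint distance):

```python
import torch
import torch.nn.functional as F

def wta_regression_loss(trajs, gt):
    """trajs: [K, T, 2] predicted modes, gt: [T, 2].
    Winner-takes-all: only the best mode receives a regression
    gradient, leaving the remaining modes free to cover other futures."""
    per_mode = F.smooth_l1_loss(
        trajs, gt.expand_as(trajs), reduction="none"
    ).mean(dim=(1, 2))                   # [K] mean loss per hypothesis
    return per_mode.min()                # penalize only the best hypothesis
```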

> I am wondering whether you have tried training your approach with only the regression loss, since it already ensures diversity among the hypotheses.

Sorry, we didn't try that.

chenyuntc commented 3 years ago

I'll close it for now. Feel free to reopen it if you still have questions.