jchengai / forecast-mae

[ICCV'2023] Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
https://arxiv.org/pdf/2308.09882.pdf

About the “multiagent” #15

Closed kelen1 closed 4 months ago

kelen1 commented 4 months ago

Hello! Thanks for your great work! A question about the details: in the multi-agent trajectory prediction part, do you consider all the vehicles in a scene to be agents?

jchengai commented 4 months ago

Yes, all agents in the scene are used for training. However, only scored agents are used to determine the “best mode” during training.
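
For illustration, here is a minimal sketch of best-mode selection restricted to scored agents (the function and tensor names are illustrative, not the repo's exact code):

    import torch

    def pick_best_mode(pred, target, scored_mask):
        # pred: (K, N, T, 2) trajectories for K modes and N agents,
        # target: (N, T, 2) ground truth, scored_mask: (N,) bool.
        # Final-displacement error of each mode for each agent: (K, N).
        fde = torch.norm(pred[:, :, -1] - target[None, :, -1], dim=-1)
        # Average the error over scored agents only, then pick the argmin mode.
        return fde[:, scored_mask].mean(dim=-1).argmin()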

kelen1 commented 4 months ago

Yes, I see. Thanks! But I have another question. When masking, the paper says that complementary random masking is used, but reading your code I can't see how it is implemented.

    # Future steps are expressed relative to the last observed step (index 49);
    # steps where either index 49 or the future step is padded are zeroed out.
    x[:, 50:] = torch.where(
        (padding_mask[:, 49].unsqueeze(-1) | padding_mask[:, 50:]).unsqueeze(-1),
        torch.zeros(num_nodes, 40, 2),
        x[:, 50:] - x[:, 49].unsqueeze(-2),
    )

If a vehicle's trajectory is longer than 50 steps, say 53, then padding_mask[:, 50:] should have 3 False values. Do those steps count as padding or not?

jchengai commented 4 months ago

Please check here:

https://github.com/jchengai/forecast-mae/blob/cb86ea92601d23a8af7713389e9fb78d7e546a65/src/model/model_mae.py#L111-L113
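
For reference, a minimal sketch of the complementary masking idea, assuming per-agent history/future masking (the names are illustrative, not the repo's exact API):

    import torch

    def complementary_agent_masking(hist_tokens, fut_tokens, mask_ratio=0.5):
        # hist_tokens, fut_tokens: (N, D) per-agent history/future embeddings.
        # For each agent, randomly decide whether its history is masked out.
        hist_masked = torch.rand(hist_tokens.shape[0]) < mask_ratio
        # Complementary: an agent's future is kept exactly where its history
        # is masked, and vice versa, so each agent contributes one of the two.
        kept = torch.where(hist_masked.unsqueeze(-1), fut_tokens, hist_tokens)
        return kept, hist_masked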

kelen1 commented 4 months ago

Oh, I see it. But if I want to train from scratch (using 'model_forecast.py'), how should I do that? I didn't find the function 'agent_random_masking' in that file.

jchengai commented 4 months ago

Masking is only used for pre-training. To train from scratch, you don't need it. The only difference between training from scratch and fine-tuning is whether the model is initialized with the pre-trained weights.

Here is the command for training from scratch:

    python3 train.py data_root=/path/to/data_root model=model_forecast gpus=4 batch_size=32 monitor=val_minFDE

kelen1 commented 4 months ago

I'm sorry, but I still have a small doubt. So, if it's training from scratch, how does complementary random masking work?

jchengai commented 4 months ago

It is a two-phase framework:

First phase: we train the MAE model (model_mae.py) on the reconstruction pretext task; this is where the masking happens. Once trained, we throw away the MAE decoder and use the trained weights (e.g., history encoder, map encoder, Transformer encoder) to initialize the forecast model (model_forecast.py).
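
A minimal sketch of this hand-off, assuming the MAE decoder parameters share a common name prefix (the exact checkpoint keys depend on the repo's module names):

    import torch

    def init_from_mae(forecast_model: torch.nn.Module, ckpt_path: str):
        # Keep only the encoder weights from the pre-trained MAE checkpoint.
        state = torch.load(ckpt_path, map_location="cpu")["state_dict"]
        encoder_only = {k: v for k, v in state.items() if not k.startswith("decoder")}
        # strict=False: parameters unique to the forecast model (e.g., the
        # prediction head) keep their random initialization.
        forecast_model.load_state_dict(encoder_only, strict=False)
        return forecast_model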

Second phase: we train the initialized forecast model for the prediction task.

Training from scratch means that we skip the first phase and randomly initialize the forecast model, as one normally would.

kelen1 commented 4 months ago

I finally got it, thanks for your answer!!