
opendilab / SmartRefine

[CVPR 2024] SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Apache License 2.0

102 stars 8 forks

Training time cost #12

Open Liwen-Xiao opened 1 month ago

Liwen-Xiao commented 1 month ago

Hello! Great job! Could you tell me how long your training time is? On my machine, using a single 3090 GPU, the first epoch shows it will take around 8 hours to train. Is this similar to your experience?

youngzhou1999 commented 1 month ago

Hi. Thanks for your interest. The first epoch takes relatively longer because it also processes the data for training. You can reduce the radius of map retrieval to speed up your training process.
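
To illustrate why a smaller map retrieval radius helps: the number of map elements gathered around an agent grows roughly with the area of the retrieval circle, so halving the radius cuts the retrieved elements to about a quarter. The sketch below uses a synthetic 1 m grid of map points. The helper `filter_map_points` and the radii 50 and 25 are hypothetical and are not taken from the SmartRefine code.

```python
import numpy as np


def filter_map_points(points, center, radius):
    """Keep only the map points within `radius` metres of `center`.

    Hypothetical helper for illustration; not the repository's function.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    dist = np.linalg.norm(points - center, axis=1)
    return points[dist <= radius]


# Synthetic map: one point per metre on a grid covering [-100, 100] x [-100, 100].
xs, ys = np.meshgrid(np.arange(-100, 101), np.arange(-100, 101))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)

kept_large = filter_map_points(grid, (0.0, 0.0), 50.0)  # assumed larger radius
kept_small = filter_map_points(grid, (0.0, 0.0), 25.0)  # assumed smaller radius

# The ratio is close to (50 / 25) ** 2, i.e. about 4.
ratio = len(kept_large) / len(kept_small)
print(f"large: {len(kept_large)}, small: {len(kept_small)}, ratio: {ratio:.2f}")
```

Less map data per scenario means less work in preprocessing and in every later epoch, which would be consistent with the speed-up reported in this thread.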

Liwen-Xiao commented 1 month ago

Thank you for your reply! I changed the radius and the training cost decreased a lot. Thanks again!

lon0862 commented 1 month ago

Hi, but I find that my training takes around 8 hours for every epoch, not only the first one. Is this similar to your experience?

Liwen-Xiao commented 1 month ago

> Hi, but I find that my training takes around 8 hours for every epoch, not only the first one. Is this similar to your experience?

Yes, it is the same. I changed the local radius to a smaller one and solved the problem.

lon0862 commented 1 month ago

> Yes, it is the same. I changed the local radius to a smaller one and solved the problem.

Thanks, I did the same thing and it solved the problem for me, too.

lon0862 commented 3 weeks ago

@Liwen-Xiao Hi, I want to ask about the retrained results. I use 1 GPU with batch_size: 32 and gradient accumulation over 2 batches. In the end I only get val_minADE: 0.6525, val_minFDE: 0.9325, val_MR: 0.0880. In data preprocessing, I use local_radius: 65. Did you get the same results as the GitHub checkpoints?
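
For context on the setup above: with a batch size of 32 and accumulation over 2 batches, the effective batch size is 32 × 2 = 64. The sketch below checks, on a toy linear model with a mean squared error loss, that averaging the gradients of two micro-batches of 32 gives the same gradient as one batch of 64. The model, the data, and the helper `mse_grad` are hypothetical and only illustrate the arithmetic. How the SmartRefine training script implements accumulation is not shown in this thread.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of the mean squared error of a linear model y ~ x @ w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))  # 64 toy samples with 3 features each
y = rng.normal(size=64)
w = np.zeros(3)

# Gradient computed on all 64 samples at once.
full_grad = mse_grad(w, x, y)

# Gradients of two micro-batches of 32 samples, averaged before the update.
micro_grads = [mse_grad(w, x[i:i + 32], y[i:i + 32]) for i in (0, 32)]
accumulated_grad = np.mean(micro_grads, axis=0)

effective_batch_size = 32 * 2
print(effective_batch_size, np.allclose(full_grad, accumulated_grad))
```

The equivalence holds for the gradient itself. Layers that depend on per-batch statistics, such as batch normalization, still see only 32 samples at a time.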

Liwen-Xiao commented 3 weeks ago

Hi, I also retrained the model. I got a val_minFDE of 0.919, which is close to the 0.913 reported by the author. I use 1 GPU with a batch size of 32 and train for 64 epochs. I got the best result at the 33rd epoch.

lon0862 commented 3 weeks ago

Hi, I used the same parameters as you describe, and I got the best result at the 31st epoch, with val_minADE: 0.649, val_minFDE: 0.924, val_MR: 0.086. It is a little worse than your result and the one the author reported, but I think it is acceptable. Thanks!
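
For readers comparing these numbers, here is a minimal sketch of how minADE, minFDE, and MR are commonly computed for multi-modal trajectory prediction. It assumes predictions of shape (N, K, T, 2) for N scenarios, K modes, and T future steps, and a miss threshold of 2.0 m as used in the Argoverse benchmarks. The function `motion_metrics` is hypothetical and is not the repository's evaluation code. Some benchmarks report the ADE of the mode with the lowest final error instead of the lowest ADE over all modes.

```python
import numpy as np


def motion_metrics(pred, gt, miss_threshold=2.0):
    """Return (minADE, minFDE, MR) averaged over scenarios.

    pred: (N, K, T, 2) predicted trajectories, K modes per scenario.
    gt:   (N, T, 2) ground-truth trajectories.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt[:, None], axis=-1)  # (N, K, T)
    ade = dist.mean(axis=-1)   # average displacement per mode, (N, K)
    fde = dist[..., -1]        # final-step displacement per mode, (N, K)
    best_fde = fde.min(axis=1)
    min_ade = ade.min(axis=1).mean()
    min_fde = best_fde.mean()
    miss_rate = (best_fde > miss_threshold).mean()
    return min_ade, min_fde, miss_rate


# Toy data: 2 scenarios, 2 modes, 2 future steps.
gt = np.array([[[1.0, 0.0], [2.0, 0.0]],
               [[0.0, 1.0], [0.0, 2.0]]])
pred = np.stack([
    np.stack([gt[0], gt[0] + [0.0, 3.0]]),               # one exact mode, one 3 m off
    np.stack([gt[1] + [3.0, 0.0], gt[1] + [4.0, 0.0]]),  # modes 3 m and 4 m off
])

min_ade, min_fde, miss_rate = motion_metrics(pred, gt)
print(min_ade, min_fde, miss_rate)
```

In this toy case the first scenario has an exact mode and the second has a best mode that is 3 m off at every step, so both minADE and minFDE average to 1.5 and half of the scenarios count as misses.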

lon0862 commented 3 weeks ago

But I am still confused about how the quality score is used. Did you try it as the paper describes?