jianjieluo / SCD-Net

[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion model with additional semantic prior.
https://arxiv.org/abs/2212.03099

How long does it take to finish the first stage of training? #4

Closed by Salomeeeee 1 year ago

Salomeeeee commented 1 year ago

Thanks for your excellent work and code! I tried to retrain the model on my own dataset (174,350 training samples), and it took 3-4 days to finish only 5 epochs. Is that normal? Could you tell me how long the first stage of training took in your study?
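
To put those figures in perspective, here is a rough back-of-the-envelope calculation using only the numbers quoted above (174,350 samples, 5 epochs, 3-4 days). The helper names `throughput` and `hours_per_epoch` are made up for illustration and are not part of the SCD-Net code.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def throughput(samples_per_epoch, epochs, days):
    """Average number of samples processed per second over the whole run."""
    return samples_per_epoch * epochs / (days * SECONDS_PER_DAY)


def hours_per_epoch(epochs, days):
    """Average wall-clock hours spent on one epoch."""
    return days * 24 / epochs


# Figures from the question: 174,350 samples, 5 epochs, 3 to 4 days.
for days in (3, 4):
    rate = throughput(174_350, 5, days)
    hours = hours_per_epoch(5, days)
    print(f"{days} days: {rate:.2f} samples/s, {hours:.1f} h per epoch")
# 3 days: 3.36 samples/s, 14.4 h per epoch
# 4 days: 2.52 samples/s, 19.2 h per epoch
```

That is roughly 2.5 to 3.4 samples per second, or about 14 to 19 hours per epoch.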

jianjieluo commented 1 year ago

Hi, @Salomeeeee,

Sorry for the late response. Here are my approximate training times on the COCO dataset (4 P40 GPUs) for reference:

stage1 XE: ~14h
stage1 RL: ~24h
stage2 XE: ~17h
stage2 RL: ~25h

You might need to make sure your training is actually running on GPUs, or increase your batch size, to speed it up.
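
One quick way to check the first point is sketched below. It assumes the training code runs on PyTorch, which this thread does not state, and `describe_devices` is a hypothetical helper rather than anything from the repository. It only uses the standard `torch.cuda` queries and returns an empty result if PyTorch is not installed.

```python
def describe_devices():
    """Summarize the CUDA devices that PyTorch can see in this environment."""
    try:
        import torch  # assumption: the training code is PyTorch-based
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "gpus": []}

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch_installed": True, "cuda_available": available, "gpus": names}


info = describe_devices()
print(info)
```

If `cuda_available` is false, or `gpus` lists fewer devices than you expect, the job is probably running on the CPU or on a subset of your GPUs, which would be worth ruling out before tuning the batch size.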

Best, Jianjie