Closed wangherr closed 3 months ago
Hi,
I’m interested in knowing the GPU hours used for training the code. Could you please provide this information?
Thank you!
The training time is around 7.5 hours on 8 A100 GPUs when training for 5,000 iterations with a batch size of 256 (4 gradient_accumulation_steps).
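As a quick sanity check of these numbers (the figures come straight from this thread; the even data-parallel split across GPUs is an assumption):

```python
# Sanity-check the training cost quoted above.
num_gpus = 8            # A100 GPUs
wall_clock_hours = 7.5  # reported training time
gpu_hours = num_gpus * wall_clock_hours
print(f"GPU hours: {gpu_hours}")  # 8 * 7.5 = 60.0

# Batch size 256 with 4 gradient-accumulation steps means
# 256 / 4 = 64 samples per forward/backward pass; assuming an
# even data-parallel split, that is 64 / 8 = 8 samples per GPU.
grad_accum_steps = 4
effective_batch = 256
per_pass_batch = effective_batch // grad_accum_steps  # 64
per_gpu_micro_batch = per_pass_batch // num_gpus      # 8
print(f"per-GPU micro-batch: {per_gpu_micro_batch}")
```

This is where the 60 GPU-hour figure discussed below comes from.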
In the paper, there are two stages. Stage 1: "We first fine-tune the pre-trained ControlNet until convergence using a batch size of 256 and a learning rate of 1e-5."
Stage 2: "We then employ the same batch size and learning rate for 5k iterations for reward fine-tuning."
From my understanding, Stage 2 costs about 60 GPU hours (7.5 hours × 8 GPUs), while Stage 1 takes very little time and can be considered negligible. Is this correct?
Yes, you're right.