1. Why is clip-vit-large-patch14 selected for clip_model in stage 2, when stage 1 uses clip-vit-base-patch32?
2. Why are poseguider_checkpoint_path and referencenet_checkpoint_path left empty in the stage 2 configuration?
3. Why is gradient_accumulation_steps set to 16 in stage 2? The batch size in the original paper is 4. If I train on 8 A100 GPUs, does the total batch size reach 128?
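
To make the arithmetic in question 3 concrete, here is a minimal sketch of how the effective batch size is usually computed with gradient accumulation across several GPUs. The function name and the per-GPU batch size of 1 are assumptions for illustration (the per-GPU value is not stated above; 1 is simply the value that reproduces the 128 figure).

```python
def effective_batch_size(per_gpu_batch_size, gradient_accumulation_steps, num_gpus):
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


# Assumed per-GPU batch size of 1, with the stage 2 value of 16 accumulation
# steps and 8 GPUs from the question.
total = effective_batch_size(per_gpu_batch_size=1,
                             gradient_accumulation_steps=16,
                             num_gpus=8)
print(total)
```

Under the same formula, a per-GPU batch size of 4 with the other two values unchanged would give 512 rather than 128, so the answer depends on which per-GPU value the stage 2 config actually uses.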