jiuntian / interactdiffusion

[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
https://jiuntian.github.io/interactdiffusion/

OUT OF MEMORY #7

Open BlackPuuuudding opened 7 months ago

BlackPuuuudding commented 7 months ago

Why does my training run out of memory when the batch size is set to 4, and even when it is set to 2 during multi-GPU training, while the paper was able to use a batch size of 8? I'm using the same GPU as the one mentioned in the paper, a 4090, and the checkpoints are SD 1.4 and interact-diffusion-v1-1.pth. Thank you!

jiuntian commented 7 months ago

We use this command for training:

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt <existing_gligen_checkpoint> --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name <existing SD v1.4/v1.5 checkpoint>

We use AMP, and the batch size is set to 4 per GPU.
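
As a rough illustration of how the flags in the command above combine, here is a minimal sketch of the usual batch-size arithmetic. It assumes that the per-GPU batch size, the number of processes, and the gradient accumulation steps simply multiply, which is the common convention but is not confirmed in this thread for this codebase. The function name `effective_batch_size` is hypothetical and does not come from the repo.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (assumed convention, not taken from the repo)."""
    if min(per_gpu_batch, num_gpus, grad_accum_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_gpu_batch * num_gpus * grad_accum_steps


# Values taken from the command above:
# --batch_size=4, --nproc_per_node=2, --gradient_accumulation_step 2
per_forward_pass = effective_batch_size(4, 2)      # samples processed across both GPUs per forward/backward pass
per_optimizer_step = effective_batch_size(4, 2, 2)  # samples per optimizer step, including accumulation
print(per_forward_pass, per_optimizer_step)
```

Under this assumption, each GPU only ever holds 4 samples at a time, so peak memory is governed by the per-GPU batch size (and by whether AMP is enabled) rather than by the combined total.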

Training details are in the README.