mkshing / e4t-diffusion

Implementation of Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
https://arxiv.org/abs/2302.12228
MIT License
317 stars 24 forks

Need help... OOM with 2 RTX3090 (bs=2) #25

Open CHR-ray opened 1 year ago

CHR-ray commented 1 year ago

Here is my accelerate YAML config:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp8
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

And my tuning command:

```shell
accelerate launch tuning_e4t.py \
  --pretrained_model_name_or_path e4t-diffusion-ffhq-celebahq-v1 \
  --prompt_template "a photo of {placeholder_token}" \
  --reg_lambda 0.1 \
  --output_dir tune_yann-lecun \
  --train_image_path "https://engineering.nyu.edu/sites/default/files/styles/square_large_default_1x/public/2018-06/yann-lecun.jpg?h=65172a10&itok=NItwgG8z" \
  --resolution 512 \
  --train_batch_size 2 \
  --learning_rate 1e-6 --scale_lr \
  --max_train_steps 30
```

I would think 48 GB of VRAM should be enough, since the paper uses only a single A100. Why do I still get OOM even with batch size = 2?

zhanjiahui commented 10 months ago

I was able to save a lot of memory by using the bitsandbytes package, and could fine-tune with a batch size of 16 on a single RTX 3090. Just add `--use_8bit_adam` at the end of the command.
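For context, this kind of flag usually just swaps the optimizer for bitsandbytes' 8-bit AdamW, which keeps the optimizer's momentum/variance state quantized to 8 bits instead of 32, cutting optimizer memory roughly 4x. A minimal sketch of that selection logic (the helper name `create_optimizer` is mine, not from this repo's code):

```python
import torch
import torch.nn as nn


def create_optimizer(params, lr, use_8bit_adam=False):
    """Return AdamW, optionally the memory-saving 8-bit variant."""
    if use_8bit_adam:
        # Requires `pip install bitsandbytes` and a CUDA GPU.
        import bitsandbytes as bnb
        # Same update rule as AdamW, but optimizer state is stored in 8 bits.
        return bnb.optim.AdamW8bit(params, lr=lr)
    # Standard 32-bit AdamW fallback.
    return torch.optim.AdamW(params, lr=lr)


# Example: plain AdamW fallback (no bitsandbytes needed).
model = nn.Linear(4, 4)
optimizer = create_optimizer(model.parameters(), lr=1e-6)
```

Note the savings apply only to optimizer state; activations and model weights are unaffected, so combining this with gradient checkpointing may still be needed at higher batch sizes.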