tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

SDXL training images resolution #256

Open cpis7 opened 5 months ago

cpis7 commented 5 months ago

Thank you for your great contribution! I have a question about training on the SDXL model. Did you use 1024x1024 images to train the SDXL IP-Adapter, since the pretrained SDXL model was trained on 1024x1024 images? Or did you use 512x512 images? Thank you.

xiaohu2015 commented 5 months ago

Firstly, we perform pre-training at a resolution of 512x512. Then, we employ a multi-scale strategy for fine-tuning.

cpis7 commented 5 months ago

Thank you for answering! Can you please tell me which scales you used? Did you use the multi-scale strategy noted in the SDXL paper?

[image: screenshot of the multi-scale resolution table from the SDXL paper]

xiaohu2015 commented 5 months ago

3 stages:

(1) 512x512

(2) buckets = [[768, 768], [960, 640], [640, 960], [768, 896], [896, 768], [768, 832], [832, 768], [768, 960], [960, 768], [768, 1024], [1024, 768], [704, 1024], [1024, 704], [1024, 640], [640, 1024]]

(3) buckets = [[1024, 1024], [768, 1280], [1280, 768], [832, 1216], [1126, 832], [832, 1152], [896, 1152], [1152, 896], [1152, 832], [960, 1088], [1088, 960], [896, 1088], [1088, 896], [960, 1024], [1024, 960]]
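A minimal sketch of how such aspect-ratio bucketing is often implemented: each training image is assigned to the bucket whose aspect ratio is closest to its own, then resized/cropped to that bucket. This is an illustration, not the repo's actual training code, and treating each pair as (width, height) is an assumption since the thread doesn't state the order.

```python
# Illustrative aspect-ratio bucketing sketch (not the repo's actual code).
# Assumption: each bucket pair is (width, height).

STAGE2_BUCKETS = [
    (768, 768), (960, 640), (640, 960), (768, 896), (896, 768),
    (768, 832), (832, 768), (768, 960), (960, 768), (768, 1024),
    (1024, 768), (704, 1024), (1024, 704), (1024, 640), (640, 1024),
]

def nearest_bucket(width, height, buckets=STAGE2_BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ratio))
```

With bucketing, each batch is drawn from a single bucket, so all images in the batch share one resolution and no padding is wasted.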

masaisai111 commented 2 months ago

How much memory do you use when training with SDXL?

xiaohu2015 commented 2 months ago

Using 8x 40GB A100s, the batch size can be 8*8 (8 per GPU) at 512x512.

masaisai111 commented 2 months ago

If I train at 256x256, will it be much less effective?

xiaohu2015 commented 2 months ago

> If I train at 256x256, will it be much less effective?

I haven't tested that, but I think you should train at 512 or higher.

masaisai111 commented 2 months ago

Why doesn't reducing the image size significantly decrease the memory footprint? Even with `--resolution=8` and `train_batch_size=1`, PyTorch still errors out:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 23.70 GiB total capacity; 21.79 GiB already allocated; 7.56 MiB free; 22.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
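As the error message itself suggests, one mitigation for fragmentation is setting the allocator's `max_split_size_mb` before PyTorch's first CUDA allocation. This is a sketch based on the error's hint, not advice from the maintainers; the value 128 is an illustrative guess to tune per GPU.

```python
import os

# Must be set before the first CUDA allocation (ideally before importing torch).
# 128 MB is an illustrative value, not a recommendation from this thread.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Other common memory reducers (general practice, not from this thread):
# gradient checkpointing on the UNet, mixed precision (fp16/bf16),
# and gradient accumulation instead of a larger train_batch_size.
```

Note also that model weights, gradients, and optimizer states do not shrink with image resolution, only activation memory does, which is why lowering the resolution alone may not free much memory.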

eyalgutflaish commented 2 weeks ago

SDXL was trained at 1024, so how did you train on 512 images, which are out of distribution? Did you add a LoRA to the UNet to support low-resolution images? If so, did you train the IP-Adapter and the LoRA together?

xiaohu2015 commented 2 weeks ago

> SDXL was trained at 1024, so how did you train on 512 images, which are out of distribution? Did you add a LoRA to the UNet to support low-resolution images? If so, did you train the IP-Adapter and the LoRA together?

We pretrain at 512x512 and then finetune at 1024x1024.