sangminkim-99 / Sketch-Guided-Text-To-Image

Unofficial implementation of Sketch-Guided Text-to-Image Diffusion Models

GPU Specifications #3

Open rs2125 opened 5 months ago

rs2125 commented 5 months ago

How much GPU memory do we need to train the LEP with batch size = 1? I am not able to train the LEP with batch size = 1 on an 11GB GPU.

sangminkim-99 commented 5 months ago

Hi, @rs2125!

The current GPU memory consumption with xformers is approximately 18GB. To mitigate this issue, we can employ several strategies to reduce memory usage:

  1. Utilizing float16 instead of float32 for computations can significantly reduce memory overhead while maintaining acceptable precision.
  2. Another approach is to decrease the image resolution from 512x512 to 256x256. Through experimentation, I've found that this resolution fits within the confines of an 11GB GPU.
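The savings from the two strategies above can be estimated with some back-of-envelope arithmetic. This is only an illustration: the 320-channel width below is a typical first-block width for a Stable Diffusion U-Net, chosen for the example, not a measurement of this repo.

```python
# Rough per-activation memory for one feature map: H * W * C * bytes/element.
def act_bytes(h, w, channels, bytes_per_elem):
    return h * w * channels * bytes_per_elem

base  = act_bytes(512, 512, 320, 4)  # float32 at 512x512
fp16  = act_bytes(512, 512, 320, 2)  # strategy 1: switch to float16
small = act_bytes(256, 256, 320, 4)  # strategy 2: drop to 256x256

assert fp16 == base // 2   # float16 halves activation memory
assert small == base // 4  # halving each spatial dim quarters it
```

Combining both strategies multiplies the savings, which is why 256x256 (or float16 at 512x512) can fit in 11GB when the float32 512x512 run needs roughly 18GB.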
rs2125 commented 5 months ago

I reduced the image resolution to 128x128, yet I still get a CUDA out-of-memory error while training the LEP. It is not even able to load the model with the given model_id: it wants to download something, which it is not able to. I added print statements to check how far execution gets:

        tbar = tqdm(dataloader)
        print("10************************************Reached Here***********************************")
        for _, (image, edge_map, caption) in enumerate(tbar):

It started to download something, printed 10, started downloading again, and then gave me the CUDA out-of-memory error.

sangminkim-99 commented 5 months ago

At this line, I changed the 512 to 256.

rs2125 commented 5 months ago

Thank you for your help. I trained the LEP on around 4,500 ImageNet images, but the final results are pretty bad; it is as if the LEP has no effect on the image generation.

Input Sketch: 3659166037_2b2c3bcb14
Prompt: A table and a few chairs in a room
Output Image: sample1

Input Sketch: 6341578_51d0bcaca9
Prompt: A bedroom with a king sized bed and a mirror
Output Image: sample

Any idea why the LEP training isn't improving the results? The outputs look as if they were generated by a plain text-to-image diffusion model, with the LEP not working at all.

And one more thing: what is the parameter input_dim=9324 in the LEP? I am asking because I got an error when I tried to use pre-trained weights for the LEP; the input_dim in the pre-trained model was 7080.

Moreover, what changes can we make to increase the batch size for training the LEP? It gives me a dimensionality error when I increase the batch size.
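For context on the input_dim mismatch (an assumption on my part, not confirmed in this thread): in the paper, the LEP consumes a per-pixel concatenation of intermediate U-Net activations, so input_dim is the sum of the channel widths of the hooked feature maps. Different diffusion checkpoints or different hook sets then yield different totals, which would explain 9324 vs. 7080. A toy illustration, with made-up channel lists:

```python
# Hypothetical hook configurations; the channel lists are invented to show
# the mechanism, not the repo's actual hooks.
hooks_a = [320, 640, 1280, 1280, 1280, 640, 320]
hooks_b = [320, 640, 1280, 1280]

def lep_input_dim(channel_widths):
    # Length of the per-pixel feature vector after concatenating all hooked maps
    return sum(channel_widths)

assert lep_input_dim(hooks_a) != lep_input_dim(hooks_b)
```

If that is how the repo computes it, pre-trained LEP weights are only loadable when the hook set (and hence the channel sum) matches the one used at training time.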

sangminkim-99 commented 5 months ago

Hi @rs2125, I found that some of the code is not up to date with my old desktop... I also cannot reproduce the old results. (You can check them in the first issue of this repo.)

I will try some modifications and test with your sketch + prompt.

And I am also aware of the batch-size bug, but I didn't fix it since the current batch size (=1) already takes 100% of my GPU's VRAM. I'll do some debugging sometime...
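Since VRAM is already saturated at batch size 1, one workaround (a generic PyTorch pattern, not something implemented in this repo) is gradient accumulation: keep the dataloader batch at 1 and only step the optimizer every few iterations, which simulates a larger effective batch without extra memory. A minimal sketch, where model, loss_fn, and dataloader stand in for the repo's objects:

```python
import torch

def train_with_accumulation(model, loss_fn, dataloader, optimizer, accum=4):
    """Simulate an effective batch of `accum` while feeding batches of 1."""
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        # Scale the loss so accumulated gradients average instead of summing
        loss = loss_fn(model, batch) / accum
        loss.backward()
        if (step + 1) % accum == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This sidesteps the dimensionality error entirely, since every forward pass still sees batch size 1.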

rs2125 commented 5 months ago

Hi @sangminkim-99. Thanks for your help and replies. I cannot work out how you arrived at input_dim = 9324 when initialising the LEP. I found a pre-trained model that had this parameter set to 7080. Could you help me with the changes needed to use those pre-trained weights? The results are not at all good with the LEP I am training with batch size 1 on ImageNet.

shaoke317 commented 1 month ago

Author, how do you fix the dimension error that occurs when the batch size is greater than 1? Are you currently training the LEP with a batch size greater than 1?