xichenpan / ARLDM

Official Pytorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
https://arxiv.org/abs/2211.10950
MIT License

How to scale the model parameters to fit into reasonable GPUs #23

Open maliozer opened 1 year ago

maliozer commented 1 year ago

How can I reduce the parameter count so the model fits on my GPUs? I have already tried 16-bit precision, but I still need to scale the model down further.
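For reference, this is roughly what I mean by 16-bit precision (a sketch only; the actual Trainer arguments in the repo's `main.py` may differ):

```python
# Sketch: enabling 16-bit mixed precision (AMP) in a PyTorch Lightning Trainer.
# The concrete Trainer construction in ARLDM's main.py may use other arguments.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision=16,      # mixed precision instead of full fp32
    max_epochs=50,
)
# trainer.fit(model, datamodule=data_module)
```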

styufo commented 1 year ago

Unfortunately, this seems impossible. I'm trying to freeze the ResNet, CLIP embedding, and BLIP embedding and use an 8-bit optimizer together, but a V100 32G still doesn't work. The only successful case I saw froze the ResNet, CLIP embedding, and BLIP embedding and used AMP plus an 8-bit optimizer together, which reduced the VRAM usage to about 40GB on an A6000 48G.
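In case it helps, here is roughly what that freezing + 8-bit optimizer setup looks like (a sketch only; `clip_text_encoder`, `blip_text_encoder`, and `resnet` are placeholder names for whatever the ARLDM LightningModule actually calls those submodules):

```python
# Sketch of the freeze + 8-bit optimizer combination described above.
# Attribute names below are illustrative, not the repo's exact names.
import bitsandbytes as bnb

def freeze(module):
    # Stop gradient computation and put the module in eval mode.
    for p in module.parameters():
        p.requires_grad = False
    module.eval()

freeze(model.clip_text_encoder)   # CLIP embedding branch (placeholder name)
freeze(model.blip_text_encoder)   # BLIP embedding branch (placeholder name)
freeze(model.resnet)              # visual feature extractor (placeholder name)

# 8-bit AdamW from bitsandbytes; only optimize the parameters left trainable.
optimizer = bnb.optim.AdamW8bit(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-5,
)
```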

maliozer commented 1 year ago

I also tried the same process, but I thought the other parameters should be scalable somehow as well without breaking the model.

styufo commented 1 year ago

> I also tried the same process, but I thought the other parameters should be scalable somehow as well without breaking the model.

If you have any progress, I would be happy if you could tell me about your successful parameter configuration.

TimandXiyu commented 10 months ago

With the default settings, I don't think it is possible to train at 512×512 on 40GB A100s. Still, it is a bit strange that the authors don't freeze the CLIP and BLIP nets.

Anyway, with CLIP, BLIP, and the ResNet frozen, you still get tons of cross-attention parameters to play with, and this might already be enough. (Still waiting to check my ckpt.)
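For anyone trying this, a rough sketch of keeping only the cross-attention weights trainable (this assumes a diffusers-style U-Net where the cross-attention layers are named `attn2`; check the actual module names in ARLDM before relying on it):

```python
# Sketch: leave only U-Net cross-attention parameters trainable.
# "attn2" is the cross-attention naming convention in diffusers' UNet2DConditionModel;
# verify against the real ARLDM model before use.
for name, param in model.unet.named_parameters():
    param.requires_grad = "attn2" in name

trainable = sum(p.numel() for p in model.unet.parameters() if p.requires_grad)
print(f"trainable U-Net params: {trainable / 1e6:.1f}M")
```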

debasmitdas commented 9 months ago

@TimandXiyu Are your checkpoints ready? If possible, would you be willing to share them?

DwanAI commented 9 months ago

@TimandXiyu Are your checkpoints ready? If possible, would you be willing to share them?

kirbu123 commented 9 months ago

@TimandXiyu Are your checkpoints ready? If possible, would you be willing to share them? Can you also explain how to train ARLDM on a single CUDA device, and how to avoid the CUDA out-of-memory error by freezing CLIP, BLIP, and the ResNet, or with other methods?