Inquiry on Memory Usage Issues with Diffusion Model Optimization

Hello, I wanted to express my gratitude for your work; it's been instrumental in my current project. However, I've encountered some confusion that I'm hoping you can shed light on.

I understand that in your approach, you freeze the weights of the diffusion model and only optimize the input random latent. While I appreciate the elegance of this method, I've noticed an issue with GPU memory usage that I believe may be related to the backward process.

Since the backward process cannot utilize with torch.no_grad(), the gradients of the latent variables are still being recorded and occupy GPU memory. As a result, the GPU memory usage increases with each step of the backward process. In my case, when optimizing a pre-trained 64-channel model on images of size (448, 168), the memory usage reaches up to 30GB.

I'm not certain if you've encountered this situation before. It's possible that because I'm manually migrating the code, there might be a step I've overlooked that's causing this issue.

hamadichihaoui / BIRD

Inquiry on Memory Usage Issues with Diffusion Model Optimization #2