Closed · Little-Podi closed this issue 8 months ago
I've tried training at resolutions like 576x1024 and didn't encounter such a problem. It looks very strange; could you share your specific settings? Also, are the `image_latents` being fed into the UNet normally?
Thanks for your nice reply! I have checked the condition inputs and found no problems. The problem should be within the UNet blocks, but they do not contain any resolution-dependent operations (a width of 100 in latent space, i.e. 800 pixels before the 8x VAE downsampling, is the threshold at which my problem appears). That's weird. Anyway, I am going to close this issue and will update it if I can find out what's wrong. Thanks again for your kind help!
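For anyone checking the same thing, here is a minimal sketch of the kind of condition-input sanity check discussed above, assuming a diffusers-style SVD setup; the variable names (`pixel_values`, `image_latents`) and the checkpoint path are illustrative, not taken from this repo's code:

```python
import torch
from diffusers import AutoencoderKLTemporalDecoder

# Hypothetical sanity check: verify that the conditioning image latents
# fed into the UNet have the expected spatial size (pixel dims / 8 for the
# SVD VAE) and contain no NaN/Inf values.
vae = AutoencoderKLTemporalDecoder.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", subfolder="vae"
)

pixel_values = torch.randn(1, 3, 576, 1024)  # stand-in for a real frame
with torch.no_grad():
    image_latents = vae.encode(pixel_values).latent_dist.mode()

# 576x1024 pixels -> 72x128 latents (8x spatial downsampling)
assert image_latents.shape[-2:] == (576 // 8, 1024 // 8), image_latents.shape
assert torch.isfinite(image_latents).all(), "NaN/Inf in image_latents"
print(image_latents.shape, image_latents.std())
```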
@pixeli99 Hi, at 576x1024 resolution I cannot train on an 80GB A100; I get a CUDA out-of-memory error. I followed the provided code with frames=25. Do you know what is happening, and how much GPU memory does training take in your case?
Have you tried DeepSpeed?
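In case it helps, a minimal sketch of enabling DeepSpeed ZeRO stage 2 through 🤗 Accelerate; the stage, offload, and precision settings are assumptions you would tune for your own setup:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Assumed settings: ZeRO stage 2 shards optimizer state and gradients across
# GPUs, and offloading the optimizer to CPU trades speed for extra headroom.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    offload_optimizer_device="cpu",
    gradient_accumulation_steps=1,
)
accelerator = Accelerator(
    deepspeed_plugin=deepspeed_plugin,
    mixed_precision="fp16",
)

# unet.enable_gradient_checkpointing() also cuts activation memory noticeably.
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```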
Hi there, thank you for this great reproduction. I also reproduced the training code based on Stability AI's codebase. When I train at high resolutions (e.g. 576x1024, 512x896), the model can only produce a sequence of blurry frames like the one below:
However, if I run it at a relatively low resolution (e.g. 320x576, 448x768) with all other settings fixed, the sampling results are as good as the public ones. In fact, I found that the degraded results appear whenever the video width exceeds 800 pixels, despite the official recommendation to run at 576x1024. It's possible that I accidentally modified some code in my codebase, and I'm still investigating. I'm curious: have you ever tried training at high resolutions? Did you encounter similar problems?
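To make the threshold concrete: with the 8x VAE downsampling, a video width over 800 pixels means a latent width over 100, which matches the latent-space threshold mentioned earlier in this thread. A tiny helper (purely illustrative) that flags which of the resolutions above cross it:

```python
# Illustrative only: map pixel resolutions to latent sizes (8x VAE downsampling)
# and flag the ones whose latent width exceeds the ~100 threshold observed here.
VAE_SCALE = 8
LATENT_WIDTH_THRESHOLD = 100

for h, w in [(320, 576), (448, 768), (512, 896), (576, 1024)]:
    lh, lw = h // VAE_SCALE, w // VAE_SCALE
    flag = "blurry in my runs" if lw > LATENT_WIDTH_THRESHOLD else "ok"
    print(f"{h}x{w} pixels -> {lh}x{lw} latents: {flag}")
```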