Closed · Little-Podi closed this issue 8 months ago
I've tried training at resolutions like 576x1024 and didn't encounter such a problem. It looks very strange; could you share your specific settings? Also, are the `image_latents` being fed into the UNet normally?
Thanks for your nice reply! I have checked the condition inputs and found no problems. The problem should be within the UNet blocks, but they do not contain any resolution-dependent operations (a width of 100 in latent space, i.e. 800 pixels before the 8x VAE downsampling, is the threshold at which my problem appears). That's weird. Anyway, I am going to close this issue and will update it if I can find out what's wrong. Thanks again for your kind help!
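For anyone checking the same thing, here is a minimal sketch of the kind of condition-input sanity check discussed above, assuming a diffusers-style SVD setup; the variable names (`pixel_values`, `image_latents`) and the checkpoint path are illustrative, not taken from this repo's code:

```python
import torch
from diffusers import AutoencoderKLTemporalDecoder

# Hypothetical sanity check: verify that the conditioning image latents
# fed into the UNet have the expected spatial size (pixel dims / 8 for the
# SVD VAE) and contain no NaN/Inf values.
vae = AutoencoderKLTemporalDecoder.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", subfolder="vae"
)

pixel_values = torch.randn(1, 3, 576, 1024)  # stand-in for a real frame
with torch.no_grad():
    image_latents = vae.encode(pixel_values).latent_dist.mode()

# 576x1024 pixels -> 72x128 latents (8x spatial downsampling)
assert image_latents.shape[-2:] == (576 // 8, 1024 // 8), image_latents.shape
assert torch.isfinite(image_latents).all(), "NaN/Inf in image_latents"
print(image_latents.shape, image_latents.std())
```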
@pixeli99 Hi, at 576x1024 resolution I cannot train on an 80GB A100; I get a CUDA out-of-memory error. I followed the provided code with frames=25. Do you know what is happening, and how much GPU memory does training take in your case?
Have you tried DeepSpeed?
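In case it helps, a minimal sketch of enabling DeepSpeed ZeRO stage 2 through 🤗 Accelerate; the stage, offload, and precision settings are assumptions you would tune for your own setup:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Assumed settings: ZeRO stage 2 shards optimizer state and gradients across
# GPUs, and offloading the optimizer to CPU trades speed for extra headroom.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    offload_optimizer_device="cpu",
    gradient_accumulation_steps=1,
)
accelerator = Accelerator(
    deepspeed_plugin=deepspeed_plugin,
    mixed_precision="fp16",
)

# unet.enable_gradient_checkpointing() also cuts activation memory noticeably.
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```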
Hi there, thank you for this great reproduction. I also reproduced the training code based on Stability AI's codebase. When I train at high resolutions (e.g. 576x1024, 512x896), the model can only produce a sequence of blurry frames like the one below:
However, if I run it at a relatively low resolution (e.g. 320x576, 448x768) with all other settings fixed, the sampling results are as good as the public ones. In fact, I found that the degraded results appear whenever the video width exceeds 800 pixels, despite the official recommendation to run at 576x1024. It's possible that I accidentally modified some code in my codebase, and I'm still investigating. I'm curious: have you ever tried training at high resolutions? Did you encounter similar problems?
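To make the threshold concrete: with the 8x VAE downsampling, a video width over 800 pixels means a latent width over 100, which matches the latent-space threshold mentioned earlier in this thread. A tiny helper (purely illustrative) that flags which of the resolutions above cross it:

```python
# Illustrative only: map pixel resolutions to latent sizes (8x VAE downsampling)
# and flag the ones whose latent width exceeds the ~100 threshold observed here.
VAE_SCALE = 8
LATENT_WIDTH_THRESHOLD = 100

for h, w in [(320, 576), (448, 768), (512, 896), (576, 1024)]:
    lh, lw = h // VAE_SCALE, w // VAE_SCALE
    flag = "blurry in my runs" if lw > LATENT_WIDTH_THRESHOLD else "ok"
    print(f"{h}x{w} pixels -> {lh}x{lw} latents: {flag}")
```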