Closed: maobenz closed this issue 8 months ago.
Could you please provide some more details, such as your specific settings, device information, and so on?
Thanks a lot!
I tried different resolutions of the BDD images, but the step_loss is always NaN. I use a single video clip from BDD and split it into frames that are fed into the model. I have tried both an RTX 3090 and an A100.
When I use the fp32 model, the step loss is not NaN, but the fp16 model's loss is still NaN. In the last block of the upsample_block, the values of query @ key.transpose(-1, -2) grow too large for fp16 and the result becomes NaN.
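In case it helps others, here is a minimal sketch of why this happens (the shapes and magnitudes are made up for illustration, not taken from the actual model): fp16 tops out at roughly 65504, so once the attention logits exceed that range the values overflow to inf, and the softmax then turns inf into NaN:

```python
import torch

# Hypothetical query/key tensors with large activations, as can happen
# deep in the upsample blocks (scale chosen only to trigger the overflow).
q = torch.randn(1, 8, 64, 64) * 60.0
k = torch.randn(1, 8, 64, 64) * 60.0

logits = q @ k.transpose(-1, -2)          # computed in fp32: large but finite
print(logits.abs().max())                 # typically > 65504 here

# fp16 can only represent values up to ~65504, so casting (or computing the
# matmul directly in half precision) overflows to inf ...
logits_fp16 = logits.half()
print(torch.isinf(logits_fp16).any())     # tensor(True)

# ... and softmax over a row containing +inf produces NaN (inf - inf).
# The upcast to float is only so this snippet also runs on CPU; the inf
# values survive the cast.
probs = logits_fp16.float().softmax(dim=-1)
print(torch.isnan(probs).any())           # tensor(True)
```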
My model ID is "stabilityai/stable-video-diffusion-img2vid-xt", but when I tried other model IDs it also didn't work.
My torch version is 1.13.1+cu116 and my diffusers version is 0.25.0. Even when I feed all zeros as input, the loss is still NaN.
OK, I have found the issue: the torch version should be 2.0.1 rather than 1.13.1. After changing the PyTorch version, the problem was solved.
Ah, I see. To be honest, I can't say why changing the PyTorch version would cause this issue. 😢
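One plausible explanation (my own assumption, not verified in this thread): diffusers picks its attention implementation based on the PyTorch version. On torch >= 2.0 it defaults to AttnProcessor2_0, which calls torch.nn.functional.scaled_dot_product_attention and can use fused kernels that keep the softmax statistics in fp32; on torch 1.13 that function does not exist, so diffusers falls back to an explicit fp16 matmul + softmax that can overflow exactly as shown above. A quick sanity check of which path your UNet is on:

```python
import torch
from diffusers import UNetSpatioTemporalConditionModel

# Model id taken from this thread; fp16 to match the failing setup.
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    subfolder="unet",
    torch_dtype=torch.float16,
)

# On torch >= 2.0 this should report AttnProcessor2_0; on torch 1.13
# scaled_dot_product_attention is missing and diffusers uses the plain
# matmul/softmax attention processor instead.
print(hasattr(torch.nn.functional, "scaled_dot_product_attention"))
print({type(p).__name__ for p in unet.attn_processors.values()})
```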
I upgraded PyTorch to 2.1.2 but still have this problem; I can only train in bf16. Any solutions?
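Not a fix, but a note on why bf16 works where fp16 fails: bf16 keeps the same 8-bit exponent as fp32, so the oversized attention logits stay finite; it only has fewer mantissa bits. If bf16 is acceptable for your training, a minimal autocast step looks like this (model, optimizer, loss_fn, and the batch keys are placeholders, not code from this repo):

```python
import torch

def training_step(model, batch, optimizer, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    # bf16 autocast: activations run in bfloat16, whose fp32-sized exponent
    # range avoids the inf/NaN overflow seen with fp16 attention logits.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        pred = model(batch["inputs"])           # hypothetical batch keys
        loss = loss_fn(pred, batch["targets"])
    loss.backward()   # no GradScaler needed for bf16, unlike fp16
    optimizer.step()
    return loss.detach()
```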
Hi, did you find any solution? I hit a similar problem: the loss is NaN.
Hello, thanks for your brilliant work! When I run the code, the step loss always equals NaN when I use the BDD dataset. After carefully checking the code, I found that the output of the last block of the upsample_block is NaN. I use the fp16 model and follow the pipeline. Could anyone tell me what the reason is?
Thanks a lot!
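For anyone still chasing this, here is a small debugging sketch (not from this repo; the names are generic) that registers forward hooks on every submodule and reports the first one whose output goes non-finite. Since child modules finish their forward before their parents, the first hit is the deepest offender, which is how you can pin the NaN to the last upsample block:

```python
import torch

def find_first_nonfinite(model):
    """Attach forward hooks that report the first module whose output
    contains NaN or Inf, then detach all hooks."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outs:
                if (torch.is_tensor(t) and t.is_floating_point()
                        and not torch.isfinite(t).all()):
                    print(f"first non-finite output: {name} "
                          f"({type(module).__name__})")
                    for h in handles:   # stop reporting after the first hit
                        h.remove()
                    return
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Hypothetical usage: attach once, then run a single forward/training step
# and read off the offending module name.
# handles = find_first_nonfinite(unet)
# ... run one step ...
```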