hotshotco / Hotshot-XL

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
https://hotshot.co
Apache License 2.0
982 stars, 77 forks

Poor results on own pretrained SDXL #30

Closed: julkaztwittera closed this issue 7 months ago

julkaztwittera commented 7 months ago

I tried to run Hotshot-XL with my own fine-tuned SDXL as the spatial 2D U-Net, but it gave me really poor results. It produced fine images but empty videos with almost no objects. Even in an 8-frame video the objects were really small, and when I tried 16 frames the video was empty, with no objects at all. I then tried to fine-tune Hotshot-XL on top of these pretrained spatial weights, but that didn't work either: the loss was decreasing, yet the model still produced empty videos. Is there a way to fix this?

aakashs commented 7 months ago

Did you fine-tune SDXL at 512 resolution or 1024 resolution? If you really want to use 1024 resolution for the spatial layers, you'll have to fine-tune the temporal layers at 1024 resolution as well. Otherwise, you'll have to use an SDXL model fine-tuned at or around 512 resolution (the resolutions Hotshot-XL supports) to get great outputs!
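
(Editor's illustration, not part of the original comment.) A minimal sketch of why the resolution mismatch matters, assuming the standard SDXL VAE downsampling factor of 8. The function names `latent_positions` and `mismatch_ratio` are hypothetical helpers and are not part of the Hotshot-XL code base. The sketch only counts how many spatial latent positions each frame has at the two resolutions discussed above.

```python
# Hypothetical helpers for illustration only; not Hotshot-XL APIs.

def latent_positions(width: int, height: int, vae_scale: int = 8) -> int:
    """Number of spatial positions in one frame's latent grid.

    Assumes the VAE downsamples each side by `vae_scale` (8 for SDXL).
    """
    if width % vae_scale or height % vae_scale:
        raise ValueError("width and height must be multiples of vae_scale")
    return (width // vae_scale) * (height // vae_scale)


def mismatch_ratio(spatial_res: tuple[int, int], temporal_res: tuple[int, int]) -> float:
    """Ratio of per-frame latent positions between two training resolutions."""
    return latent_positions(*spatial_res) / latent_positions(*temporal_res)


if __name__ == "__main__":
    # Spatial layers fine-tuned at 1024x1024 vs. temporal layers used at 512x512.
    print(latent_positions(1024, 1024))  # 128 * 128 latent positions per frame
    print(latent_positions(512, 512))    # 64 * 64 latent positions per frame
    print(mismatch_ratio((1024, 1024), (512, 512)))
```

Under these assumptions, spatial layers tuned at 1024 operate on a latent grid with four times as many positions per frame as the 512 grid, which is consistent with the advice to keep the spatial and temporal layers at matching resolutions.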

julkaztwittera commented 7 months ago

Well, actually I did both. I fine-tuned SDXL at 1024 resolution, then tried to fine-tune Hotshot-XL's temporal layers at 1024 resolution as well, but it gave me poor results...