Closed: julkaztwittera closed this issue 7 months ago.
Did you fine-tune SDXL at 512 or 1024 resolution? If you really want to use 1024 resolution for the spatial layers, you'll have to fine-tune the temporal layers at 1024 resolution as well. Otherwise, you'll have to use an SDXL model fine-tuned at or around 512 resolution (or one of the other resolutions Hotshot-XL supports) to get great outputs!
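A minimal sketch of the setup suggested above: train only the temporal layers, and train them at the same resolution the spatial (SDXL) weights were fine-tuned at. It works on parameter names only and assumes the temporal layers can be identified by a substring in their names. The marker `"temporal"` and both helper functions are hypothetical, so check the actual Hotshot-XL module names before relying on this.

```python
from typing import List, Tuple

# Hypothetical marker: assumes temporal-layer parameter names contain it.
# The real Hotshot-XL parameter names may use a different convention.
TEMPORAL_MARKER = "temporal"


def split_trainable(
    param_names: List[str], marker: str = TEMPORAL_MARKER
) -> Tuple[List[str], List[str]]:
    """Split parameter names into (trainable temporal, frozen spatial)."""
    temporal = [name for name in param_names if marker in name]
    spatial = [name for name in param_names if marker not in name]
    return temporal, spatial


def check_resolution(
    spatial_res: Tuple[int, int], temporal_res: Tuple[int, int]
) -> None:
    """Raise if the temporal layers would be trained at a different
    (width, height) than the spatial weights were fine-tuned at."""
    if spatial_res != temporal_res:
        raise ValueError(
            f"spatial weights fine-tuned at {spatial_res}, "
            f"but temporal training resolution is {temporal_res}"
        )


if __name__ == "__main__":
    names = [
        "down.0.attn.weight",
        "down.0.temporal.proj.weight",
        "up.1.temporal.norm.bias",
    ]
    trainable, frozen = split_trainable(names)
    print("train:", trainable)
    print("freeze:", frozen)
    check_resolution((1024, 1024), (1024, 1024))  # matched: no error
```

In PyTorch, the same name filter could drive freezing by iterating over `model.named_parameters()` and calling `param.requires_grad_(False)` on every name in the frozen list.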
Well, actually I did both. I fine-tuned SDXL at 1024 resolution. Then I tried to fine-tune HotShotXL's temporal layers at 1024 resolution, but it gave me poor results...
I tried to run HotShotXL with my own fine-tuned SDXL as the spatial 2D U-Net, but it gave me really poor results: it produced fine images but empty videos with almost no objects. Even in an 8-frame video the objects were really small, and when I tried 16 frames, the video was empty, with no objects at all. So I tried to fine-tune HotShotXL with these pretrained spatial weights, but it didn't help: although the loss was decreasing, it still produced empty videos. Is there a way to fix this?
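The weight swap described above (fine-tuned SDXL as the spatial U-Net, Hotshot-XL's own temporal layers kept) can be sketched as a state-dict merge. This is an illustration under stated assumptions, not Hotshot-XL's actual loading code: it assumes the spatial keys in both checkpoints line up one-to-one and that temporal keys contain the hypothetical marker `"temporal"`. It only shows the swap; on its own it does not address the resolution mismatch raised in the reply.

```python
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical marker: assumes temporal-layer keys contain this substring.
TEMPORAL_MARKER = "temporal"


def merge_spatial_weights(
    hotshot_state: Dict[str, T],
    sdxl_state: Dict[str, T],
    marker: str = TEMPORAL_MARKER,
) -> Tuple[Dict[str, T], List[str]]:
    """Copy hotshot_state, replacing every non-temporal entry with the
    fine-tuned SDXL weight under the same key.

    Returns the merged dict and the spatial keys that had no SDXL match
    (those keep their original Hotshot-XL values). The input dicts are
    not modified.
    """
    merged = dict(hotshot_state)
    missing: List[str] = []
    for key in hotshot_state:
        if marker in key:
            continue  # keep Hotshot-XL's temporal weights untouched
        if key in sdxl_state:
            merged[key] = sdxl_state[key]  # take the fine-tuned spatial weight
        else:
            missing.append(key)  # key naming may differ between checkpoints
    return merged, missing


if __name__ == "__main__":
    hotshot = {"a.temporal.w": 1, "a.attn.w": 2, "b.only_hot": 3}
    sdxl = {"a.attn.w": 20}
    merged, missing = merge_spatial_weights(hotshot, sdxl)
    print(merged)   # {'a.temporal.w': 1, 'a.attn.w': 20, 'b.only_hot': 3}
    print(missing)  # ['b.only_hot']
```

Returning the unmatched keys instead of silently skipping them makes a key-naming mismatch between the two checkpoints visible, which is worth ruling out before attributing empty videos to training.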