livepeer / ai-worker

https://livepeer.ai
MIT License

Sfast optimization for T2I, I2I and Upscale models #134

Open JJassonn69 opened 3 months ago

JJassonn69 commented 3 months ago

Implementation was straightforward from the already written code, but the testing took most of the time. Maybe it's because my A5000 was slow or something, but it would take forever to load and compile the model for sfast.
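
For context, compiling a pipeline with stable-fast is roughly as follows. This is a minimal sketch assuming a diffusers text-to-image pipeline; the model id and config flags here are illustrative, not necessarily what the runner in this repo uses:

```python
import torch
from diffusers import AutoPipelineForText2Image
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

# Illustrative model id; the runner loads whichever model is configured.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

config = CompilationConfig.Default()
config.enable_xformers = True    # skipped if xformers is not installed
config.enable_triton = True      # skipped if triton is not installed
config.enable_cuda_graph = True

pipe = compile(pipe, config)

# The first few calls trace and compile the model (the slow "warmup");
# subsequent calls reuse the compiled graph and run much faster.
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=4).images[0]
```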

For T2I, most of the models worked.

The best was ByteDance/SDXL_Lightning, with a total warmup time of 441s: the first warmup took 386s and the second only 55s. Inference took 5.10 s/it for the first image and sped up to 4.65 it/s for subsequent images.

The worst was SG161222/RealVisXL_V4.0_Lightning: the first warmup took 1600s and the second was still incomplete after 55 minutes, so the total time and the inference speeds for the first and subsequent images are unknown.
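
For reference, the warmup vs. steady-state numbers above come from timing the pipeline calls directly. A rough sketch of that kind of measurement (assuming the compiled `pipe` from the sketch above; this is not the exact benchmark script used here):

```python
import time

def timed_call(pipe, prompt):
    # Wall-clock time for one full pipeline call (illustrative only).
    start = time.perf_counter()
    pipe(prompt)
    return time.perf_counter() - start

prompt = "a photo of a cat"  # placeholder prompt

# The first two calls include tracing/compilation overhead ("warmups").
warmup1 = timed_call(pipe, prompt)
warmup2 = timed_call(pipe, prompt)

# Later calls reflect the compiled, steady-state speed.
steady = [timed_call(pipe, prompt) for _ in range(3)]

print(f"warmup 1: {warmup1:.1f}s, warmup 2: {warmup2:.1f}s")
print(f"steady state: {sum(steady) / len(steady):.2f}s per image")
```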

For I2I, I could not get SDXL or SD_turbo to work, and neither did timbrooks/instruct-pix2pix.

For upscaling, the only model available was stabilityai/stable-diffusion-x4-upscaler, but it didn't compile with sfast; even a single compile iteration would take forever. I left it for an hour and it only advanced a couple of steps, so I will need to run more tests to see whether the issue is with the models or the hardware. The attempt was along the lines of the sketch below.
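
A sketch only, using the same compile call as the T2I pipelines; whether sfast can handle this pipeline at all is exactly what is in question:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Same compile call as for the T2I pipelines; this is the step that stalled.
pipe = compile(pipe, CompilationConfig.Default())
```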

Conclusion: some models took insanely long to pre-trace, so I don't think that would be good for anyone. Maybe if there were a way to precompile the models and keep them in memory, you could instantly switch between precompiled models instead of compiling each one every time you load it (rough sketch of the idea below).
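
This is a hypothetical helper, not code from this repo: keep compiled pipelines in an in-process cache keyed by model id, so switching models doesn't re-trigger tracing. It only helps within one process; persisting compiled graphs to disk would be a separate problem:

```python
# Hypothetical in-memory cache: model id -> already-compiled pipeline.
_compiled_pipelines = {}

def get_compiled_pipeline(model_id, load_fn, config):
    """Return a cached compiled pipeline, compiling it only on first use.

    `load_fn` stands in for whatever loads the pipeline for `model_id`;
    `compile` is the stable-fast compile function shown above.
    """
    if model_id not in _compiled_pipelines:
        pipe = load_fn(model_id)
        _compiled_pipelines[model_id] = compile(pipe, config)
    return _compiled_pipelines[model_id]
```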

JJassonn69 commented 3 months ago

These are the testing results for the sfast-optimised models vs. the normal ones.

(Attached screenshot, 2024-07-31: benchmark results for sfast-optimised vs. normal models.)