livepeer / ai-worker

https://livepeer.ai
MIT License

Enhancing VRAM Usage and Inference Speed with Diffusers Optimizations #38

Open rickstaa opened 6 months ago

rickstaa commented 6 months ago

We're exploring various optimizations available in the Diffusers library to improve VRAM usage and inference speed. @titan-node is currently benchmarking these optimizations across his GPU pool and the Livepeer network using his ai-benchmark wrapper to evaluate their effectiveness; preliminary results are documented in this community spreadsheet.

Objective

The goal is to identify and implement the most impactful optimizations for improving the performance of AI models, focusing on inference speed and efficient VRAM usage while also keeping an eye on the quality of the results.
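As context for how such optimizations are wired in, here is a minimal sketch of toggling Diffusers memory/speed features on a pipeline. The `apply_optimizations` helper and the flag list are hypothetical; the `enable_*` method names in the comments mirror the real Diffusers API, but which of them this repo actually uses is an assumption.

```python
# Hypothetical helper sketching how Diffusers memory/speed optimizations
# are typically enabled on a pipeline. The method names mirror the real
# Diffusers API (enable_attention_slicing, enable_vae_slicing,
# enable_model_cpu_offload, enable_xformers_memory_efficient_attention);
# hasattr-style guards keep the helper usable across pipeline versions.

def apply_optimizations(pipe, optimizations):
    """Call each named enable_* method on `pipe` if it exists.

    Returns the list of optimizations that were actually applied.
    """
    applied = []
    for name in optimizations:
        method = getattr(pipe, f"enable_{name}", None)
        if callable(method):
            method()
            applied.append(name)
    return applied


# Stand-in for a DiffusionPipeline, so the sketch runs without a GPU.
class DummyPipeline:
    def __init__(self):
        self.enabled = []

    def enable_attention_slicing(self):
        self.enabled.append("attention_slicing")

    def enable_vae_slicing(self):
        self.enabled.append("vae_slicing")


pipe = DummyPipeline()
applied = apply_optimizations(
    pipe, ["attention_slicing", "vae_slicing", "model_cpu_offload"]
)
# model_cpu_offload is skipped: the dummy pipeline does not define it.
print(applied)  # ['attention_slicing', 'vae_slicing']
```

The guard-and-apply pattern lets one benchmark harness drive many pipelines, since not every Diffusers pipeline exposes every `enable_*` method.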

Current Optimizations

The following optimizations are already integrated into our codebase:

Future Explorations

Links and Resources

rickstaa commented 6 months ago

@titan-node did you also experiment with model offloading, which gives less memory reduction but higher performance?

Titan-Node commented 6 months ago

> @Titan-Node did you also experiment with model offloading, which gives less memory reduction but higher performance?

Yes, I tried model offloading with no effect on RAM or speed.

rickstaa commented 6 months ago

> > @Titan-Node did you also experiment with model offloading, which gives less memory reduction but higher performance?
>
> Yes, I tried model offloading with no effect on RAM or speed.

Thanks for the update. According to the docs, the effects should indeed be minimal.
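For reference, Diffusers offers two offloading modes with different trade-offs: `pipe.enable_model_cpu_offload()` moves whole submodels (text encoder, UNet, VAE) to the GPU one at a time, trading a small speed cost for moderate VRAM savings, while `pipe.enable_sequential_cpu_offload()` offloads per layer for the largest VRAM savings at a much larger speed cost. A hedged sketch of choosing between them from a VRAM budget (the helper and its thresholds are illustrative assumptions, not project code):

```python
# Hypothetical rule of thumb for choosing a Diffusers offload strategy
# from a VRAM budget. The thresholds are illustrative assumptions; the
# strategy names correspond to real Diffusers methods:
#   "model"      -> pipe.enable_model_cpu_offload()      (whole submodels,
#                   moderate VRAM savings, small speed cost)
#   "sequential" -> pipe.enable_sequential_cpu_offload() (per-layer,
#                   largest VRAM savings, much slower)

def pick_offload_strategy(free_vram_gb: float, model_vram_gb: float) -> str:
    """Return 'none', 'model', or 'sequential' for a given VRAM budget."""
    if free_vram_gb >= model_vram_gb:
        return "none"          # everything fits; no offloading needed
    if free_vram_gb >= 0.5 * model_vram_gb:
        return "model"         # offload whole submodels between steps
    return "sequential"        # last resort: per-layer offloading


print(pick_offload_strategy(24, 16))  # none
print(pick_offload_strategy(10, 16))  # model
print(pick_offload_strategy(4, 16))   # sequential
```

This would also explain the benchmark result above: if the card already fits the whole pipeline, enabling model offloading changes neither VRAM high-water mark nor speed in any meaningful way.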

rickstaa commented 5 months ago

Tracked internally at https://linear.app/livepeer-ai-spe/issue/LIV-321.