livepeer / ai-worker

https://livepeer.ai
MIT License

Enhancing VRAM Usage and Inference Speed with Diffusers Optimizations #38

Open rickstaa opened 6 months ago

rickstaa commented 6 months ago

We're exploring various optimizations available in the Diffusers library to improve VRAM usage and inference speed. @titan-node is currently benchmarking these optimizations across his GPU pool and the Livepeer network using his ai-benchmark wrapper to evaluate their effectiveness; preliminary results are documented in this community spreadsheet.

Objective

The goal is to identify and implement the most impactful optimizations for improving the performance of AI models, focusing on inference speed and efficient VRAM usage while also keeping an eye on the quality of the results.
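As context for how such optimizations are wired in, here is a minimal sketch of toggling Diffusers memory/speed features on a pipeline. The `apply_optimizations` helper and the flag list are hypothetical; the `enable_*` method names in the comments mirror the real Diffusers API, but which of them this repo actually uses is an assumption.

```python
# Hypothetical helper sketching how Diffusers memory/speed optimizations
# are typically enabled on a pipeline. The method names mirror the real
# Diffusers API (enable_attention_slicing, enable_vae_slicing,
# enable_model_cpu_offload, enable_xformers_memory_efficient_attention);
# hasattr-style guards keep the helper usable across pipeline versions.

def apply_optimizations(pipe, optimizations):
    """Call each named enable_* method on `pipe` if it exists.

    Returns the list of optimizations that were actually applied.
    """
    applied = []
    for name in optimizations:
        method = getattr(pipe, f"enable_{name}", None)
        if callable(method):
            method()
            applied.append(name)
    return applied


# Stand-in for a DiffusionPipeline, so the sketch runs without a GPU.
class DummyPipeline:
    def __init__(self):
        self.enabled = []

    def enable_attention_slicing(self):
        self.enabled.append("attention_slicing")

    def enable_vae_slicing(self):
        self.enabled.append("vae_slicing")


pipe = DummyPipeline()
applied = apply_optimizations(
    pipe, ["attention_slicing", "vae_slicing", "model_cpu_offload"]
)
# model_cpu_offload is skipped: the dummy pipeline does not define it.
print(applied)  # ['attention_slicing', 'vae_slicing']
```

The guard-and-apply pattern lets one benchmark harness drive many pipelines, since not every Diffusers pipeline exposes every `enable_*` method.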

Current Optimizations

The following optimizations are already integrated into our codebase:

Future Explorations

Links and Resources

rickstaa commented 6 months ago

@titan-node did you also experiment with model offloading, which gives less memory reduction but higher performance?

Titan-Node commented 6 months ago

> @Titan-Node did you also experiment with model offloading, which gives less memory reduction but higher performance?

Yes, I tried model offloading with no effect on RAM or speed.

rickstaa commented 6 months ago

> > @Titan-Node did you also experiment with model offloading, which gives less memory reduction but higher performance?
>
> Yes, I tried model offloading with no effect on RAM or speed.

Thanks for the update. According to the docs, the effects should indeed be minimal.
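For reference, Diffusers offers two offloading modes with different trade-offs: `pipe.enable_model_cpu_offload()` moves whole submodels (text encoder, UNet, VAE) to the GPU one at a time, trading a small speed cost for moderate VRAM savings, while `pipe.enable_sequential_cpu_offload()` offloads per layer for the largest VRAM savings at a much larger speed cost. A hedged sketch of choosing between them from a VRAM budget (the helper and its thresholds are illustrative assumptions, not project code):

```python
# Hypothetical rule of thumb for choosing a Diffusers offload strategy
# from a VRAM budget. The thresholds are illustrative assumptions; the
# strategy names correspond to real Diffusers methods:
#   "model"      -> pipe.enable_model_cpu_offload()      (whole submodels,
#                   moderate VRAM savings, small speed cost)
#   "sequential" -> pipe.enable_sequential_cpu_offload() (per-layer,
#                   largest VRAM savings, much slower)

def pick_offload_strategy(free_vram_gb: float, model_vram_gb: float) -> str:
    """Return 'none', 'model', or 'sequential' for a given VRAM budget."""
    if free_vram_gb >= model_vram_gb:
        return "none"          # everything fits; no offloading needed
    if free_vram_gb >= 0.5 * model_vram_gb:
        return "model"         # offload whole submodels between steps
    return "sequential"        # last resort: per-layer offloading


print(pick_offload_strategy(24, 16))  # none
print(pick_offload_strategy(10, 16))  # model
print(pick_offload_strategy(4, 16))   # sequential
```

This would also explain the benchmark result above: if the card already fits the whole pipeline, enabling model offloading changes neither VRAM high-water mark nor speed in any meaningful way.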

rickstaa commented 5 months ago

Tracked internally at https://linear.app/livepeer-ai-spe/issue/LIV-321.