tsengalb99 opened this issue 11 months ago
This is more or less planned: we want to support static caching so the models can be compiled for faster inference 😉 cc @gante. This might already have been asked in other issues as well.
@tsengalb99 as Arthur wrote, we are working on it :D Expect to see updates soon
Are there any updates on this? And what is the main reason CUDA graphs don't work right now?
Follow PR #27931 for updates; the dynamic KV cache is the main issue.
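For intuition, here is a minimal illustration (toy tensors, not the actual transformers cache code) of why a dynamic KV cache is incompatible with CUDA graph capture while a static one is not:

```python
import torch

# Illustration only: a dynamic KV cache concatenates new keys each step,
# so the tensor's shape and memory address change on every iteration,
# which invalidates a captured CUDA graph.
past_k = torch.zeros(1, 8, 0, 64, device="cuda")
for step in range(4):
    new_k = torch.randn(1, 8, 1, 64, device="cuda")
    past_k = torch.cat([past_k, new_k], dim=2)  # fresh allocation every step

# A static cache instead pre-allocates to a max length and writes in place,
# keeping shape and address fixed so a captured graph stays valid on replay.
static_k = torch.zeros(1, 8, 1024, 64, device="cuda")
for step in range(4):
    new_k = torch.randn(1, 8, 1, 64, device="cuda")
    static_k[:, :, step : step + 1, :] = new_k  # same tensor, stable address
```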
The PR is still very much active and now supports CUDA graphs
Great, looking forward to seeing it merged! Do you have an ETA on when that will happen?
It only needs a final review, so this week 😉
Hi Arthur,
I saw the PR was merged. What is the recommended way to use CUDA graphs during generation? I am currently wrapping the entire model in a torch CUDA graph wrapper and still getting the same graph-breaking errors as before.
Thanks, Albert
Hey! Here is how I used it: https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb. I used torch.compile, so I'm not 100% sure how an explicit CUDA graph call will work! Feel free to reach out if it does not work!
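For reference, a minimal sketch of that torch.compile route (the model name is just an example; assumes a transformers version that includes the static cache from #27931):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example; any model with static cache support
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# The static cache pre-allocates the KV tensors, so shapes stay fixed across decoding steps.
model.generation_config.cache_implementation = "static"

# "reduce-overhead" mode records CUDA graphs under the hood.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```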
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
A PR is coming for this! #29374
Feature request
In my experiments, I cannot get torch CUDA graphs to work with HF generate. CUDA graphs work fine when calling a model's forward pass, but stream capture fails when calling .generate(), whether because of CUDA graphs' requirement for static input/output sizes or for some other reason. Can support for torch CUDA graphs be added? (A manual-capture sketch of the forward-pass case that does work is shown below.)
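A minimal sketch of the manual capture path that works on a plain forward pass, using a toy `torch.nn.Linear` stand-in rather than a real HF model; it follows the warm-up and capture pattern from the PyTorch CUDA graphs documentation:

```python
import torch

# Toy stand-in for a model; CUDA graphs need fixed shapes and stable tensor addresses.
model = torch.nn.Linear(128, 128).to("cuda")
static_input = torch.randn(8, 128, device="cuda")

# Warm up on a side stream before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: copy fresh data into the captured input buffer, then replay.
static_input.copy_(torch.randn(8, 128, device="cuda"))
g.replay()  # static_output now holds the result for the new input
```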
Motivation
LLM decoding involves many small kernel launches, and CUDA graphs can remove most of the launch overhead. In my experiments with just the forward call, CUDA-graphed models can run twice as fast as non-graphed versions of the same model.
Your contribution
n/a