huggingface / optimum-tpu

Google TPU optimizations for transformers models
Apache License 2.0

feat(cache): use optimized StaticCache class for XLA #70

Closed: tengomucho closed this 4 months ago

tengomucho commented 4 months ago

This is actually a ripoff of the work originally done as a contribution to transformers:

https://github.com/huggingface/transformers/pull/31129/

The original contribution has not been merged yet, but it shows lower memory usage and better performance on XLA, so I think it's worth adding here.
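For readers unfamiliar with why a static cache helps on XLA: XLA compiles a graph per tensor shape, so a key/value cache that grows by one position per decoding step presents a new shape at every step, while a preallocated buffer keeps a single shape. The framework-free toy below only illustrates that shape invariant. It is not the code from this PR or from the transformers PR, and `ToyStaticCache`, `ToyDynamicCache` and `distinct_shapes` are hypothetical names made up for the sketch.

```python
class ToyStaticCache:
    """Fixed-capacity buffer: allocated once, written in place."""

    def __init__(self, max_len, head_dim):
        self.max_len = max_len
        self.head_dim = head_dim
        # Preallocate every slot up front; zeros are placeholders.
        self.keys = [[0.0] * head_dim for _ in range(max_len)]
        self.values = [[0.0] * head_dim for _ in range(max_len)]
        self.seq_len = 0  # number of positions written so far

    def update(self, key, value):
        if self.seq_len >= self.max_len:
            raise IndexError("cache is full")
        # Overwrite the next slot instead of growing the buffer.
        self.keys[self.seq_len] = list(key)
        self.values[self.seq_len] = list(value)
        self.seq_len += 1

    def shape(self):
        return (len(self.keys), self.head_dim)


class ToyDynamicCache:
    """Growing buffer: one more row after every update."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, key, value):
        self.keys.append(list(key))
        self.values.append(list(value))

    def shape(self):
        return (len(self.keys), len(self.keys[0]) if self.keys else 0)


def distinct_shapes(cache, steps, head_dim):
    """Count how many different buffer shapes appear over `steps` updates."""
    seen = set()
    for t in range(steps):
        cache.update([float(t)] * head_dim, [float(t)] * head_dim)
        seen.add(cache.shape())
    return len(seen)
```

In this toy, the static cache shows one shape across all updates and the dynamic cache shows one shape per update. On a real XLA backend each new shape would typically mean another compilation, which is the cost a static cache is meant to avoid.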
