huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

[Inference] Neuron cache for traced torchscript models (encoders, stable diffusion) #510

Closed JingyaHuang closed 6 months ago

JingyaHuang commented 6 months ago

What does this PR do?

Fixes #284

Before submitting


Tests

optimum-cli neuron cache set Jingya/optimum-neuronx-cache

(enables setting a custom cache repo to synchronize against)

optimum-cli neuron cache synchronize (--repo_id Jingya/optimum-neuronx-cache)

optimum-cli neuron cache lookup google-bert/bert-base-cased --mode inference

(e.g. google-bert/bert-base-uncased can reuse the cached artifacts compiled for google-bert/bert-base-cased)
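The reuse across checkpoints works because a compilation cache is keyed on compilation-relevant inputs (architecture configuration, input shapes, compiler flags) rather than on the model weights or the checkpoint name. The sketch below is a hypothetical illustration of that idea, not the actual optimum-neuron implementation; the `neuron_cache_key` function and its parameters are invented for this example:

```python
import hashlib
import json

def neuron_cache_key(model_config: dict, input_shapes: dict, compiler_flags: list) -> str:
    """Hypothetical cache key built only from compilation-relevant inputs.

    Weights are deliberately excluded, which is why two checkpoints sharing
    an architecture and trace shapes (e.g. bert-base-cased and
    bert-base-uncased) would map to the same cached artifacts.
    """
    payload = json.dumps(
        {
            "config": model_config,
            "shapes": input_shapes,
            "flags": sorted(compiler_flags),
        },
        sort_keys=True,  # deterministic serialization -> deterministic key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same architecture config and trace shapes -> same key, cache hit:
cased = neuron_cache_key(
    {"hidden_size": 768, "num_hidden_layers": 12},
    {"batch_size": 1, "sequence_length": 128},
    ["--optlevel=2"],
)
uncased = neuron_cache_key(
    {"hidden_size": 768, "num_hidden_layers": 12},
    {"batch_size": 1, "sequence_length": 128},
    ["--optlevel=2"],
)
print(cased == uncased)  # True: identical compilation inputs share one entry
```

Changing any compilation-relevant input (for instance `sequence_length`) would produce a different key and force a recompilation, which matches the lookup-by-mode behavior exercised above.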


Supported

Next steps

HuggingFaceDocBuilderDev commented 6 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.