huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.

Apache License 2.0


Sync `transformers` and `accelerate` versions #562

Closed: michaelbenayoun closed this 1 month ago

michaelbenayoun commented 2 months ago

What does this PR do?

This PR synchronizes optimum-neuron with more recent transformers and accelerate versions:

accelerate==0.29.2, which is the latest release at the time of writing this PR,
transformers==4.40.2, which will be the latest release when this PR is merged.
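
For anyone who wants to confirm locally that their environment matches these pins, here is a minimal sketch using only the standard library. The helper name `check_pins` and the `EXPECTED_PINS` mapping are hypothetical and not part of optimum-neuron; only the two version numbers come from the description above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the PR description above.
EXPECTED_PINS = {"accelerate": "0.29.2", "transformers": "4.40.2"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `get_version` is injectable so the check can be exercised without
    the packages being installed (hypothetical helper, for illustration).
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (expected, installed) in check_pins(EXPECTED_PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means both packages are installed at exactly the pinned versions. This compares version strings literally, so it only suits exact `==` pins like the ones listed here.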

Related PR in transformers: https://github.com/huggingface/transformers/pull/30259

On top of that:

The workflows for Trainium instances have been updated and now use Kubernetes (K8s).

HuggingFaceDocBuilderDev commented 2 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dacorvo commented 1 month ago

You should unpin the safetensors package here, because there is now a conflict: https://github.com/huggingface/optimum-neuron/blob/1e7d0f5ae47fd51b2418b1355a2e819e58b69890/text-generation-inference/server/pyproject.toml#L17

michaelbenayoun commented 1 month ago

I fixed all but one test:

tests/generation/test_tnx_llama.py::test_decoder_generation_multiple_eos_token_ids