huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Mixed-precision training with both `torch_xla` and `torch.autocast` #523

Closed: michaelbenayoun closed this 6 months ago

michaelbenayoun commented 6 months ago

What does this PR do?

There are two ways to cast to bfloat16:

1. Full casting through `torch_xla`, for instance by setting the `XLA_USE_BF16=1` environment variable before the XLA runtime starts.
2. Op-level mixed precision with `torch.autocast` (see the sketch below).
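A minimal sketch of the two approaches (not the code from this PR), assuming a recent `torch`/`torch_xla` pair where `torch.autocast` accepts the `"xla"` device type; the model and shapes are made up for illustration, and in practice you would pick one approach rather than combining them:

```python
import os
import torch

# Approach 1: full bf16 casting through torch_xla, driven by an environment
# variable that must be set before the XLA runtime is initialized.
os.environ["XLA_USE_BF16"] = "1"

import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(8, 8).to(device)
inputs = torch.randn(4, 8, device=device)

# Approach 2: op-level mixed precision with torch.autocast, which keeps the
# parameters in float32 and only runs selected ops in bfloat16.
# (Shown alongside approach 1 only for brevity.)
with torch.autocast(device_type="xla", dtype=torch.bfloat16):
    outputs = model(inputs)

xm.mark_step()  # flush the lazily recorded XLA graph
```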

The first approach was already supported; this PR adds support for the second one. It also fixes issues related to how we set the `NEURON_CC_FLAGS`: if they are set too late (e.g. after the process group initialization), the compiler ignores them. This PR makes sure we set them at the right time.
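As a rough sketch of the ordering constraint (the flag value is illustrative, and the exact process-group initialization call depends on the `torch_xla` version), the flags have to be exported before anything initializes the process group:

```python
import os

# Illustrative flag value: NEURON_CC_FLAGS must be populated before the
# process group / XLA runtime is initialized, otherwise the Neuron compiler
# ignores it.
os.environ["NEURON_CC_FLAGS"] = (
    os.environ.get("NEURON_CC_FLAGS", "") + " --model-type=transformer"
)

# Only once the flags are in place do we touch torch.distributed / torch_xla.
import torch.distributed as dist
import torch_xla.distributed.xla_backend  # registers the "xla" backend

dist.init_process_group(backend="xla")
```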

HuggingFaceDocBuilderDev commented 6 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.