huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Skip weight load during parallel compile #524

Closed michaelbenayoun closed 3 months ago

michaelbenayoun commented 3 months ago

What does this PR do?

Since the real weight values are not needed during precompilation, we can skip weight initialization and loading entirely, which reduces the time and memory cost of the parallel compile step.
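As a rough illustration of the idea (not the PR's actual implementation), PyTorch's `meta` device lets you build a model whose parameters carry only shapes and dtypes, with no storage allocated, which is all a compilation trace needs:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: instantiating a module under the "meta" device records
# parameter shapes and dtypes but never materializes the underlying data,
# so no weight initialization or checkpoint loading happens.
with torch.device("meta"):
    model = nn.Linear(4096, 4096)

# The parameters exist structurally but have no real storage.
print(model.weight.is_meta)
print(tuple(model.weight.shape))
```

The real weights would then only be materialized (or loaded from a checkpoint) once actual training or inference begins, after compilation has finished.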

HuggingFaceDocBuilderDev commented 3 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.