huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0
355 stars 99 forks

[OV]: load and convert llms in original precision #778

Open eaidova opened 1 week ago

eaidova commented 1 week ago

What does this PR do?

Allow loading bfloat16 and float16 models in their original precision for conversion. This significantly reduces memory consumption and loading time during model conversion for large models.
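The memory saving follows directly from the per-parameter storage size: bfloat16 and float16 use 2 bytes per weight, while upcasting to float32 uses 4, so keeping the checkpoint's original half precision roughly halves the memory needed just to hold the weights during export. A minimal back-of-the-envelope sketch (the helper function and the 7B parameter count are illustrative assumptions, not part of this PR):

```python
# Approximate bytes per parameter for common checkpoint dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Rough memory needed just to hold the model weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical 7B-parameter model, e.g. a Llama-class LLM.
params_7b = 7_000_000_000
fp32_gib = weight_memory_gib(params_7b, "float32")
bf16_gib = weight_memory_gib(params_7b, "bfloat16")
print(f"float32: {fp32_gib:.1f} GiB, bfloat16: {bf16_gib:.1f} GiB")
```

This counts only the weight tensors; actual peak usage during conversion is higher (activations, the exported graph, framework overhead), but the 2x ratio between the two loading paths is the point.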

Fixes # (issue)

Before submitting

HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix commented 5 days ago

Looks great, thanks a lot @eaidova

eaidova commented 5 days ago

@echarlaix thanks, we are still investigating the impact on model accuracy and quantization on our side. Could you please hold off on merging these changes until we have the whole picture?