huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0
355 stars 99 forks

[OV]: load and convert llms in original precision #778

Open eaidova opened 1 week ago

eaidova commented 1 week ago

What does this PR do?

Allow loading bfloat16 and float16 models in their original precision for conversion. This significantly reduces memory consumption and loading time during model conversion for large models.
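The memory saving follows directly from the per-parameter storage size: bfloat16 and float16 use 2 bytes per weight, while upcasting to float32 uses 4, so keeping the checkpoint's original half precision roughly halves the memory needed just to hold the weights during export. A minimal back-of-the-envelope sketch (the helper function and the 7B parameter count are illustrative assumptions, not part of this PR):

```python
# Approximate bytes per parameter for common checkpoint dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Rough memory needed just to hold the model weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical 7B-parameter model, e.g. a Llama-class LLM.
params_7b = 7_000_000_000
fp32_gib = weight_memory_gib(params_7b, "float32")
bf16_gib = weight_memory_gib(params_7b, "bfloat16")
print(f"float32: {fp32_gib:.1f} GiB, bfloat16: {bf16_gib:.1f} GiB")
```

This counts only the weight tensors; actual peak usage during conversion is higher (activations, the exported graph, framework overhead), but the 2x ratio between the two loading paths is the point.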

Fixes # (issue)

Before submitting

HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix commented 5 days ago

Looks great, thanks a lot @eaidova

eaidova commented 5 days ago

@echarlaix thanks, we are still investigating the impact on model accuracy and quantization on our side. Could you please hold off on merging these changes until we have the whole picture?