Open eaidova opened 1 week ago
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks great, thanks a lot @eaidova
@echarlaix thanks, we still investigating impact on models accuracy and quantization on our side. Could you please do not merge these changes, until we do not have whole picture?
What does this PR do?
allow loading bfloat16 and float16 models in original precision for conversion. It significantly reduces memory consumption and loading time during model conversion for large models
Fixes # (issue)
Before submitting