Closed Wovchena closed 4 months ago
Hi @Wovchena, thanks for the feedback.
When loading an OpenVINO model, load_in_8bit is set to False by default. However, when exporting a model to OpenVINO (by setting export=True or using the CLI), large models are quantized, as described in the documentation. I'm opening https://github.com/huggingface/optimum-intel/pull/745 to clarify this.
We have a section explaining this in the documentation. Let me know here, or by opening a PR, if you have any suggestions to improve it.
Yeah, I wasn't careful enough while searching through the docs. This, by itself, may indicate a problem: I expected to find a page with a list of arguments similar to https://huggingface.co/docs/transformers/en/model_doc/auto, but the documentation is written in a storytelling style.
Additionally, jumping to a function definition in the IDE is usually safer because you get a signature and an argument description that correspond to the currently installed version. However, OVModelForCausalLM's _from_pretrained() method does not list the load_in_8bit or compile arguments. Docstrings would also be helpful.
It's also inconvenient that the function name starts with an underscore. Jumping to a function definition takes you to OptimizedModel instead of OVModelForCausalLM. OptimizedModel does not list the load_in_8bit or compile arguments either.
To sum it up, I miss docstrings and cleaner function definitions :)
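The discoverability problem described above can be reproduced with a minimal sketch. The two classes below are simplified, hypothetical stand-ins that only borrow the names from this thread; they are not the real optimum-intel classes, and the argument list is an assumption made for illustration. The point is that a public from_pretrained() which forwards **kwargs exposes none of the options that the underscore method accepts, and that it resolves to the base class.

```python
import inspect


class OptimizedModel:
    """Hypothetical stand-in for a base class with a generic public entry point."""

    @classmethod
    def from_pretrained(cls, model_id, **kwargs):
        # The public method only forwards keyword arguments, so its signature
        # says nothing about which options the subclass actually understands.
        return cls._from_pretrained(model_id, **kwargs)

    @classmethod
    def _from_pretrained(cls, model_id, **kwargs):
        raise NotImplementedError


class OVModelForCausalLM(OptimizedModel):
    """Hypothetical stand-in; not the real optimum-intel class."""

    @classmethod
    def _from_pretrained(cls, model_id, load_in_8bit=False, **kwargs):
        # Return a plain dict so the sketch runs without any model files.
        return {"model_id": model_id, "load_in_8bit": load_in_8bit, "extra": kwargs}


# Parameters a user or an IDE sees on the public method.
public_params = list(inspect.signature(OVModelForCausalLM.from_pretrained).parameters)
# Parameters that are only visible on the underscore-prefixed method.
private_params = list(inspect.signature(OVModelForCausalLM._from_pretrained).parameters)
# The class that actually defines the public method ("go to definition" target).
defining_class = OVModelForCausalLM.from_pretrained.__qualname__.split(".")[0]

print(public_params, private_params, defining_class)
```

In this sketch the public signature contains only model_id and kwargs, which is why explicit keyword arguments and a docstring on the public method would make the options easier to find.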
> Yeah, I wasn't careful enough while searching through the docs. This, by itself, may indicate a problem: I expected to find a page with a list of arguments similar to https://huggingface.co/docs/transformers/en/model_doc/auto, but the documentation is written in a storytelling style.
We have something similar in the documentation, but I agree that it could be extended or improved. Feel free to open a PR if you have something in mind.
Please add docstrings describing OVModelForCausalLM.from_pretrained(). The things I wish I knew earlier:
- load_in_8bit is set to None instead of the load_in_8bit=False listed in the _from_pretrained() args in optimum\intel\openvino\modeling_decoder.py. Additionally, None has a different meaning compared to False.
- compile=False.
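To make the None versus False distinction concrete, here is a rough sketch of how a three-state argument like this is typically resolved. The helper name resolve_load_in_8bit, the num_params argument, and the LARGE_MODEL_PARAMS threshold are all hypothetical and chosen for illustration; the actual logic and cut-off in optimum-intel may differ, so check the installed version.

```python
from typing import Optional

# Assumed size threshold, for illustration only; not taken from the library.
LARGE_MODEL_PARAMS = 1_000_000_000


def resolve_load_in_8bit(load_in_8bit: Optional[bool], export: bool, num_params: int) -> bool:
    """Hypothetical helper: decide whether weights end up quantized to 8 bits.

    load_in_8bit:
        True  -> always quantize.
        False -> never quantize.
        None  -> not specified; quantize only when exporting a large model.
    """
    if load_in_8bit is not None:
        # An explicit True or False from the caller always wins.
        return load_in_8bit
    # Unspecified: fall back to the export-time default for large models.
    return export and num_params >= LARGE_MODEL_PARAMS


# Exporting a large model without specifying the flag quantizes it,
# while passing False explicitly keeps the original precision.
print(resolve_load_in_8bit(None, export=True, num_params=7_000_000_000))
print(resolve_load_in_8bit(False, export=True, num_params=7_000_000_000))
```

Under these assumptions, None means "let the library decide" and False means "never quantize", which is the kind of distinction a docstring on from_pretrained() could spell out.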