huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

Describe `OVModelForCausalLM.from_pretrained()` args #738

Closed Wovchena closed 4 months ago

Wovchena commented 4 months ago

Please add docstrings describing OVModelForCausalLM.from_pretrained(). The things I wish I knew earlier:

  1. load_in_8bit defaults to None, not the load_in_8bit=False listed among the _from_pretrained() arguments in optimum/intel/openvino/modeling_decoder.py. Moreover, None has a different meaning than False.
  2. It's possible to skip compiling the model at load time by passing compile=False (see the sketch after this list).
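
For illustration, here is roughly the usage I had to piece together myself (a minimal sketch; the model path below is just a placeholder):

```python
from optimum.intel import OVModelForCausalLM

# Load an already-exported OpenVINO model without compiling it right away.
# compile=False defers compilation, which is handy if you want to change the
# device or configuration before the first inference.
model = OVModelForCausalLM.from_pretrained(
    "path/to/exported_ov_model",  # placeholder path
    compile=False,
    load_in_8bit=False,  # explicit, since the default is None rather than False
)

# Compile explicitly once everything is configured.
model.compile()
```
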
echarlaix commented 4 months ago

Hi @Wovchena, thanks for the feedback.

  1. When loading an OpenVINO model, load_in_8bit is False by default. However, when exporting a model to OpenVINO (by setting export=True or using the CLI), large models will be quantized by default, as described in the documentation (see the sketch after this list). I'm opening https://github.com/huggingface/optimum-intel/pull/745 to clarify this.

  2. We have a section explaining this in the documentation. Let me know here, or by opening a PR, if you have any suggestions to improve it.
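
To make the difference concrete, here is a rough sketch (the model id is a placeholder, and the default quantization only applies to models above the size threshold described in the documentation):

```python
from optimum.intel import OVModelForCausalLM

# Export on the fly: for large models the weights are quantized to 8-bit by default.
model = OVModelForCausalLM.from_pretrained("model-id", export=True)  # placeholder id

# Opt out of the default weight quantization applied during export.
model_fp32 = OVModelForCausalLM.from_pretrained(
    "model-id", export=True, load_in_8bit=False  # placeholder id
)
```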

Wovchena commented 4 months ago

Yeah, I wasn't careful enough while searching through the docs. This, by itself, may indicate a problem: I expected to find a page with a list of arguments similar to https://huggingface.co/docs/transformers/en/model_doc/auto, but the documentation is written in a storytelling style.

Additionally, jumping to a function definition in the IDE is usually safer because you get a signature and an argument description that correspond to the currently installed version. However, OVModelForCausalLM's _from_pretrained() method does not list the load_in_8bit or compile arguments. Docstrings would also be helpful.

It's also inconvenient that the function name starts with an underscore: jumping to from_pretrained()'s definition takes you to OptimizedModel instead of OVModelForCausalLM, and OptimizedModel does not list the load_in_8bit or compile arguments either.

To sum it up, I'm missing docstrings and a cleaner function definition :)
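
Something along these lines is what I was hoping to find when jumping to the definition (just a sketch to illustrate the request; the wording and defaults are my own reading, not the library's actual code):

```python
def from_pretrained(model_id: str, export: bool = False, compile: bool = True,
                    load_in_8bit: bool | None = None, **kwargs):
    """Load (and optionally export) an OpenVINO causal LM.

    Args:
        model_id: Hub id or local path of the model.
        export: whether to convert the model to OpenVINO format on the fly.
        compile: whether to compile the model right after loading; pass False
            to defer compilation (for example to change the device first).
        load_in_8bit: whether to apply 8-bit weight quantization; None (the
            default) lets the library decide, which is not the same as False.
    """
```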

echarlaix commented 4 months ago

> Yeah, I wasn't careful enough while searching through the docs. This, by itself, may indicate a problem: I expected to find a page with a list of arguments similar to https://huggingface.co/docs/transformers/en/model_doc/auto, but the documentation is written in a storytelling style.

We have something similar in the documentation, but I agree that it could be extended and improved. Feel free to open a PR if you have something in mind.