huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

add patching for update_causal_mask to falcon for >= 4.45 #989

Closed · eaidova closed this 2 weeks ago

eaidova commented 3 weeks ago

The current approach causes accuracy issues on platforms that natively support bf16/fp16 (e.g. iGPU, ARM CPU) when the model is loaded and converted as fp32, due to numeric overflow when the torch_dtype.min value for fp32 is represented as a bf16 or fp16 constant. This PR reuses the same patching already applied to llama and gemma for the other affected models.
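For context (not part of the original PR text), here is a minimal sketch of the overflow being described, assuming PyTorch; the idea behind the referenced patching is, conceptually, to derive the mask fill value from the target dtype instead of hard-coding the fp32 minimum. The `mask_fill_value` helper below is illustrative only, not the actual optimum-intel patch.

```python
import torch

# torch_dtype.min for fp32 is roughly -3.4028e38.
fp32_min = torch.finfo(torch.float32).min

# Casting that constant to half precision overflows to -inf:
print(torch.tensor(fp32_min).to(torch.float16))   # tensor(-inf, dtype=torch.float16)

# bf16 shares fp32's exponent range, but round-to-nearest still pushes the
# value past bf16's largest finite number, so it also lands on -inf:
print(torch.tensor(fp32_min).to(torch.bfloat16))  # tensor(-inf, dtype=torch.bfloat16)

# A dtype-aware fill value stays finite in whatever precision the model runs in
# (hypothetical helper for illustration):
def mask_fill_value(dtype: torch.dtype) -> torch.Tensor:
    return torch.tensor(torch.finfo(dtype).min, dtype=dtype)

print(mask_fill_value(torch.float16))   # tensor(-65504., dtype=torch.float16)
print(mask_fill_value(torch.bfloat16))  # finite bf16 minimum
```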

HuggingFaceDocBuilderDev commented 3 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AlexKoff88 commented 2 weeks ago

@eaidova, please take a look at the failed tests.

eaidova commented 2 weeks ago

> @eaidova, please take a look at the failed tests.

@AlexKoff88 it does not seem to be related to my changes:

FAILED tests/openvino/test_modeling.py::OVModelIntegrationTest::test_load_model_from_hub_private_with_token - huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-672b4212-1f28ca04615798037e8d4cf0;d3d5573b-1008-445b-ac83-e1c2e85cb525)

Looks like the account used by CI does not have access to download the repo with a private token in this test; I have seen this issue from time to time before.

AlexKoff88 commented 2 weeks ago

@slyalin, @nikita-savelyevv, please help with the review.