intel / intel-extension-for-pytorch

A Python package extending the official PyTorch to easily obtain performance on Intel platforms

Unsupported input type, fallback to the origin model #534

Closed: akarX23 closed this issue 1 month ago

akarX23 commented 7 months ago

Describe the issue

I am trying to run "meta-llama/llama-2-7b-chat-hf" with the llm-on-ray framework, but I am getting the following output:

(ServeReplica:router:PredictorDeployment pid=3040030) /home/develop/.anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/intel_extension_for_pytorch/transformers/models/reference/modules/attentions.py:962: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
(ServeReplica:router:PredictorDeployment pid=3040030)   + torch.tensor(combined_attention_mask)
(ServeReplica:router:PredictorDeployment pid=3040030) /home/develop/.anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/intel_extension_for_pytorch/transformers/optimize.py:683: UserWarning: fail to apply optimize_transformers due to: Unsupported input type, fallback to the origin model

I think llama-2-7b-chat-hf should be supported for optimizations. These are the versions I am using:

Kindly assist in solving this issue. Thank you!
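(For context, here is a minimal sketch of how the ipex transformers optimization that emits this warning is typically applied to a Hugging Face model. The entry-point name and arguments are assumptions based on recent ipex releases, not code taken from llm-on-ray: 2.1.x exposes ipex.optimize_transformers, 2.2+ exposes ipex.llm.optimize.)

```python
# Hedged sketch, not the llm-on-ray code path: apply ipex's transformers
# optimization directly and watch for the same warning.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16
)
model.eval()

# ipex 2.1.x: ipex.optimize_transformers(model, dtype=torch.bfloat16)
# ipex 2.2+:  ipex.llm.optimize(model, dtype=torch.bfloat16)
# If the model or its inputs are not recognized, ipex warns
# "Unsupported input type, fallback to the origin model" and returns the
# original, unoptimized model instead of raising an error.
model = ipex.llm.optimize(model, dtype=torch.bfloat16)
```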

Vasud-ha commented 7 months ago

Hi @akarX23, I will reproduce the issue and get back to you.

akarX23 commented 7 months ago

Sure, thank you for your time and support @Vasud-ha

Vasud-ha commented 7 months ago

Hi @akarX23, could you update transformers to >=4.35.0? Refer to https://github.com/intel/llm-on-ray/tree/main

akarX23 commented 7 months ago

Hi @Vasud-ha , the reason I am using transformers 4.31 is that bigdl-llm[all] requires transformers 4.31. This is the output of pip install ".[cpu,bigdl-cpu]":

134.0 The conflict is caused by:
134.0     llm-on-ray[bigdl-cpu,cpu] 0.0.1 depends on transformers>=4.35.0; extra == "cpu"
134.0     bigdl-llm[all] 2.5.0b20240222 depends on transformers==4.31.0; extra == "all"

I would have to use either ipex or bigdl-llm, one at a time.
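(As an aside, the conflict shown in the pip output above can be checked mechanically. The snippet below is only an illustration using the two requirement strings quoted from the resolver output; it is not part of either project.)

```python
# Illustration only: show that no single transformers version can satisfy
# both requirement strings from the pip output above.
from packaging.requirements import Requirement
from packaging.version import Version

pins = [
    Requirement("transformers>=4.35.0"),  # llm-on-ray[cpu]
    Requirement("transformers==4.31.0"),  # bigdl-llm[all]
]

for candidate in ("4.31.0", "4.35.0", "4.37.2"):
    ok = all(Version(candidate) in pin.specifier for pin in pins)
    print(candidate, "satisfies both" if ok else "conflicts")
```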

akarX23 commented 7 months ago

Can you suggest a version of bigdl-llm compatible with transformers 4.35 or 4.37?

Vasud-ha commented 7 months ago

Could you try installing bigdl-llm with this command: pip install --pre --upgrade bigdl-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu, together with transformers 4.35 or newer?

xwu99 commented 7 months ago

@akarX23 you need to update torch/ipex to 2.2 and transformers to 4.35.2; check this: https://github.com/intel/llm-on-ray/pull/143
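(A quick way to confirm an environment actually matches these suggestions; the expected versions are just the ones mentioned in this comment.)

```python
# Print installed versions of the packages the linked PR expects to be upgraded.
from importlib.metadata import version

for pkg in ("torch", "intel-extension-for-pytorch", "transformers"):
    print(pkg, version(pkg))  # expecting torch/ipex 2.2.x and transformers 4.35.2
```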

akarX23 commented 7 months ago

@Vasud-ha @xwu99, I will try these suggestions soon. For now I have shifted to OVMS with ITREX, and the performance is pretty good. I will update you with what I find.

ZailiWang commented 1 month ago

Hi @akarX23, are you still working on llm-on-ray, and does the issue persist? Thanks.

akarX23 commented 1 month ago

Hi @ZailiWang , work on llm-on-ray has currently stopped, and I have not checked whether the issue persists. We have customers who will be trying out llm-on-ray soon; if anything comes up I will raise the issue again. Thank you for the help!