intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that makes it easy to obtain extra performance on Intel platforms
Apache License 2.0

Trying llama3.1 on Xeon and getting rope scaling error #681

Open js333031 opened 1 month ago

js333031 commented 1 month ago

Describe the issue

Installed a conda-forge-based environment for the llama3.1 setup per the following instructions.

Encountering the error below, which seems similar to the one discussed here.

(llama31) jays@llmtest:~/intel-extension-for-pytorch/examples/cpu/inference/python/llm$ python run.py -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency
LLM RUNTIME INFO: running model geneartion...
Namespace(model_id='meta-llama/Meta-Llama-3.1-8B-Instruct', dtype='bfloat16', input_tokens='32', max_new_tokens=32, prompt='why is the sky blue', streaming=False, image_url='http://images.cocodataset.org/val2017/000000039769.jpg', audio='example.flac', config_file=None, greedy=False, ipex=True, deployment_mode=True, torch_compile=False, backend='ipex', profile=False, benchmark=False, num_iter=100, num_warmup=10, batch_size=1, token_latency=True, cache_weight_for_large_batch=False)
[2024-07-24 19:06:53,654] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
Traceback (most recent call last):
  File "/home/jays/intel-extension-for-pytorch/examples/cpu/inference/python/llm/single_instance/run_generation.py", line 135, in <module>
    config = AutoConfig.from_pretrained(
  File "/home/jays/miniforge3/envs/llama31/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/home/jays/miniforge3/envs/llama31/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/home/jays/miniforge3/envs/llama31/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/home/jays/miniforge3/envs/llama31/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
LLM RUNTIME ERROR: Running generation task failed. Quit.
(llama31) jays@llmtest:~/intel-extension-for-pytorch/examples/cpu/inference/python/llm$
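
For what it's worth, the rejected field is the new llama3-style rope_scaling block in the model's config.json, while the installed transformers only accepts the old {'type', 'factor'} schema (as the ValueError states). A minimal sketch, not part of the original report, to print that block without going through the transformers config validation:

```bash
# Print the raw rope_scaling entry from config.json, bypassing LlamaConfig validation.
# Requires huggingface_hub and access to the gated repo (huggingface-cli login).
python - <<'EOF'
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("meta-llama/Meta-Llama-3.1-8B-Instruct", "config.json")
print(json.load(open(path))["rope_scaling"])
# Llama 3.1 ships rope_type='llama3' with factor, low_freq_factor, high_freq_factor,
# original_max_position_embeddings - older transformers releases reject this schema.
EOF
```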
akashaero commented 1 month ago

Hey, I fixed the same error by upgrading my transformers library to the latest version (4.43.2)

python -m pip install --upgrade transformers
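
A quick check after upgrading (a sketch; pinning 4.43.2 as the minimum is based on the version mentioned above, not on release notes):

```bash
# Upgrade, then confirm the installed transformers version and that the Llama 3.1
# config now loads with its llama3-style rope_scaling (version floor is an assumption).
python -m pip install --upgrade "transformers>=4.43.2"
python -c "import transformers; print(transformers.__version__)"
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct').rope_scaling)"
```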

mandalrajiv commented 1 month ago

Hey, I fixed the same error by upgrading my transformers library to the latest version (4.43.2)

python -m pip install --upgrade transformers

I did upgrade transformers. Now I get a "finished successfully" message, but I am not seeing any output.

Command run - python run.py -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency

Output (these are warnings followed by the final status, not an error):

  torch.tensor(seq_len).contiguous(),
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/reference/fusions/mha_fusion.py:198: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(seq_len).contiguous(),
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/reference/fusions/mha_fusion.py:199: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(rope_type).contiguous(),
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/reference/fusions/mha_fusion.py:202: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.max_seq_len_cached = max_seq_len_cached.item()
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/cpu/fusions/mha_fusion.py:234: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  seq_info = torch.tensor(
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/cpu/fusions/mha_fusion.py:234: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  seq_info = torch.tensor(
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/transformers/models/reference/models.py:305: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  and hidden_states.size(1) != 1
ipex.llm.optimize has set the optimized or quantization model for model.generate()
LLM RUNTIME INFO: Finished successfully.

jgong5 commented 1 month ago

@jianan-gu

js333031 commented 1 month ago

I also tried the transformers library update and got a similar result to Rajiv's above.

jianan-gu commented 1 month ago

Hey, I fixed the same error by upgrading my transformers library to the latest version (4.43.2) python -m pip install --upgrade transformers

I did upgrade transformers. Getting a message finished successfully, but not seeing any output.

Command run - python run.py -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency

@mandalrajiv Missing "--benchmark" in your command, as mentioned in doc https://intel.github.io/intel-extension-for-pytorch/llm/llama3_1/cpu/#id1
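
For clarity, the corrected single-instance invocation would look like this (a sketch based on the command quoted above; only --benchmark is added, as the linked doc describes):

```bash
# Same command as before with --benchmark added, so the generation results
# and token latency are actually reported instead of only tracing the model.
python run.py --benchmark -m meta-llama/Meta-Llama-3.1-8B-Instruct \
  --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency
```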

jingxu10 commented 1 month ago

@blzheng is working on the fix.

mandalrajiv commented 1 month ago

I did upgrade transformers. Getting a message finished successfully, but not seeing any output.

Command run - python run.py -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency

@mandalrajiv Missing "--benchmark" in your command, as mentioned in doc https://intel.github.io/intel-extension-for-pytorch/llm/llama3_1/cpu/#id1

@jianan-gu - I was not using the benchmark option. I was running the program based on an input prompt.

mandalrajiv commented 1 month ago

Ran the run.py program with a prompt, as well as the benchmark. The benchmark ran fine, but running with the input prompt had the same outcome as before: it shows "finished successfully", but no output is generated.

Running with an input prompt

python run.py -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --prompt "why is the sky blue" --token-latency

Benchmark

deepspeed --bind_cores_to_rank run.py --benchmark -m meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --ipex --greedy --input-tokens 1024 --autotp --shard-model

xiguiw commented 1 month ago

Ran the run.py program with a prompt, as well as the benchmark. The benchmark ran fine, but running with the input prompt had the same outcome as before: it shows "finished successfully", but no output is generated.

@mandalrajiv I don't quite follow. What is the problem now? Could you share the exact commands and details?

Thanks!