intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Inference of Llama2-7b-half with fp8 hits a bug #9332

Open Fred-cell opened 10 months ago

Fred-cell commented 10 months ago

```
bigdl-core-xe        2.4.0b20231101
bigdl-core-xe-esimd  2.4.0b20231101
bigdl-llm            2.4.0b20231101
```

```
]# numactl -C 0-4 -m 0 python generate.py --repo-id-or-model-path ./pretrained-model/llama2-7b-half/ --n-predict 1024 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"
Loading checkpoint shards: 100%|████████████████████████████████████████████| 2/2 [00:06<00:00,  3.22s/it]
2023-11-01 22:27:43,883 - bigdl.llm.transformers.utils - INFO - Converting the current model to fp8 format......
Traceback (most recent call last):
  File "/home/BigDL/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py", line 66, in <module>
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1891, in _from_pretrained
OSError: [Errno 24] Too many open files: './pretrained-model/llama2-7b-half/tokenizer_config.json'
```

hkvision commented 10 months ago

@cyita Take a look?

cyita commented 10 months ago

Hi Fred, I cannot reproduce this error; it seems to be related to the tokenizer. My environment and the command I ran:

```
bigdl-core-xe             2.4.0b20231101           pypi_0    pypi
bigdl-core-xe-esimd       2.4.0b20231101           pypi_0    pypi
bigdl-llm                 2.4.0b20231101           pypi_0    pypi
```

```bash
source /opt/intel/oneapi/setvars.sh
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1

numactl -C 0-4 -m 0 python llama_benchmark.py
```

[image: benchmark output]

Fred-cell commented 10 months ago

I have given you my environment; you can try to reproduce it again.

cyita commented 10 months ago

You can raise the per-process open-file limit with `ulimit -n 2048` before running the script.
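
For reference, a minimal sketch of checking and raising the same limit from inside the Python process (assuming a POSIX system; 2048 matches the value suggested above):

```python
# Check and raise the per-process open-file limit, equivalent to
# the shell's `ulimit -n 2048`. The soft limit can only be raised
# up to the hard limit without extra privileges.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

target = 2048  # example value from the suggestion above
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(target, hard), hard))
```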

Fred-cell commented 10 months ago

When the input prompt is 256 tokens, the error is as below:

```
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
=========First token cost 6.1586 s=========
=========Rest tokens cost average 0.0276 s (1023 tokens in all)=========
=========First token cost 0.2373 s=========
Traceback (most recent call last):
  File "/home/fred/LLM/text-generation/bigdl-llm/BigDL-bk/python/llm/example/gpu/hf-transformers-models/llama2/generate.py", line 92, in <module>
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3525, in decode
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 204, in _convert_id_to_token
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
IndexError: piece id is out of range.
```
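
This IndexError means `generate()` produced a token id outside the SentencePiece vocabulary. For reference, a minimal diagnostic sketch that filters ids before `decode()` so the bad ids are surfaced instead of crashing (`safe_decode` is a hypothetical helper, not part of bigdl-llm; the model path is the one from this thread):

```python
from transformers import AutoTokenizer

def safe_decode(tokenizer, token_ids):
    # len(tokenizer) counts the full vocabulary, including added tokens
    vocab_size = len(tokenizer)
    bad = [t for t in token_ids if not 0 <= t < vocab_size]
    if bad:
        print(f"dropping {len(bad)} out-of-range token ids, e.g. {bad[:5]}")
    valid = [t for t in token_ids if 0 <= t < vocab_size]
    return tokenizer.decode(valid, skip_special_tokens=True)

# usage with the model path from this thread:
tokenizer = AutoTokenizer.from_pretrained("./pretrained-model/llama2-7b-half/")
# output_ids = model.generate(**inputs, max_new_tokens=1024)[0].tolist()
# print(safe_decode(tokenizer, output_ids))
```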

hkvision commented 10 months ago

Can you try unsetting this environment variable: `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS`?
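
For reference, a minimal sketch of dropping the variable for the current Python process (in a shell you would instead run `unset SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` before launching):

```python
# Remove the variable before importing torch / bigdl-llm, so the
# Level Zero runtime never sees it at initialization.
import os

os.environ.pop("SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS", None)
```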

hkvision commented 9 months ago

Any update on this issue? @Fred-cell