intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Inference of Llama2-7b-half with fp8 hits a bug #9332

Open Fred-cell opened 10 months ago

Fred-cell commented 10 months ago

```
bigdl-core-xe        2.4.0b20231101
bigdl-core-xe-esimd  2.4.0b20231101
bigdl-llm            2.4.0b20231101
```

```
]# numactl -C 0-4 -m 0 python generate.py --repo-id-or-model-path ./pretrained-model/llama2-7b-half/ --n-predict 1024 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"
Loading checkpoint shards: 100%|████████████████████████████████████████████| 2/2 [00:06<00:00,  3.22s/it]
2023-11-01 22:27:43,883 - bigdl.llm.transformers.utils - INFO - Converting the current model to fp8 format......
Traceback (most recent call last):
  File "/home/BigDL/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py", line 66, in <module>
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1891, in _from_pretrained
OSError: [Errno 24] Too many open files: './pretrained-model/llama2-7b-half/tokenizer_config.json'
```

hkvision commented 10 months ago

@cyita Take a look?

cyita commented 10 months ago

Hi Fred, I cannot reproduce this error; it seems to be related to the tokenizer. My environment and the command I ran:

```
bigdl-core-xe             2.4.0b20231101           pypi_0    pypi
bigdl-core-xe-esimd       2.4.0b20231101           pypi_0    pypi
bigdl-llm                 2.4.0b20231101           pypi_0    pypi
```

```bash
source /opt/intel/oneapi/setvars.sh
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1

numactl -C 0-4 -m 0 python llama_benchmark.py
```

[image: benchmark output]

Fred-cell commented 10 months ago

I have given you my environment; you can try to reproduce it again.

cyita commented 10 months ago

You can raise the per-process open-file limit with `ulimit -n 2048` before running the script.
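
For reference, a minimal sketch of checking and raising the same limit from inside the Python process (assuming a POSIX system; 2048 matches the value suggested above):

```python
# Check and raise the per-process open-file limit, equivalent to
# the shell's `ulimit -n 2048`. The soft limit can only be raised
# up to the hard limit without extra privileges.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

target = 2048  # example value from the suggestion above
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(target, hard), hard))
```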

Fred-cell commented 10 months ago

When the input prompt is 256 tokens, the error is as below:

```
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
=========First token cost 6.1586 s=========
=========Rest tokens cost average 0.0276 s (1023 tokens in all)=========
=========First token cost 0.2373 s=========
Traceback (most recent call last):
  File "/home/fred/LLM/text-generation/bigdl-llm/BigDL-bk/python/llm/example/gpu/hf-transformers-models/llama2/generate.py", line 92, in <module>
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3525, in decode
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 204, in _convert_id_to_token
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
  File "/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
IndexError: piece id is out of range.
```
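
This IndexError means `generate()` produced a token id outside the SentencePiece vocabulary. For reference, a minimal diagnostic sketch that filters ids before `decode()` so the bad ids are surfaced instead of crashing (`safe_decode` is a hypothetical helper, not part of bigdl-llm; the model path is the one from this thread):

```python
from transformers import AutoTokenizer

def safe_decode(tokenizer, token_ids):
    # len(tokenizer) counts the full vocabulary, including added tokens
    vocab_size = len(tokenizer)
    bad = [t for t in token_ids if not 0 <= t < vocab_size]
    if bad:
        print(f"dropping {len(bad)} out-of-range token ids, e.g. {bad[:5]}")
    valid = [t for t in token_ids if 0 <= t < vocab_size]
    return tokenizer.decode(valid, skip_special_tokens=True)

# usage with the model path from this thread:
tokenizer = AutoTokenizer.from_pretrained("./pretrained-model/llama2-7b-half/")
# output_ids = model.generate(**inputs, max_new_tokens=1024)[0].tolist()
# print(safe_decode(tokenizer, output_ids))
```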

hkvision commented 10 months ago

Can you try unsetting this environment variable: `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS`?
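
For reference, a minimal sketch of dropping the variable for the current Python process (in a shell you would instead run `unset SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` before launching):

```python
# Remove the variable before importing torch / bigdl-llm, so the
# Level Zero runtime never sees it at initialization.
import os

os.environ.pop("SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS", None)
```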

hkvision commented 9 months ago

Any update on this issue? @Fred-cell