Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
inference error: mistral and codellama fail with "object has no attribute '_has_non_default_generation_parameters'" #11415
Running the following ipex-llm example, inference-ipex-llm (GPU/Pipeline-Parallel-Inference), for Mistral and CodeLlama; the same setup works for Llama 2:
My guessed rank = 1
My guessed rank = 0
2024-06-24 11:32:19,965 - INFO - intel_extension_for_pytorch auto imported
2024-06-24 11:32:19,965 - INFO - intel_extension_for_pytorch auto imported
/xxxx/xxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/xxx/xxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.12it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.10it/s]
2024-06-24 11:32:22,839 - INFO - Converting the current model to sym_int4 format......
2024-06-24 11:32:22,907 - INFO - Converting the current model to sym_int4 format......
2024:06:24-11:32:29:(120347) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2024:06:24-11:32:29:(120347) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2024:06:24-11:32:29:(120347) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
2024:06:24-11:32:29:(120346) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2024:06:24-11:32:29:(120346) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2024:06:24-11:32:29:(120346) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
Traceback (most recent call last):
File "/home/rajritu/ritu/ipex-llm/python/llm/example/GPU/Pipeline-Parallel-Inference/generate.py", line 68, in <module>
output = model.generate(input_ids,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/xxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/xxx/xxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/xxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/xxxx/xxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/xxxx/xxxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/xxxx/xxxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 163, in generate
and self.config._has_non_default_generation_parameters()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxxx/xxxxx/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'MistralConfig' object has no attribute '_has_non_default_generation_parameters'
Traceback (most recent call last):
File "/xxxx/ipex-llm/python/llm/example/GPU/Pipeline-Parallel-Inference/generate.py", line 68, in <module>
output = model.generate(input_ids,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rajritu/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 163, in generate
and self.config._has_non_default_generation_parameters()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'MistralConfig' object has no attribute '_has_non_default_generation_parameters'
[2024-06-24 11:32:32,587] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 120346) of binary: /home/rajritu/miniforge3/envs/envIPEX_LLM_INF/bin/python3.11
Traceback (most recent call last):
File "/miniforge3/envs/envIPEX_LLM_INF/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniforge3/envs/envIPEX_LLM_INF/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
generate.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-06-24_11:32:32
host : imu-nex-sprx3-ws
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 120347)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-06-24_11:32:32
host : imu-nex-sprx3-ws
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 120346)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
GPU: 2 Intel Arc cards
The same error occurs when running the example with CodeLlama.
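For context, the failing call in the traceback is ipex_llm/transformers/pipeline_parallel.py invoking the private transformers helper PretrainedConfig._has_non_default_generation_parameters(), which exists only in some transformers releases. A mismatch between the transformers version the example expects and the one installed in the environment can therefore produce exactly this AttributeError. Below is a minimal diagnostic-plus-shim sketch, not an official fix: it reports the installed transformers version and, if the helper is missing, patches in a conservative no-op fallback so generate() can proceed. The fallback's behaviour (reporting no overridden generation parameters) is an assumption and should be verified against the transformers release pinned by the example's README.

```python
# Hedged diagnostic + workaround sketch, not an official ipex-llm fix.
# The traceback shows pipeline_parallel.py calling the private helper
# PretrainedConfig._has_non_default_generation_parameters(), which is
# only present in some transformers releases.
import transformers
from transformers.configuration_utils import PretrainedConfig

print("transformers version:", transformers.__version__)
print("helper present:",
      hasattr(PretrainedConfig, "_has_non_default_generation_parameters"))

if not hasattr(PretrainedConfig, "_has_non_default_generation_parameters"):
    # ASSUMPTION: returning False ("no non-default generation parameters")
    # mirrors the helper's default case and lets generate() continue.
    def _has_non_default_generation_parameters(self) -> bool:
        return False

    PretrainedConfig._has_non_default_generation_parameters = (
        _has_non_default_generation_parameters
    )
```

If this patch is placed before the model.generate() call in generate.py (it must run in each rank launched by torchrun), the AttributeError above should no longer trigger; if the helper is already present and the error persists, the mismatch more likely lies between the installed ipex-llm build and the transformers version.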