openvinotoolkit / openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™

[GPU] execution order corrupted: _Map_base::at #1706

Closed: jinz2014 closed this issue 7 months ago

jinz2014 commented 8 months ago

Running the example from 254-llm-chatbot shows the following error. Can you reproduce it? Thanks for any guidance. (A minimal sketch of the failing call path follows the traceback below.)

Selected model mpt-7b-chat
/path/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1a1d410c70591fcc1a46486a254cd0e600e7b1b4/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
  warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/path/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1a1d410c70591fcc1a46486a254cd0e600e7b1b4/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
  warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
Size of FP16 model is 12682.63 MB
Size of model with INT8 compressed weights is 6345.15 MB
Compression rate for INT8 model: 1.999
Size of model with INT4 compressed weights is 4005.57 MB
Compression rate for INT4 model: 3.166
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading model from mpt-7b-chat/INT4_compressed_weights
The argument `trust_remote_code` is to be used along with export=True. It will be ignored.
Compiling the model to GPU ...
Traceback (most recent call last):
  File "/path/LLM/openvino_notebooks/notebooks/254-llm-chatbot-2/main.py", line 293, in <module>
    answer = ov_model.generate(**input_tokens, max_new_tokens=512)
  File "/path/triton-xpu-build/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/path/triton-xpu-build/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1479, in generate
    return self.greedy_search(
  File "/path/triton-xpu-build/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2340, in greedy_search
    outputs = self(
  File "/path/triton-xpu-build/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/optimum/modeling_base.py", line 90, in __call__
    return self.forward(*args, **kwargs)
  File "/path/LLM/openvino_notebooks/notebooks/254-llm-chatbot-2/ov_llm_model.py", line 151, in forward
    self.request.wait()
RuntimeError: Exception from src/inference/src/infer_request.cpp:256:
Exception from src/bindings/python/src/pyopenvino/core/infer_request.hpp:54:
Caught exception: Check 'false' failed at src/plugins/intel_gpu/src/graph/primitive_inst.cpp:1146:
[GPU] execution order corrupted: _Map_base::at
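
For reference, the failing call path reduces to roughly the sketch below, assuming the INT4 weight directory from the log. The notebook actually wraps MPT in its own helper class (forward() in ov_llm_model.py is where self.request.wait() throws), so this optimum-intel equivalent is an approximation, not the notebook's exact code.

from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Directory taken from the "Loading model from ..." log line above.
model_dir = "mpt-7b-chat/INT4_compressed_weights"
tok = AutoTokenizer.from_pretrained(model_dir)

# "Compiling the model to GPU ..." happens here; the _Map_base::at
# error only surfaces later, inside generate().
ov_model = OVModelForCausalLM.from_pretrained(model_dir, device="GPU")

input_tokens = tok("Hello", return_tensors="pt")
answer = ov_model.generate(**input_tokens, max_new_tokens=512)
print(tok.decode(answer[0], skip_special_tokens=True))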
Wan-Intel commented 8 months ago

I've validated the following code while running Create an LLM-powered Chatbot using OpenVINO with OpenVINO™ Notebooks 2023.3.

from config import SUPPORTED_LLM_MODELS  # config.py ships alongside the notebook

model_configuration = SUPPORTED_LLM_MODELS[model_id.value]  # model_id is the notebook's model-selection dropdown
print(f"Selected model {model_id.value}")

Result: [screenshot: selected_model]

Please re-install OpenVINO™ Notebooks 2023.3 following the steps in the Installation Guide, then re-launch Create an LLM-powered Chatbot using OpenVINO.
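
If the error persists after re-installing, it may be worth confirming the runtime version and GPU visibility first (a quick sanity-check sketch; 'GPU' must appear in the device list for the notebook's GPU target to work):

from openvino.runtime import Core, get_version

print(get_version())             # should report a 2023.3 build after re-install
core = Core()
print(core.available_devices)    # expect something like ['CPU', 'GPU']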

jinz2014 commented 8 months ago

@Wan-Intel It is not about which model is selected. Can you reproduce the error during the inference phase?

Wan-Intel commented 8 months ago

I'm able to run inference with Create an LLM-powered Chatbot using OpenVINO on OpenVINO™ Notebooks 2023.3.

[screenshot: ok_254, showing inference completing successfully]

Please re-install OpenVINO™ Notebooks 2023.3 following the steps in the Installation Guide, then re-launch Create an LLM-powered Chatbot using OpenVINO.