intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

All-in-one Meta-Llama-3.1-8B RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:0 and cpu! #11681

Open Kpeacef opened 1 month ago

Kpeacef commented 1 month ago

Hi, I would like to try out Meta-Llama-3.1-8B with the all-in-one benchmark, but I am facing this issue: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:0 and cpu!"

This is my pip list for your reference:

    Package                      Version
    ---------------------------- ------------------
    accelerate                   0.23.0
    aiohttp                      3.9.5
    aiosignal                    1.3.1
    annotated-types              0.7.0
    antlr4-python3-runtime       4.9.3
    attrs                        23.2.0
    bigdl-core-xe-21             2.5.0b20240726
    bigdl-core-xe-addons-21      2.5.0b20240726
    bigdl-core-xe-batch-21       2.5.0b20240726
    certifi                      2024.7.4
    charset-normalizer           3.3.2
    datasets                     2.20.0
    dill                         0.3.8
    docstring_parser             0.16
    filelock                     3.15.4
    frozenlist                   1.4.1
    fsspec                       2024.5.0
    huggingface-hub              0.24.2
    idna                         3.7
    intel-cmplr-lib-ur           2024.2.0
    intel-extension-for-pytorch  2.1.10+xpu
    intel-openmp                 2024.2.0
    ipex-llm                     2.1.0b20240726
    Jinja2                       3.1.4
    markdown-it-py               3.0.0
    MarkupSafe                   2.1.5
    mdurl                        0.1.2
    mpmath                       1.3.0
    multidict                    6.0.5
    multiprocess                 0.70.16
    networkx                     3.3
    numpy                        1.26.4
    omegaconf                    2.3.0
    packaging                    24.1
    pandas                       2.2.2
    pillow                       10.4.0
    pip                          24.0
    protobuf                     5.28.0rc1
    psutil                       6.0.0
    py-cpuinfo                   9.0.0
    pyarrow                      17.0.0
    pyarrow-hotfix               0.6
    pydantic                     2.8.2
    pydantic_core                2.20.1
    Pygments                     2.18.0
    python-dateutil              2.9.0.post0
    pytz                         2024.1
    PyYAML                       6.0.2rc1
    regex                        2024.7.24
    requests                     2.32.3
    rich                         13.7.1
    safetensors                  0.4.3
    sentencepiece                0.2.0
    setuptools                   69.5.1
    shtab                        1.7.1
    six                          1.16.0
    sympy                        1.13.1
    tabulate                     0.9.0
    tokenizers                   0.19.1
    torch                        2.1.0a0+cxx11.abi
    torchvision                  0.16.0a0+cxx11.abi
    tqdm                         4.66.4
    transformers                 4.43.2
    trl                          0.9.6
    typing_extensions            4.12.2
    tyro                         0.8.5
    tzdata                       2024.1
    urllib3                      2.2.2
    wheel                        0.43.0
    xxhash                       3.4.1
    yarl                         1.9.4

lei-sun-intel commented 1 month ago

I met exactly the same problem when running the all-in-one benchmark with Llama-3.1-8B.

lzivan commented 1 month ago

Hi, we are trying to reproduce your issue.

lzivan commented 1 month ago

Hi, we've already reproduced your error. Will get back to you once we find a solution.

lzivan commented 1 month ago

Hi @Kpeacef @lei-sun-intel ,

To address the device RuntimeError, we modified one line of code:

eos_token_mask = torch.isin(vocab_tensor, self.eos_token_id.to('xpu'))

at around line 288 in

/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/transformers/generation/logits_process.py

So it would be:

    @add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        new_tokens_length = input_ids.shape[-1] - self.prompt_length_to_skip
        scores_processed = scores.clone()
        vocab_tensor = torch.arange(scores.shape[-1], device=scores.device)
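        # Changed line: self.eos_token_id is created on the CPU here, so move it to the XPU to match vocab_tensor's device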
        eos_token_mask = torch.isin(vocab_tensor, self.eos_token_id.to('xpu'))
        if new_tokens_length < self.min_new_tokens:
            scores_processed = torch.where(eos_token_mask, -math.inf, scores)

        return scores_processed
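
Note: hard-coding 'xpu' is enough for this benchmark run, but a more device-agnostic variant of the same change would be self.eos_token_id.to(scores.device), which keeps the line working on CPU or CUDA runs as well.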

However, we still got a new error:

Traceback (most recent call last):
  File "/home/arda/zijie/llama3.1/all-in-one/run.py", line 2003, in <module>
    run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
  File "/home/arda/zijie/llama3.1/all-in-one/run.py", line 152, in run_model
    result = run_transformer_int4_fp16_gpu_win(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, cpu_embedding, batch_size, streaming)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arda/zijie/llama3.1/all-in-one/run.py", line 1126, in run_transformer_int4_fp16_gpu_win
    output_ids = model.generate(input_ids, do_sample=False,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/ipex_llm/utils/benchmark_util.py", line 1563, in generate
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/ipex_llm/utils/benchmark_util.py", line 2430, in greedy_search
    model_kwargs = self._update_model_kwargs_for_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/ipex_llm/utils/benchmark_util.py", line 795, in _update_model_kwargs_for_generation
    return self.model._update_model_kwargs_for_generation(outputs, model_kwargs, is_encoder_decoder, standardize_cache_format)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arda/miniforge3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 699, in _update_model_kwargs_for_generation
    model_kwargs["cache_position"] = model_kwargs["cache_position"][-1:] + num_new_tokens
                                     ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'cache_position'

This is probably caused by an incompatibility between our benchmark_util.py and the newer version of transformers.
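
To illustrate the mismatch, here is a minimal sketch (not the actual ipex-llm code; the names follow the transformers 4.43 generation utilities):

    import torch

    # Newer transformers seeds "cache_position" in model_kwargs before the decoding
    # loop and advances it after every generated token, roughly like this:
    input_ids = torch.tensor([[1, 2, 3, 4]])
    model_kwargs = {"use_cache": True,
                    "cache_position": torch.arange(input_ids.shape[1])}

    num_new_tokens = 1
    # The failing line in _update_model_kwargs_for_generation assumes the key exists:
    model_kwargs["cache_position"] = model_kwargs["cache_position"][-1:] + num_new_tokens

    # The older generation loop bundled in ipex_llm's benchmark_util.py never seeds
    # "cache_position", so the same update raises KeyError: 'cache_position'.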

qiuxin2012 commented 1 month ago

@Kpeacef @lei-sun-intel We added support for Llama-3.1 to all-in-one yesterday; please update your ipex-llm and run.py to the latest version.
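
(For reference, the nightly ipex-llm XPU build can usually be updated with something along the lines of pip install --pre --upgrade ipex-llm[xpu]; check the ipex-llm installation guide for the exact command and extra index URL for your setup. The latest run.py lives in the all-in-one benchmark folder of this repository.)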

Kpeacef commented 1 month ago

@qiuxin2012 , I have created another environment with 2.5.0b20240807.

I then hit another issue: ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

This issue is resolved by upgrading transformers; please update your transformers version with "pip install --upgrade transformers".

Tested with transformers version 4.44.0.

lzivan commented 1 month ago

Hi @Kpeacef, we had already reproduced this error before. We tested it and successfully ran it with transformers version 4.43.1.