hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

XVERSE-13B-256K Model doesn't work properly in web_demo.py #2363

Closed seanxuu closed 8 months ago

seanxuu commented 8 months ago

Reminder

Reproduction

python src/web_demo.py \
    --model_name_or_path models/XVERSE-13B-256K \
    --template xverse
Exception in thread Thread-7 (generate):
Traceback (most recent call last):
  File " Miniconda/envs/llama_factory/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File " Miniconda/envs/llama_factory/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File " .cache/huggingface/modules/transformers_modules/XVERSE-13B-256K/modeling_xverse.py", line 715, in forward
    outputs = self.model(
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File " .cache/huggingface/modules/transformers_modules/XVERSE-13B-256K/modeling_xverse.py", line 603, in forward
    layer_outputs = decoder_layer(
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File " .cache/huggingface/modules/transformers_modules/XVERSE-13B-256K/modeling_xverse.py", line 311, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File " Miniconda/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File " .cache/huggingface/modules/transformers_modules/XVERSE-13B-256K/modeling_xverse.py", line 249, in forward
    assert not use_cache, "use_cache is not supported"
AssertionError: use_cache is not supported

Expected behavior

Bug report

System Info


seanxuu commented 8 months ago

[INFO|configuration_utils.py:802] 2024-01-29 15:32:49,297 >> Model config XverseConfig {
  "_name_or_path": "models/XVERSE-13B-256K",
  "architectures": ["XverseForCausalLM"],
  "auto_map": {
    "AutoConfig": "configuration_xverse.XverseConfig",
    "AutoModelForCausalLM": "modeling_xverse.XverseForCausalLM"
  },
  "bos_token_id": 2,
  "eos_token_id": 3,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 32768,
  "max_tokenizer_truncation": 262144,
  "model_type": "xverse",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": 1,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": false,
  "vocab_size": 100534
}

[INFO|modeling_utils.py:3341] 2024-01-29 15:32:49,425 >> loading weights file models/XVERSE-13B-256K/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1341] 2024-01-29 15:32:49,426 >> Instantiating XverseForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-01-29 15:32:49,427 >> Generate config GenerationConfig {
  "bos_token_id": 2,
  "eos_token_id": 3,
  "pad_token_id": 1,
  "use_cache": false
}

seanxuu commented 8 months ago

I found a way to solve it: https://github.com/xverse-ai/XVERSE-13B/issues/27#issuecomment-1907907