huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

TGI does not seem to support other position encodings, such as ALiBi (used by Baichuan2-7B). #1786

Closed Night-Quiet closed 2 months ago

Night-Quiet commented 2 months ago

System Info

2024-04-22T02:18:37.204227Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 2d0a7173d4891e7cd5f9b77f8e0987b82a339e51
Docker label: N/A
nvidia-smi:
Mon Apr 22 10:18:37 2024       
   +---------------------------------------------------------------------------------------+
   | NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
   |-----------------------------------------+----------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
   |                                         |                      |               MIG M. |
   |=========================================+======================+======================|
   |   0  NVIDIA GeForce RTX 4090        On  | 00000000:1C:00.0 Off |                  Off |
   | 30%   29C    P8              30W / 450W |      2MiB / 24564MiB |      0%      Default |
   |                                         |                      |                  N/A |
   +-----------------------------------------+----------------------+----------------------+

   +---------------------------------------------------------------------------------------+
   | Processes:                                                                            |
   |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
   |        ID   ID                                                             Usage      |
   |=======================================================================================|
   |  No running processes found                                                           |
   +---------------------------------------------------------------------------------------+

Information

Tasks

Reproduction

text-generation-launcher --model-id /root/autodl-tmp/baichuan --trust-remote-code

2024-04-22T02:11:00.927702Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-22T02:11:00.928097Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-22T02:11:03.592505Z ERROR text_generation_launcher: exllamav2_kernels not installed.

2024-04-22T02:11:03.623388Z  WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.utils.layers' (/root/text-generation-inference/server/text_generation_server/utils/layers.py)

2024-04-22T02:11:03.623918Z  WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'

2024-04-22T02:11:04.132360Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/root/miniconda3/envs/tgi/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
             ^^^^^

  File "/root/text-generation-inference/server/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server

  File "/root/text-generation-inference/server/text_generation_server/server.py", line 16, in <module>
    from text_generation_server.models.vlm_causal_lm import VlmCausalLMBatch

  File "/root/text-generation-inference/server/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
    from text_generation_server.models.flash_mistral import (

  File "/root/text-generation-inference/server/text_generation_server/models/flash_mistral.py", line 18, in <module>
    from text_generation_server.models.custom_modeling.flash_mistral_modeling import (

  File "/root/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 30, in <module>
    from text_generation_server.utils.layers import (

ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers' (/root/text-generation-inference/server/text_generation_server/utils/layers.py)
 rank=0
2024-04-22T02:11:04.230211Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-22T02:11:04.230247Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

Expected behavior

Please let me know whether TGI supports baichuan-inc/Baichuan2-7B-Chat, or whether this error was caused by a mistake in my setup. Thank you.
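For context on the title: ALiBi (Attention with Linear Biases) replaces rotary position embeddings with a fixed additive bias on the attention scores, one slope per head. Below is a minimal pure-Python sketch of how those biases are computed, for illustration only; it is not TGI's or Baichuan's implementation, and it assumes the common case of a power-of-two head count:

```python
def alibi_slopes(num_heads):
    """One slope per attention head: a geometric sequence starting at
    2^(-8/num_heads), per the ALiBi paper (power-of-two head counts)."""
    start = 2 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(num_heads, seq_len):
    """Additive attention bias: bias[h][q][k] = -slope_h * (q - k) for
    keys at or before the query (k <= q), and 0 for future positions."""
    slopes = alibi_slopes(num_heads)
    return [
        [[-s * (q - k) if k <= q else 0.0 for k in range(seq_len)]
         for q in range(seq_len)]
        for s in slopes
    ]
```

Because the bias depends only on the query-key distance, models using ALiBi take a different code path than the rotary-embedding layers that the traceback above fails to import.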

Night-Quiet commented 2 months ago

I completed the repair by following #1778.