RuntimeError: weight lm_head.weight does not exist when loading T5 model into TGI

anindya-saha commented 1 year ago

System Info

Hello Team, I am following the https://huggingface.co/docs/transformers/tasks/summarization tutorial for summarization. We do have TGI server and wanted to check if we can use TGI server to serve this model for summarization. When we try to load the stevhliu/my_awesome_billsum_model we get

tgi-text_generation_inference-1  | 2023-09-19T23:06:42.808534Z  INFO text_generation_launcher: Args { model_id: "stevhliu/my_awesome_billsum_model", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: true, max_concurrent_requests: 1, max_best_of: 2, max_stop_sequences: 20, max_input_length: 128, max_total_tokens: 512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: Some(2048), max_waiting_tokens: 20, hostname: "7499e2c9d3b5", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
tgi-text_generation_inference-1  | 2023-09-19T23:06:42.808570Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `stevhliu/my_awesome_billsum_model` do not contain malicious code.
tgi-text_generation_inference-1  | 2023-09-19T23:06:42.808647Z  INFO download: text_generation_launcher: Starting download process.
tgi-text_generation_inference-1  | 2023-09-19T23:06:44.587394Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
tgi-text_generation_inference-1  | 
tgi-text_generation_inference-1  | 2023-09-19T23:06:44.910979Z  INFO download: text_generation_launcher: Successfully downloaded weights.
tgi-text_generation_inference-1  | 2023-09-19T23:06:44.911196Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
tgi-inference_api-1              | INFO:botocore.credentials:Found credentials from IAM Role: DevEC2Role
tgi-inference_api-1              | I0919 23:06:46.473618889       1 ev_epoll1_linux.cc:121]               grpc epoll fd: 20
tgi-inference_api-1              | I0919 23:06:46.475148764       1 socket_utils_common_posix.cc:407]     Disabling AF_INET6 sockets because ::1 is not available.
tgi-inference_api-1              | I0919 23:06:46.475208015       1 socket_utils_common_posix.cc:336]     TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
tgi-inference_api-1              | I0919 23:06:46.475286496       1 tcp_server_posix.cc:339]              Failed to add :: listener, the environment may not support IPv6: UNKNOWN:Address family not supported by protocol {created_time:"2023-09-19T23:06:46.475177234+00:00", errno:97, os_error:"Address family not supported by protocol", syscall:"socket", target_address:"[::]:50051"}
tgi-text_generation_inference-1  | 2023-09-19T23:06:49.596335Z ERROR text_generation_launcher: Error when initializing model
tgi-text_generation_inference-1  | Traceback (most recent call last):
tgi-text_generation_inference-1  |   File "/opt/conda/bin/text-generation-server", line 8, in <module>
tgi-text_generation_inference-1  |     sys.exit(app())
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
tgi-text_generation_inference-1  |     return get_command(self)(*args, **kwargs)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
tgi-text_generation_inference-1  |     return self.main(*args, **kwargs)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
tgi-text_generation_inference-1  |     return _main(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
tgi-text_generation_inference-1  |     rv = self.invoke(ctx)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
tgi-text_generation_inference-1  |     return _process_result(sub_ctx.command.invoke(sub_ctx))
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
tgi-text_generation_inference-1  |     return ctx.invoke(self.callback, **ctx.params)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
tgi-text_generation_inference-1  |     return __callback(*args, **kwargs)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
tgi-text_generation_inference-1  |     return callback(**use_params)  # type: ignore
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
tgi-text_generation_inference-1  |     server.serve(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
tgi-text_generation_inference-1  |     asyncio.run(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
tgi-text_generation_inference-1  |     return loop.run_until_complete(main)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
tgi-text_generation_inference-1  |     self.run_forever()
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
tgi-text_generation_inference-1  |     self._run_once()
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
tgi-text_generation_inference-1  |     handle._run()
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
tgi-text_generation_inference-1  |     self._context.run(self._callback, *self._args)
tgi-text_generation_inference-1  | > File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
tgi-text_generation_inference-1  |     model = get_model(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 244, in get_model
tgi-text_generation_inference-1  |     return T5Sharded(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 70, in __init__
tgi-text_generation_inference-1  |     model = T5ForConditionalGeneration(config, weights)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1035, in __init__
tgi-text_generation_inference-1  |     self.lm_head = TensorParallelHead.load(
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 207, in load
tgi-text_generation_inference-1  |     weight = weights.get_tensor(f"{prefix}.weight")
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 65, in get_tensor
tgi-text_generation_inference-1  |     filename, tensor_name = self.get_filename(tensor_name)
tgi-text_generation_inference-1  |   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
tgi-text_generation_inference-1  |     raise RuntimeError(f"weight {tensor_name} does not exist")
tgi-text_generation_inference-1  | RuntimeError: weight lm_head.weight does not exist
tgi-text_generation_inference-1  | 
tgi-text_generation_inference-1  | 2023-09-19T23:06:50.216724Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Also, we are using the following docker-compose file to bring up the TGI, Could you please comment on that as well.

version: '3.5'
services:
  text_generation_inference:
    image: ghcr.io/huggingface/text-generation-inference:0.9.4
    command: >
      --model-id stevhliu/my_awesome_billsum_model
      --num-shard 1
      --max-input-length 128
      --max-total-tokens 512
      --max-batch-prefill-tokens 2048
      --max-batch-total-tokens 2048
      --max-concurrent-requests 1
      --max-stop-sequences 20
      --trust-remote-code
    shm_size: 1g
    env_file:
      - .env
    ports:
      - "8080:80"
    volumes:
      - ${VOLUME}:/data
      - ${CERTIFICATE_VOLUME_DIRECTORY}:/cert
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Information

[X] Docker
[ ] The CLI directly

Tasks

[X] An officially supported command
[ ] My own modifications

Reproduction

Following the official documentation for the TGI https://github.com/huggingface/text-generation-inference#docker causes the same error

model=t5-small
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $model

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 53, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight lm_head.weight does not exist

Expected behavior

What is the best way to serve the T5 model on TGI ?

Narsil commented 1 year ago

Try to use a more recent version of TGI , it contains many bugfixes along the lines of the one you describe.

anindya-saha commented 1 year ago

Hi @Narsil , Could you please reopen the issue as I do not have reopen permission. The issue is still on the latest version text-generation-inference:1.0.3 as mentioned in the Reproduction section. Could you please try that ?