huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference

TGI crash during Warming up model - invalid opcode in rotary_emb.cpython-310-x86_64-linux-gnu.so #1928

Closed · zidsi closed this issue 4 months ago

zidsi commented 5 months ago

System Info

TGI fails to start. Version used: ghcr.io/huggingface/text-generation-inference:2.0.3

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A2                      Off | 00000000:04:00.0 Off |                    0 |
|  0%   49C    P0              20W /  60W |   4360MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Debug logs show a Server error: transport error (Error: Warmup(Generation("transport error"))) raised at router/client, and hyper (hyper-0.14.28/src/proto/h2/client.rs:326) reports client response error: stream closed because of a broken pipe.

2024-05-21T08:33:59.489369Z  INFO text_generation_router: router/src/main.rs:317: Warming up model
2024-05-21T08:33:59.489459Z DEBUG warmup{max_input_length=2047 max_prefill_tokens=2047 max_total_tokens=2048 max_batch_size=Some(1)}:warmup: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-05-21T08:33:59.489569Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(5), flags: (0x4: END_HEADERS) }
2024-05-21T08:33:59.489619Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5) }
2024-05-21T08:33:59.489680Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5), flags: (0x1: END_STREAM) }
2024-05-21T08:33:59.588618Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [91, 55, 230, 85, 189, 39, 208, 231] }
2024-05-21T08:33:59.588652Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [91, 55, 230, 85, 189, 39, 208, 231] }
2024-05-21T08:34:00.533910Z DEBUG hyper::proto::h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/proto/h2/client.rs:326: client response error: stream closed because of a broken pipe
2024-05-21T08:34:00.533881Z DEBUG hyper::client::service: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.28/src/client/service.rs:79: connection error: hyper::Error(Io, Custom { kind: BrokenPipe, error: "connection closed because of a broken pipe" })
2024-05-21T08:34:00.534063Z ERROR warmup{max_input_length=2047 max_prefill_tokens=2047 max_total_tokens=2048 max_batch_size=Some(1)}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
Error: Warmup(Generation("transport error"))
2024-05-21T08:34:00.562507Z ERROR text_generation_launcher: Webserver Crashed
2024-05-21T08:34:00.562560Z  INFO text_generation_launcher: Shutting down shards

dmesg points at the likely cause of the broken pipe: the shard process died with an invalid opcode in rotary_emb.cpython-310-x86_64-linux-gnu.so.

[ 8415.199074] traps: pt_main_thread[15027] trap invalid opcode ip:7f9db6affba3 sp:7ffc1454c220 error:0 in rotary_emb.cpython-310-x86_64-linux-gnu.so[7f9db6aed000+20000]

Processor info may be relevant as well: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz.
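An invalid opcode trap (SIGILL) usually means the binary executes instructions the host CPU does not implement. The E5-2650 v2 is an Ivy Bridge part, which has AVX but not AVX2, so a prebuilt rotary_emb kernel compiled with AVX2 enabled would crash exactly like this. A quick sanity check (just a sketch against the standard /proc/cpuinfo flags, not anything TGI itself runs):

 # Print the AVX-family flags the CPU advertises; no "avx2" line means
 # binaries built with AVX2 instructions cannot run on this host.
 grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u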

Reproduction

Can reproduce with:

 docker run --gpus all --shm-size 64g -p 8080:80 ghcr.io/huggingface/text-generation-inference:2.0.3 --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0

Expected behavior

TGI should either start or report that the hardware is unsupported, rather than crashing during warmup.

zidsi commented 5 months ago

FYI: building the Docker image from a git clone on the target machine and running that locally built image works fine.
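This fits the AVX2 theory: a local build presumably compiles the kernels against the build host's CPU, so the resulting binaries avoid instructions it lacks. As a sketch of the workaround (the tgi-local tag is arbitrary; the Dockerfile sits at the repo root, and the build takes a while):

 # Build TGI from source on the affected machine, then run the local image
 # in place of ghcr.io/huggingface/text-generation-inference:2.0.3
 git clone https://github.com/huggingface/text-generation-inference
 cd text-generation-inference
 docker build -t tgi-local .
 docker run --gpus all --shm-size 64g -p 8080:80 tgi-local \
     --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0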

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.