huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Shard 2 failed to start #1644

Closed yasserkh2 closed 2 months ago

yasserkh2 commented 4 months ago

System Info

I was trying to run CohereForAI/c4ai-command-r-v01 with these commands:

```shell
model=CohereForAI/c4ai-command-r-v01
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.4 \
    --num-shard 4 --model-id $model
```

and I got this error:

```
2024-03-14T13:19:25.453070Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-03-14T13:19:25.453075Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-03-14T13:19:25.453075Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-14T13:19:25.463782Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-03-14T13:19:25.553183Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 428, in get_model
    raise NotImplementedError("sharded is not supported for AutoModel")
NotImplementedError: sharded is not supported for AutoModel
 rank=2

(the identical traceback is printed for rank=1 and rank=3)

2024-03-14T13:19:25.651356Z ERROR text_generation_launcher: Shard 2 failed to start
2024-03-14T13:19:25.651368Z  INFO text_generation_launcher: Shutting down shards
2024-03-14T13:19:25.658793Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: ShardCannotStart
```
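For context, the traceback comes from TGI's model dispatch: when a model architecture has no dedicated sharded implementation, the server falls back to `AutoModel`, which cannot be split across GPUs. A minimal sketch of that decision, with a hypothetical `SUPPORTED_SHARDED` set standing in for TGI's real model registry (the names below are illustrative, not the actual `text_generation_server` API):

```python
# Hypothetical subset standing in for TGI's real registry of models
# that have a dedicated (shardable) implementation.
SUPPORTED_SHARDED = {"llama", "mistral", "gpt_neox"}

def get_model(model_type: str, sharded: bool) -> str:
    """Simplified sketch of TGI's model dispatch logic."""
    if model_type in SUPPORTED_SHARDED:
        # Dedicated implementation: supports tensor-parallel sharding.
        return f"FlashCausalLM({model_type})"
    if sharded:
        # The path hit in the traceback above for the Cohere architecture.
        raise NotImplementedError("sharded is not supported for AutoModel")
    # Unsharded fallback: generic transformers AutoModel.
    return f"AutoModel({model_type})"
```

Under this sketch, an unsupported architecture only fails when `--num-shard` is greater than 1; running it unsharded takes the `AutoModel` path instead.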

Information

Tasks

Reproduction

```shell
model=FreedomIntelligence/AceGPT-13B-chat
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.4 \
    --num-shard 4 --model-id $model
```

Expected behavior

Expected the server to start all shards and serve the model normally.

giyaseddin commented 4 months ago

Any updates on this issue?

giyaseddin commented 4 months ago

@yasserkh2 Any ideas on the source of this issue?

yasserkh2 commented 3 months ago

No updates so far.

giyaseddin commented 3 months ago

In my case, fixing the NVIDIA driver setup for Docker solved this issue.

suparious commented 3 months ago

Yes, the error `sharded is not supported for AutoModel` normally results from using an older driver, an old CUDA toolkit, or an old Python version. The requirements in the README appear to be accurate.
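To rule out an environment problem, it helps to compare the driver and CUDA versions reported by `nvidia-smi` against the minimums in the README. A small helper for that comparison (the minimum version shown is a placeholder; check the README for the real requirement):

```python
def version_tuple(v: str) -> tuple:
    """Turn a dotted version string like '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

# Placeholder minimum driver version -- consult the TGI README for the
# actual requirement of your TGI release.
MIN_DRIVER = "525.60.13"
```

For example, `meets_minimum("535.104.05", MIN_DRIVER)` would confirm the driver reported by `nvidia-smi` is new enough.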

suparious commented 3 months ago

Also, when you use a supported model, you won't need the AutoModel fallback, and you will get better performance. The c4ai architecture is not supported yet, which is why TGI falls back to AutoModel.
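One way to check ahead of time which path TGI will take is to look at the `model_type` field in the model's `config.json` on the Hub. A hedged sketch (the supported set below is illustrative; the authoritative registry lives in `text_generation_server/models/__init__.py` for your TGI version):

```python
import json

# Illustrative subset only -- see text_generation_server/models/__init__.py
# in your installed TGI version for the real list of flash implementations.
FLASH_SUPPORTED = {"llama", "mistral", "falcon", "gpt_neox"}

def will_use_automodel(config_json: str) -> bool:
    """Return True if, under this sketch, TGI would fall back to AutoModel
    (and therefore refuse to shard) for the given config.json contents."""
    model_type = json.loads(config_json).get("model_type", "")
    return model_type not in FLASH_SUPPORTED
```

CohereForAI/c4ai-command-r-v01 reports `model_type: "cohere"`, which TGI 1.4 has no flash implementation for, so under this check it would hit the AutoModel path.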

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.