Closed yasserkh2 closed 2 months ago
Any updates on this issue?
@yasserkh2 Any ideas on the source of this issue?
No updates so far.
In my case, fixing the NVIDIA driver setup for Docker solved this issue.
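If you suspect the same cause, a quick sanity check is to confirm that a GPU-enabled container can actually see the driver (the CUDA image tag below is just an example; any recent base tag works):

```shell
# Sanity check: can a GPU-enabled container reach the NVIDIA driver?
# If this fails, the NVIDIA Container Toolkit / driver setup is broken,
# independent of anything TGI does.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If `nvidia-smi` errors out here, reinstalling or updating the NVIDIA Container Toolkit and the host driver is the usual fix.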
Yes, the error "sharded is not supported for AutoModel" normally results from older drivers, an old CUDA version, or an old Python version. The requirements in the README appear to be accurate.
Also, when you use a natively supported model you won't hit the AutoModel fallback, and you will get better performance. The c4ai architecture is not supported yet, which is why TGI falls back to AutoModel.
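Since the AutoModel fallback cannot be sharded, two workarounds follow from the explanation above. This is a sketch, not a confirmed fix: it assumes either that one GPU can fit the model unsharded, or that a newer TGI image ships a native (shardable) implementation of this architecture.

```shell
model=CohereForAI/c4ai-command-r-v01
volume=$PWD/data

# Option 1: drop --num-shard so the AutoModel fallback can load at all
# (assumes a single GPU with enough memory for the full model).
docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:1.4 \
  --model-id $model

# Option 2: try a newer image tag, which may include a native,
# shardable implementation of this architecture (assumption: support
# was added in a later release).
docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --num-shard 4 --model-id $model
```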
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
I was trying to run CohereForAI/c4ai-command-r-v01 with these commands:

model=CohereForAI/c4ai-command-r-v01
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --num-shard 4 --model-id $model

and I got this error:

2024-03-14T13:19:25.453070Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-03-14T13:19:25.453075Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-03-14T13:19:25.453075Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-14T13:19:25.463782Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-03-14T13:19:25.553183Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 428, in get_model
    raise NotImplementedError("sharded is not supported for AutoModel")
NotImplementedError: sharded is not supported for AutoModel rank=2
2024-03-14T13:19:25.553206Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
(identical traceback, ending in "NotImplementedError: sharded is not supported for AutoModel" rank=1)
2024-03-14T13:19:25.563889Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
(identical traceback, ending in "NotImplementedError: sharded is not supported for AutoModel" rank=3)
2024-03-14T13:19:25.651356Z ERROR text_generation_launcher: Shard 2 failed to start
2024-03-14T13:19:25.651368Z  INFO text_generation_launcher: Shutting down shards
2024-03-14T13:19:25.658793Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: ShardCannotStart
Information
Tasks
Reproduction
model=FreedomIntelligence/AceGPT-13B-chat
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --rm --gpus all --shm-size 1g -p 3000:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --num-shard 4 --model-id $model
Expected behavior
Expected the launcher to start all shards and serve the model normally.