freegheist opened this issue 6 days ago
Deepseek v2 does not currently have an implementation in TGI, so it will revert to the upstream implementation which does not support sharding. We are currently working on native support for DeepSeek V2.
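The fallback behaviour described above can be illustrated with a small sketch. This is not TGI's actual code — the registry entries and function name are made up for illustration — but it shows the dispatch pattern: architectures with a native implementation are served by it, and anything else falls back to a generic upstream path that cannot be sharded across GPUs.

```python
# Illustrative sketch only -- not TGI's actual source. Hypothetical registry
# mapping model_type (from config.json) to a native implementation class name.
NATIVE_IMPLEMENTATIONS = {
    "llama": "FlashLlamaForCausalLM",
    "mistral": "FlashMistralForCausalLM",
}


def resolve_implementation(model_type: str, num_shard: int) -> str:
    """Pick a native implementation if one exists, else fall back upstream."""
    if model_type in NATIVE_IMPLEMENTATIONS:
        # Native implementations support tensor-parallel sharding.
        return NATIVE_IMPLEMENTATIONS[model_type]
    if num_shard > 1:
        # The generic fallback cannot shard weights across GPUs,
        # which is the situation hit here with deepseek_v2 on 8 shards.
        raise ValueError(
            f"{model_type!r} has no native implementation; "
            "the upstream fallback does not support sharding"
        )
    return "AutoModelForCausalLM"  # generic single-device fallback
```

With `num_shard=1` an unsupported architecture would still load via the fallback; with `num_shard=8`, as in this report, it fails.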
that would be amazing! thanks 🙏
System Info
ghcr.io/huggingface/text-generation-inference:2.0.4 & 2.1.0 Ubuntu 22.04 server, 8xA6000.
Information
Tasks
Reproduction
{
    model_id: "deepseek-ai/DeepSeek-Coder-V2-Instruct",
    revision: None,
    validation_workers: 15,
    sharded: None,
    num_shard: Some(8),
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(28672),
    max_total_tokens: Some(32768),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(28672),
    max_batch_total_tokens: Some(32768),
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some([1, 2, 4, 8, 16, 32]),
    hostname: "0.0.0.0",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some("/data"),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.99,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
}
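For reference, a launcher invocation roughly matching the configuration above (the host cache path, port mapping, and shared-memory size are assumptions; the flag names follow the `text-generation-launcher` CLI):

```shell
# Approximate reproduction command; adjust volume, port, and --shm-size
# to your environment.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v /data:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.0 \
    --model-id deepseek-ai/DeepSeek-Coder-V2-Instruct \
    --num-shard 8 \
    --max-input-length 28672 \
    --max-total-tokens 32768 \
    --max-batch-prefill-tokens 28672 \
    --cuda-memory-fraction 0.99
```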
Expected behavior
2024-06-30T02:56:11.196310Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)