/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link
2024-08-11T22:11:55.740778Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-08-11T22:11:55.740792Z INFO text_generation_launcher: Sharding model on 5 processes
2024-08-11T22:11:55.740878Z INFO download: text_generation_launcher: Starting download process.
2024-08-11T22:11:58.162183Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-08-11T22:11:58.543915Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-08-11T22:11:58.544105Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-08-11T22:12:03.786121Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-08-11T22:12:03.786323Z INFO text_generation_launcher: CLI SHARDED = 5
2024-08-11T22:12:08.551807Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:18.560690Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:28.568664Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:38.578065Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:43.403761Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server exited with status = 1
2024-08-11T22:12:44.082499Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[WARNING|utils.py:212] 2024-08-11 22:12:02,456 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:03,010 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
[WARNING|utils.py:212] 2024-08-11 22:12:19,432 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,435 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,440 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,124 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,588 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,747 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:20.920 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:20.921 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
2024-08-11 22:12:21.398 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.399 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:21.561 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.561 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
[WARNING|utils.py:225] 2024-08-11 22:12:21,580 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
[WARNING|utils.py:225] 2024-08-11 22:12:21,759 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:22.487 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.487 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:22.611 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.612 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 0
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056375276 KB
Loading 9 checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 37, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
    model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 101, in get_model
    return CausalLM(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 629, in __init__
    model = self.get_deepspeed_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 783, in get_deepspeed_model
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 340, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 165, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 412, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 345, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 603, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 639, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 316, in replace_fn
    new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict=state_dict)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 299, in replace_wo_policy
    return _autotp._replace_module(module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 464, in _replace_module
    self._replace_module(child, name, class_name)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 443, in _replace_module
    Loading.load(child, self.state_dict, checking_key, self.mp_group)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 161, in load
    module.weight = mp_replace.copy(module.weight.data, state_dict[prefix + 'weight'])
KeyError: 'model.layers.3.self_attn.q_proj.weight'
(The remaining four shard processes print the same traceback, partly interleaved, each ending in KeyError: 'model.layers.3.self_attn.q_proj.weight'.)
Loading 9 checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s] rank=0
2024-08-11T22:12:44.083532Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-11T22:12:44.083544Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
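The KeyError above is consistent with a checkpoint-layout mismatch rather than a corrupt download: GPTQ-quantized checkpoints typically store packed tensors (e.g. qweight, qzeros, scales) in place of a plain weight tensor, while DeepSpeed's AutoTP loader (auto_tp.py line 161 in the traceback) looks up '<prefix>weight' verbatim. Here is a minimal sketch of that lookup, using hypothetical key names that follow common GPTQ conventions — an assumption, since the log does not show the checkpoint's actual tensor names:

```python
# Hedged sketch (not from the log): a GPTQ-style state dict has packed tensors,
# so the plain '<prefix>weight' lookup done by DeepSpeed AutoTP cannot succeed.
state_dict_keys = {
    # Hypothetical GPTQ layout for one projection, mirroring the traceback's prefix:
    "model.layers.3.self_attn.q_proj.qweight",
    "model.layers.3.self_attn.q_proj.qzeros",
    "model.layers.3.self_attn.q_proj.scales",
}

prefix = "model.layers.3.self_attn.q_proj."
key = prefix + "weight"  # the exact key AutoTP builds in auto_tp.py's load()

try:
    if key not in state_dict_keys:
        raise KeyError(key)
except KeyError as err:
    print(err)  # -> 'model.layers.3.self_attn.q_proj.weight', as in the log
```

If this assumption holds, the failure is expected whenever the GPTQ checkpoint is loaded down a code path that only understands unquantized weights.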
System Info
root@laion-gaudi2-00:/home/sdp# docker run -p 8081:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$hf_token -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=2,3,4,5,6,7 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 5 --max-input-tokens 4096 --max-total-tokens 8192 --max-batch-prefill-tokens 8242
2024-08-11T22:11:55.546868Z INFO text_generation_launcher: Args { model_id: "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4", revision: None, validation_workers: 2, sharded: Some( true, ), num_shard: Some( 5, ), quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_tokens: Some( 4096, ), max_input_length: None, max_total_tokens: Some( 8192, ), waiting_served_ratio: 0.3, max_batch_prefill_tokens: Some( 8242, ), max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, cuda_graphs: None, hostname: "0d708a2172ae", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some( "/data", ), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false, max_client_batch_size: 4, }
2024-08-11T22:11:55.546942Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link
2024-08-11T22:12:03.786401Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server
2024-08-11T22:12:08.551807Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 2024-08-11T22:12:18.560690Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 2024-08-11T22:12:28.568664Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 2024-08-11T22:12:38.578065Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 2024-08-11T22:12:43.403761Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server exited with status = 1
2024-08-11T22:12:44.082499Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[WARNING|utils.py:212] 2024-08-11 22:12:02,456 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:03,010 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
(both version-mismatch warnings are repeated once per worker process; repeats omitted)
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
(both deprecation warnings are also repeated once per worker process; repeats omitted)
2024-08-11 22:12:20.920 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:20.921 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(the TGIService startup lines and the `trust_remote_code` notice are repeated once per worker process; repeats omitted)
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 0
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056375276 KB
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 37, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
    model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 101, in get_model
    return CausalLM(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 629, in __init__
    model = self.get_deepspeed_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 783, in get_deepspeed_model
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 340, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 165, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 412, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 345, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 603, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 639, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 316, in replace_fn
    new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict=state_dict)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 299, in replace_wo_policy
    return _autotp._replace_module(module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 464, in _replace_module
    self._replace_module(child, name, class_name)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 443, in _replace_module
    Loading.load(child, self.state_dict, checking_key, self.mp_group)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 161, in load
    module.weight = mp_replace.copy(module.weight.data, state_dict[prefix + 'weight'])
KeyError: 'model.layers.3.self_attn.q_proj.weight'
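A plausible reading of the KeyError above (my assumption, based on the usual AutoGPTQ checkpoint layout, not something the log proves): GPTQ-quantized checkpoints store each linear layer as packed `qweight`/`qzeros`/`scales` (and `g_idx`) tensors rather than a dense `weight` tensor, so the plain `state_dict[prefix + 'weight']` lookup in DeepSpeed's auto tensor-parallel loader finds nothing. A toy sketch of that mismatch, with made-up dict values standing in for real tensors:

```python
# Toy illustration of the failure mode; not TGI or DeepSpeed code.
# Keys as they would appear for one layer of a GPTQ-quantized
# checkpoint (assumed AutoGPTQ-style layout):
gptq_state_dict = {
    "model.layers.3.self_attn.q_proj.qweight": "<packed int4 weights>",
    "model.layers.3.self_attn.q_proj.qzeros":  "<packed zero points>",
    "model.layers.3.self_attn.q_proj.scales":  "<dequantization scales>",
    "model.layers.3.self_attn.q_proj.g_idx":   "<group indices>",
}

prefix = "model.layers.3.self_attn.q_proj."
try:
    # The lookup performed at deepspeed/module_inject/auto_tp.py:161:
    gptq_state_dict[prefix + "weight"]
except KeyError as exc:
    # Reproduces the error seen in the shard log.
    print("KeyError:", exc)
```

If this is right, the loader would need GPTQ-aware handling (or a dequantized checkpoint) rather than a dense-weight lookup.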
(The same traceback, ending in the same KeyError: 'model.layers.3.self_attn.q_proj.weight', is printed by each of the four remaining shard processes, partly interleaved; repeats omitted.)
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]
(progress line repeated once per shard process; repeats omitted) rank=0
2024-08-11T22:12:44.083532Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-11T22:12:44.083544Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
Information
Tasks
Reproduction
docker run -p 8081:80 -v $volume:/data --runtime=habana \
  -e HUGGING_FACE_HUB_TOKEN=$hf_token \
  -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
  -e HABANA_VISIBLE_DEVICES=2,3,4,5,6,7 \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice --ipc=host \
  tgi_gaudi --model-id $model --sharded true --num-shard 5 \
  --max-input-tokens 4096 --max-total-tokens 8192 --max-batch-prefill-tokens 8242
Expected behavior
All five shards should start and the server should serve the model.
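Concretely, after a successful launch I would expect the mapped port to answer generation requests, e.g. (prompt and parameters are arbitrary; TGI's standard `/generate` endpoint):

```shell
# Assumes the container above started successfully and is listening on 8081.
curl http://localhost:8081/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'
```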