AIvashov closed this issue.
I found out that it was my own mistake, in the env setting -e CUDA_VISIBLE_DEVICES=0,1. But now I have another issue:
docker run --gpus all \
-e HF_HOME=/data \
-e CUDA_VISIBLE_DEVICES=0,1 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-e NCCL_P2P_DISABLE=1 \
-e TRITON_LIBCUDA_PATH=/usr/local/cuda-12.1/compat/ \
-v /storage/tf_cache/:/data \
-p 8000:8000 \
--rm \
--ipc=host \
--shm-size=100Gb \
--name tgi_test \
ghcr.io/huggingface/text-generation-inference:2.1.0 \
--model-id OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k --trust-remote-code --port 8000 --hostname 0.0.0.0 --num-shard 2
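The thread doesn't say exactly what was wrong with CUDA_VISIBLE_DEVICES. One likely sanity check (an assumption, not something stated above) is that the variable exposes at least as many devices as --num-shard requests (2 in this command). A minimal sketch; `visible_gpu_count` and `check_num_shard` are hypothetical helpers, not TGI functions, and they ignore CUDA's special cases such as invalid or negative device IDs:

```python
def visible_gpu_count(value: str) -> int:
    """Count the entries in a CUDA_VISIBLE_DEVICES-style string such as "0,1".

    Hypothetical helper: treats the value as a plain comma-separated list and
    skips empty entries (e.g. from a trailing comma); CUDA's own parsing rules
    for invalid IDs are not modelled.
    """
    return len([dev for dev in value.split(",") if dev.strip()])


def check_num_shard(num_shard: int, cuda_visible_devices: str) -> None:
    """Raise ValueError if more shards are requested than GPUs are visible."""
    count = visible_gpu_count(cuda_visible_devices)
    if num_shard > count:
        raise ValueError(
            f"--num-shard {num_shard} needs {num_shard} GPUs, but "
            f"CUDA_VISIBLE_DEVICES={cuda_visible_devices!r} exposes {count}"
        )


# Values from the docker run command above: two visible devices, two shards.
check_num_shard(2, "0,1")

# A mismatch is reported before any container is started.
try:
    check_num_shard(2, "0")
except ValueError as err:
    print(err)
```

Running a check like this before `docker run` only catches a count mismatch; it cannot tell whether the listed GPUs actually exist on the host.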
LOGS:
2024-06-28T11:27:02.183907Z INFO text_generation_launcher: Args {
model_id: "OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: Some(
2,
),
quantize: None,
speculate: None,
dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: None,
max_total_tokens: None,
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "0.0.0.0",
port: 8000,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: None,
}
2024-06-28T11:27:02.183972Z INFO hf_hub: Token file not found "/data/token"
2024-06-28T11:27:02.185611Z INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
2024-06-28T11:27:02.185618Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-06-28T11:27:02.185620Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-06-28T11:27:02.185622Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-06-28T11:27:02.185624Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-06-28T11:27:02.185628Z WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k` do not contain malicious code.
2024-06-28T11:27:02.185630Z INFO text_generation_launcher: Sharding model on 2 processes
2024-06-28T11:27:02.185716Z INFO download: text_generation_launcher: Starting download process.
2024-06-28T11:27:05.000830Z INFO text_generation_launcher: Detected system cuda
2024-06-28T11:27:07.021516Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-06-28T11:27:07.698392Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-06-28T11:27:07.698638Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-06-28T11:27:07.698639Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-06-28T11:27:10.655958Z INFO text_generation_launcher: Detected system cuda
2024-06-28T11:27:10.749024Z INFO text_generation_launcher: Detected system cuda
2024-06-28T11:27:17.706926Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:27:17.707165Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:27:27.720205Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:27:27.740447Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:27:37.806208Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:27:37.812606Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:27:47.900818Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:27:47.904761Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:27:57.915449Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:27:57.916446Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:07.924304Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:07.939241Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:17.995423Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:18.012241Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:28.012532Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:28.102771Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:38.027759Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:38.117450Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:48.100639Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:48.207074Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:28:58.114913Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:28:58.220436Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:08.131400Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:29:08.239067Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:18.218077Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:29:18.319564Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:28.227565Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:29:28.329419Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:38.307420Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:29:38.342966Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:48.402267Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:29:48.412476Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:58.422970Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:29:58.423067Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:08.501083Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:08.525819Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:18.526233Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:18.619383Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:28.606955Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:28.711349Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:38.666041Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:38.771558Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:48.717671Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:48.883301Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:30:58.769347Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-06-28T11:30:58.887333Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-06-28T11:31:04.854994Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 766, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mixtral.py", line 22, in __init__
super(FlashMixtral, self).__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 97, in __init__
super().__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 822, in __init__
super(FlashCausalLM, self).__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/model.py", line 63, in __init__
self.target_to_layer = self.adapter_target_to_layer()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 156, in adapter_target_to_layer
if hasattr(layer.mlp, "gate_up_proj"):
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'MixtralLayer' object has no attribute 'mlp'
2024-06-28T11:31:04.857312Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 766, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mixtral.py", line 22, in __init__
super(FlashMixtral, self).__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 97, in __init__
super().__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 822, in __init__
super(FlashCausalLM, self).__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/model.py", line 63, in __init__
self.target_to_layer = self.adapter_target_to_layer()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 156, in adapter_target_to_layer
if hasattr(layer.mlp, "gate_up_proj"):
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'MixtralLayer' object has no attribute 'mlp'
2024-06-28T11:31:08.060487Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/bin/text-generation-server", line 8, in <module>
[rank0]: sys.exit(app())
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
[rank0]: server.serve(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
[rank0]: asyncio.run(
[rank0]: File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
[rank0]: return loop.run_until_complete(main)
[rank0]: File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[rank0]: return future.result()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
[rank0]: model = get_model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 766, in get_model
[rank0]: return FlashMixtral(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mixtral.py", line 22, in __init__
[rank0]: super(FlashMixtral, self).__init__(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 97, in __init__
[rank0]: super().__init__(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 822, in __init__
[rank0]: super(FlashCausalLM, self).__init__(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/model.py", line 63, in __init__
[rank0]: self.target_to_layer = self.adapter_target_to_layer()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 156, in adapter_target_to_layer
[rank0]: if hasattr(layer.mlp, "gate_up_proj"):
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MixtralLayer' object has no attribute 'mlp'
rank=0
2024-06-28T11:31:08.079185Z ERROR text_generation_launcher: Shard 0 failed to start
2024-06-28T11:31:08.079210Z INFO text_generation_launcher: Shutting down shards
2024-06-28T11:31:08.159265Z INFO shard-manager: text_generation_launcher: Terminating shard rank=1
2024-06-28T11:31:08.159383Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=1
2024-06-28T11:31:08.860057Z INFO shard-manager: text_generation_launcher: shard terminated rank=1
Error: ShardCannotStart
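The INFO lines near the start of the log show how the launcher filled in the token limits that were left as None in Args. The logged numbers fit a simple pattern: the context is capped at 4096, `max_input_tokens` is one less than `max_total_tokens`, and `max_batch_prefill_tokens` is `max_input_tokens + 50`; the suggested flags instead add 50 to the model's full 32768. The sketch below only reproduces that arithmetic from the logged values. The function names are hypothetical, and the launcher's real logic may have additional branches:

```python
def default_token_limits(max_position_embeddings: int, cap: int = 4096) -> dict:
    """Reproduce the token defaults printed in the launcher log (hypothetical helper).

    Pattern inferred from the log lines, not copied from TGI's source:
      max_total_tokens         = min(model limit, 4096)
      max_input_tokens         = max_total_tokens - 1
      max_batch_prefill_tokens = max_input_tokens + 50
    """
    max_total_tokens = min(max_position_embeddings, cap)
    max_input_tokens = max_total_tokens - 1
    return {
        "max_input_tokens": max_input_tokens,
        "max_total_tokens": max_total_tokens,
        "max_batch_prefill_tokens": max_input_tokens + 50,
    }


def full_context_flags(max_position_embeddings: int) -> str:
    """Rebuild the flags the log suggests for using the whole context window.

    Note that this hint adds 50 to the model limit itself, not to max_input_tokens.
    """
    return (
        f"--max-batch-prefill-tokens={max_position_embeddings + 50} "
        f"--max-total-tokens={max_position_embeddings} "
        f"--max-input-tokens={max_position_embeddings - 1}"
    )


print(default_token_limits(32768))
# {'max_input_tokens': 4095, 'max_total_tokens': 4096, 'max_batch_prefill_tokens': 4145}
print(full_context_flags(32768))
# --max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767
```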
I will wait for this PR: https://github.com/huggingface/text-generation-inference/pull/2123
Hi, just to confirm: I get the same AttributeError: 'MixtralLayer' object has no attribute 'mlp' error on alpindale/WizardLM-2-8x22B with the latest Docker image.
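The traceback shows why the check itself blows up: in `hasattr(layer.mlp, "gate_up_proj")` the argument `layer.mlp` is evaluated before `hasattr` runs, so when a layer has no `mlp` attribute (as `MixtralLayer` evidently doesn't in 2.1.0) the AttributeError propagates instead of becoming False. A minimal, torch-free sketch of the failure and of one defensive variant; the classes are hypothetical stand-ins, the `moe` attribute name is an assumption, and this is not necessarily how #2123 fixed it:

```python
class DenseLayer:
    """Hypothetical stand-in for a decoder layer that exposes an `mlp` submodule."""

    def __init__(self):
        self.mlp = type("MLP", (), {"gate_up_proj": object()})()


class MoELayer:
    """Hypothetical stand-in for a mixture-of-experts layer without `mlp`."""

    def __init__(self):
        self.moe = object()  # attribute name is an assumption


def has_gate_up_proj_unsafe(layer) -> bool:
    # Same shape as the failing line: layer.mlp is evaluated first,
    # so a missing `mlp` raises AttributeError before hasattr() runs.
    return hasattr(layer.mlp, "gate_up_proj")


def has_gate_up_proj_safe(layer) -> bool:
    # getattr with a default turns a missing `mlp` into None instead of raising.
    mlp = getattr(layer, "mlp", None)
    return mlp is not None and hasattr(mlp, "gate_up_proj")


print(has_gate_up_proj_safe(DenseLayer()))  # True
print(has_gate_up_proj_safe(MoELayer()))    # False
try:
    has_gate_up_proj_unsafe(MoELayer())
except AttributeError as err:
    print(err)  # 'MoELayer' object has no attribute 'mlp'
```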
I am also getting this error on Mixtral-8x22B-Instruct-v0.1; however, it was working fine up to the 2.0.4 release.
I am also getting this error on Mixtral 8x7B: it was fine on 2.0.4 and fails on 2.1.0.
After #2123 was merged, everything works. You can test it with this release: https://github.com/huggingface/text-generation-inference/releases/tag/v2.1.1
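Based only on the versions reported in this thread (2.0.4 worked, 2.1.0 failed, and v2.1.1 includes #2123), a quick check of which image tag you are pinned to might look like this. `parse_version` and `hits_mlp_error` are hypothetical helpers, and the range is a heuristic drawn from these reports, not an official list of affected releases:

```python
def parse_version(tag: str) -> tuple:
    """Turn a numeric image tag such as "2.1.0" or "v2.1.1" into (2, 1, 0).

    Non-numeric tags like "latest" raise ValueError and must be resolved first.
    """
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def hits_mlp_error(tag: str) -> bool:
    """True if the tag is at least the failing 2.1.0 but older than the fixed 2.1.1."""
    return (2, 1, 0) <= parse_version(tag) < (2, 1, 1)


for tag in ["2.0.4", "2.1.0", "v2.1.1"]:
    print(tag, hits_mlp_error(tag))
# 2.0.4 False
# 2.1.0 True
# v2.1.1 False
```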
System Info
128 GB RAM. On-premise machine with 2 GPUs.
Reproduction
Expected behavior
logs:
As far as I understand, this error occurs because this module cannot be imported: https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/__init__.py#L53-L92
WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.layers.layernorm' (/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/layernorm.py)
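The warning text matches a common guarded-import pattern: optional model classes are imported inside `try`/`except ImportError`, and on failure a warning is logged and a flag is cleared instead of crashing at startup. A minimal sketch of that pattern, assuming nothing about TGI beyond the warning text; the `FLASH_ATTENTION` flag and logger name are illustrative, and `from json import FastLayerNorm` is used only because that name is deliberately missing from `json`, producing the same "cannot import name" kind of ImportError:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("example")

# Illustrative flag: later code can check it to pick a non-flash fallback.
FLASH_ATTENTION = True
try:
    # Deliberately missing name: raises ImportError("cannot import name ...").
    from json import FastLayerNorm  # noqa: F401
except ImportError as e:
    # Log and degrade gracefully instead of failing at import time.
    logger.warning(f"Could not import Flash Attention enabled models: {e}")
    FLASH_ATTENTION = False

print(FLASH_ATTENTION)  # False
```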