Closed liyu-tan closed 1 year ago
What model is that? Some models have shapes incompatible with tp=4 (because their shapes are not divisible by 4). That doesn't seem to be the case here, though; the error would be clearer.
Signal 7 looks like an invalid memory access. Can you share the exact command you're launching and all the details asked for in the template?
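To make the shape point concrete: tensor parallelism splits the attention heads across shards, so the head count must divide evenly by `--num-shard`. A minimal sketch (the head counts below are the published Llama-2 values; `can_shard` is an illustrative helper, not a TGI API):

```python
# Minimal sketch: tensor parallelism shards attention heads across GPUs,
# so num_heads % num_shards must be 0. Head counts are the published
# Llama-2 values; can_shard() is illustrative, not a TGI function.
def can_shard(num_heads: int, num_shards: int) -> bool:
    return num_heads % num_shards == 0

print(can_shard(64, 4))  # Llama-2-70B: 64 heads over 4 shards -> True
print(can_shard(40, 4))  # Llama-2-13B: 40 heads over 4 shards -> True
print(can_shard(71, 4))  # a hypothetical 71-head model -> False
```

Both models mentioned in this thread pass this check, which is consistent with the maintainer's comment that shapes are not the problem here.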
I don't think it's the shape issue (shapes not divisible by 4); I am using llama2-70b and llama2-13b.
text-generation-launcher --model-id meta-llama/Llama-2-70b-chat-hf --max-input-length 8000 --max-batch-prefill-tokens 8000 --max-total-tokens 8100 --port 8081 --trust-remote-code --num-shard 4
What card?
Do you mind giving all the info proposed in the template?
Also I see
2023-08-11T06:42:31.317013Z WARN text_generation_launcher: Unable to use Flash Attention V2: Flash Attention V2 is not installed.
Unless you're running on a V100 or older, you should install Flash Attention v2 (`cd server/ && make install-flash-attention-v2`).
I am running on A100, and I am not able to install flash attention either. Here is the error message:
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/csrc/flash_attn/src/flash_fwd_launch_template.h: In lambda function:
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/csrc/flash_attn/src/flash_fwd_launch_template.h:183:1249: error: lambda capture of ‘Is_dropout’ is not a constant expression
BOOL_SWITCH(params.p_dropout < 1.f, Is_dropout, [&] {
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/bin/devx_python/python_home/3.9.2/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/setup.py", line 201, in
I installed ninja beforehand.
Has anyone had this issue before?
Do you have C++17? Most of the recent kernels require it (the compile error seems to indicate it's the C++ side that is failing on the constant-expression part).
g++ (Debian 8.3.0-6) 8.3.0
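GCC 8 accepts `-std=c++17` but older releases are known to trip over flash-attention's `BOOL_SWITCH` lambda captures, which matches the error above. A hedged sketch of checking the compiler banner (treating GCC >= 9 as a safe floor is my assumption here, not a documented flash-attention requirement):

```python
import re

# Hedged sketch: parse the `g++ --version` banner and flag compilers
# below an assumed GCC 9 floor. The floor is an inference from the
# constant-expression error above, not a documented requirement.
def gcc_major(version_line: str) -> int:
    # "g++ (Debian 8.3.0-6) 8.3.0" -> 8
    m = re.search(r"(\d+)\.\d+\.\d+\s*$", version_line.strip())
    if not m:
        raise ValueError("unrecognized g++ version banner")
    return int(m.group(1))

major = gcc_major("g++ (Debian 8.3.0-6) 8.3.0")
if major < 9:
    print(f"GCC {major} may miscompile flash-attention v2; try GCC 9+")
```

Feed it the first line of `g++ --version`; the Debian 8.3.0 toolchain in this thread would be flagged.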
I also figured out there were some issues earlier:
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
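The "Found duplicate" warning just means more than one `libcudart.so*` name resolves in the same search directory. A small illustrative sketch of that check (this is not bitsandbytes' actual code, and the directory path is just an example):

```python
import glob
import os

# Illustrative sketch of what the bitsandbytes warning detects: more
# than one libcudart.so* entry in a search directory. Not the actual
# bitsandbytes code. The fix is to leave exactly one runtime visible,
# e.g. by pruning LD_LIBRARY_PATH or removing the extra symlink.
def cudart_candidates(search_dirs):
    """Return every libcudart.so* file found under the given dirs."""
    found = []
    for d in search_dirs:
        found.extend(sorted(glob.glob(os.path.join(d, "libcudart.so*"))))
    return found

hits = cudart_candidates(["/usr/local/cuda/lib64"])
if len(hits) > 1:
    print("duplicate CUDA runtimes:", hits)
```

In the log above, both `/usr/local/cuda/lib64/libcudart.so` and `.so.11.0` were found, which is exactly this situation.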
After resolving that, I still have the issue when starting with 4 shards.
Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) or install flash attention v2 with `cd server && make install install-flash-attention-v2`
2023-08-23T23:41:22.314580Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-23T23:41:22.314580Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-23T23:41:22.314580Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-23T23:41:22.315185Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-23T23:41:27.118443Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=1
2023-08-23T23:41:27.118491Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
2023-08-23T23:41:27.215455Z ERROR text_generation_launcher: Shard 1 failed to start
2023-08-23T23:41:27.215472Z INFO text_generation_launcher: Shutting down shards
2023-08-23T23:41:27.218010Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=2
2023-08-23T23:41:27.218027Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=2
2023-08-23T23:41:27.389135Z INFO shard-manager: text_generation_launcher: Shard terminated rank=3
2023-08-23T23:41:27.544059Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Same issue here, but I used this command with 4 A10 cards:
text-generation-launcher --model-id Model/falcon-40b-instruct --sharded true --num-shard 4 --disable-custom-kernels --quantize bitsandbytes-nf4 --master-port 30000 --port 30001
Here are the details:
2023-08-24T03:32:01.146724Z INFO text_generation_launcher: Sharding model on 4 processes
2023-08-24T03:32:01.146791Z INFO download: text_generation_launcher: Starting download process.
2023-08-24T03:32:03.549527Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2023-08-24T03:32:03.949053Z INFO download: text_generation_launcher: Successfully downloaded weights.
2023-08-24T03:32:03.949203Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-08-24T03:32:03.949219Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-08-24T03:32:03.949253Z INFO shard-manager: text_generation_launcher: Starting shard rank=2
2023-08-24T03:32:03.949266Z INFO shard-manager: text_generation_launcher: Starting shard rank=3
2023-08-24T03:32:06.695108Z WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.738100Z WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.799434Z WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.892935Z WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:13.957145Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-24T03:32:13.957145Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-24T03:32:13.957145Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-24T03:32:13.957145Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-24T03:32:15.958826Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=0
2023-08-24T03:32:15.958858Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=0
2023-08-24T03:32:16.056799Z ERROR text_generation_launcher: Shard 0 failed to start
2023-08-24T03:32:16.056818Z INFO text_generation_launcher: Shutting down shards
2023-08-24T03:32:16.073862Z INFO shard-manager: text_generation_launcher: Shard terminated rank=2
2023-08-24T03:32:16.224323Z INFO shard-manager: text_generation_launcher: Shard terminated rank=1
2023-08-24T03:32:17.381476Z INFO shard-manager: text_generation_launcher: Shard terminated rank=3
Error: ShardCannotStart
`--shm-size 1g` is necessary for NCCL to see the cards.
How do I set --shm-size on the command line with text-generation-launcher?
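To be clear, `--shm-size` is a `docker run` flag, not a text-generation-launcher flag; when running bare-metal the launcher itself has no such option. A hedged sketch of the Docker invocation (the model id, port mapping, and volume path below are illustrative):

```shell
# --shm-size is passed to `docker run`, not to text-generation-launcher.
# Model id, ports, and the volume path below are illustrative.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$PWD/data:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-70b-chat-hf --num-shard 4
```

Everything after the image name is forwarded to text-generation-launcher inside the container.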
Please use the Docker environment; something is wrong in your environment here. Signal 7 means an invalid memory access, and it seems bitsandbytes is causing it on older cards.
Yes, the official Docker image worked. But I need to build the environment on CentOS, and that doesn't work right now. Are there any other guidelines?
No, the README should cover most things, and we also have documentation now: https://huggingface.co/docs/text-generation-inference/index
@Narsil I'm facing a similar issue while running with the official Docker image. How do I set `--shm-size 1g` in the case of Docker? Please help with this. Thanks.
Issue Description: ISSUE #1458
I am able to start TGI without sharding and with 2 shards. When I tried to start TGI with 4 shards, I got this error:
2023-08-11T06:42:43.498333Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
I have 4 A100 GPUs, and the CUDA version is 12.0.
Has anyone had this issue before?
Some logs:
2023-08-11T06:42:31.317013Z WARN text_generation_launcher: Unable to use Flash Attention V2: Flash Attention V2 is not installed. Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) or install flash attention v2 with `cd server && make install install-flash-attention-v2`
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-11T06:42:38.895305Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-11T06:42:43.498293Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /opt/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/functions.sh')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gitolite@code.uber.internal'), PosixPath('lm/fievel')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('-Dorg.joda.time.DateTimeZone.Folder=/etc/tzdata/jodatime/data')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/_ubuild_npm_ls.out')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/secrets/pip.conf')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/_ubuild_pip_freeze.out')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/var/cache/udocker/ma-endpoint-pos/tmp/spark')}
warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future: If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=1
2023-08-11T06:42:43.498333Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
2023-08-11T06:42:43.595541Z ERROR text_generation_launcher: Shard 1 failed to start
2023-08-11T06:42:43.595554Z INFO text_generation_launcher: Shutting down shards