huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Cannot run the TGI with 4 shard #814

Closed: liyu-tan closed this issue 1 year ago

liyu-tan commented 1 year ago

I am able to start TGI without sharding and with 2 shards. When I try to start it with 4 shards, I get this error:

```
2023-08-11T06:42:43.498333Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
```

I have 4 A100 GPUs, and the CUDA version is 12.0.

Has anyone hit this issue before?

Some logs:

```
2023-08-11T06:42:31.317013Z WARN text_generation_launcher: Unable to use Flash Attention V2: Flash Attention V2 is not installed. Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) or install flash attention v2 with `cd server && make install install-flash-attention-v2`
```

```
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-11T06:42:38.895305Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-11T06:42:38.895303Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-11T06:42:43.498293Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
```

```
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /opt/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/functions.sh')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gitolite@code.uber.internal'), PosixPath('lm/fievel')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('-Dorg.joda.time.DateTimeZone.Folder=/etc/tzdata/jodatime/data')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/_ubuild_npm_ls.out')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/secrets/pip.conf')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/ucontainer/_ubuild_pip_freeze.out')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/var/cache/udocker/ma-endpoint-pos/tmp/spark')}
  warn(msg)
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward. Either way, this might cause trouble in the future: If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=1
2023-08-11T06:42:43.498333Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
2023-08-11T06:42:43.595541Z ERROR text_generation_launcher: Shard 1 failed to start
2023-08-11T06:42:43.595554Z INFO text_generation_launcher: Shutting down shards
```

Narsil commented 1 year ago

What model is that? Some models have shapes incompatible with tp=4 (their weight shapes are not divisible by 4). That doesn't seem to be the case here, though; the error would be clearer.
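To illustrate the divisibility constraint mentioned above (a hedged sketch, not TGI's actual check): tensor parallelism splits attention heads across shards, so the head count has to divide evenly by the shard count. The value 64 is Llama-2-70B's `num_attention_heads` from its config.json.

```shell
# Quick sanity check: heads must be divisible by --num-shard for tp.
# 64 is Llama-2-70B's head count (assumed here, read it from config.json).
python3 - <<'EOF'
num_attention_heads = 64  # from the model's config.json
num_shard = 4
print("tp compatible:", num_attention_heads % num_shard == 0)
EOF
```

Since 64 % 4 == 0, a 4-way split is shape-compatible for this model, which is why the crash likely has another cause.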

Signal 7 seems to be an invalid memory access. Can you share the exact command you're launching and all the details asked for in the template?

liyu-tan commented 1 year ago

I am not getting those shape-incompatibility errors (shapes not divisible by 4). I am using Llama-2-70b and Llama-2-13b.

```
text-generation-launcher --model-id meta-llama/Llama-2-70b-chat-hf --max-input-length 8000 --max-batch-prefill-tokens 8000 --max-total-tokens 8100 --port 8081 --trust-remote-code --num-shard 4
```

Narsil commented 1 year ago

What card?

Do you mind giving all the info proposed in the template?

Also, I see

```
2023-08-11T06:42:31.317013Z WARN text_generation_launcher: Unable to use Flash Attention V2: Flash Attention V2 is not installed.
```

Unless you're running on a V100 or older, you should install Flash Attention v2 (`cd server && make install-flash-attention-v2`).

liyu-tan commented 1 year ago

I am running on A100s, and I am not able to install Flash Attention either. Here is the error message:

```
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/csrc/flash_attn/src/flash_fwd_launch_template.h: In lambda function:
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/csrc/flash_attn/src/flash_fwd_launch_template.h:183:1249: error: lambda capture of ‘Is_dropout’ is not a constant expression
 BOOL_SWITCH(params.p_dropout < 1.f, Is_dropout, [&] {
 ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/usr/bin/devx_python/python_home/3.9.2/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/text-generation-inference/server/flash-attention-v2/setup.py", line 201, in <module>
    setup(
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
make: *** [Makefile-flash-att-v2:10: build-flash-attention-v2] Error 1
```

I installed ninja beforehand.

liyu-tan commented 1 year ago

Has anyone seen this issue before?

Narsil commented 1 year ago

Do you have C++17 support? Most of the recent kernels require it (the compile error seems to come from the C++ side, complaining about the constant expression).

liyu-tan commented 1 year ago

```
g++ (Debian 8.3.0-6) 8.3.0
```

liyu-tan commented 1 year ago

I also figured out there were some issues earlier:

```
/home/udocker/ma-endpoint-pos/data/michelangelo/scripts/TGI/tgivenv/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
```

After resolving that, I still cannot start with 4 shards.


```
Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) or install flash attention v2 with `cd server && make install install-flash-attention-v2`

2023-08-23T23:41:22.314580Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-23T23:41:22.314580Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-23T23:41:22.314580Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-23T23:41:22.315185Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-23T23:41:27.118443Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=1
2023-08-23T23:41:27.118491Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=1
2023-08-23T23:41:27.215455Z ERROR text_generation_launcher: Shard 1 failed to start
2023-08-23T23:41:27.215472Z  INFO text_generation_launcher: Shutting down shards
2023-08-23T23:41:27.218010Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=2
2023-08-23T23:41:27.218027Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=2
2023-08-23T23:41:27.389135Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=3
2023-08-23T23:41:27.544059Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
```
luefei commented 1 year ago

Same issue here, but with 4 A10 cards and this command:

```
text-generation-launcher --model-id Model/falcon-40b-instruct --sharded true --num-shard 4 --disable-custom-kernels --quantize bitsandbytes-nf4 --master-port 30000 --port 30001
```

The details:

```
2023-08-24T03:32:01.146724Z  INFO text_generation_launcher: Sharding model on 4 processes
2023-08-24T03:32:01.146791Z  INFO download: text_generation_launcher: Starting download process.
2023-08-24T03:32:03.549527Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2023-08-24T03:32:03.949053Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-08-24T03:32:03.949203Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-08-24T03:32:03.949219Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-08-24T03:32:03.949253Z  INFO shard-manager: text_generation_launcher: Starting shard rank=2
2023-08-24T03:32:03.949266Z  INFO shard-manager: text_generation_launcher: Starting shard rank=3
2023-08-24T03:32:06.695108Z  WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.738100Z  WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.799434Z  WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:06.892935Z  WARN text_generation_launcher: We're not using custom kernels.
2023-08-24T03:32:13.957145Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-24T03:32:13.957145Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-24T03:32:13.957145Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-24T03:32:13.957145Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-24T03:32:15.958826Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=0
2023-08-24T03:32:15.958858Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 7 rank=0
2023-08-24T03:32:16.056799Z ERROR text_generation_launcher: Shard 0 failed to start
2023-08-24T03:32:16.056818Z  INFO text_generation_launcher: Shutting down shards
2023-08-24T03:32:16.073862Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=2
2023-08-24T03:32:16.224323Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=1
2023-08-24T03:32:17.381476Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=3
Error: ShardCannotStart
```

Narsil commented 1 year ago

`--shm-size 1g` is necessary for NCCL to see the cards.
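For reference, `--shm-size` is a `docker run` option rather than a TGI flag. A sketch of launching the official image with it set; the model id comes from this thread, while the port mapping and volume path are arbitrary choices to adjust for your setup:

```shell
# --shm-size raises the container's shared-memory limit so NCCL's
# inter-shard communication over /dev/shm does not hit SIGBUS.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$PWD/data:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-70b-chat-hf \
    --num-shard 4
```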

luefei commented 1 year ago

How do I set `--shm-size` when launching with text-generation-launcher directly?
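For context (an assumption about typical Linux hosts, not something from the TGI docs): `--shm-size` only exists for `docker run`. When the launcher runs directly on the host, NCCL uses the host's own /dev/shm, which you can inspect and, on a standard tmpfs mount, enlarge:

```shell
# Outside Docker, the host's /dev/shm is what NCCL sees.
# Check its current size and usage:
df -h /dev/shm

# Enlarging it in place (needs root; assumes a standard tmpfs mount,
# so verify your setup before running):
# sudo mount -o remount,size=1g /dev/shm
```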

Narsil commented 1 year ago

Please use the Docker environment; something is wrong in your setup here. Signal 7 means an invalid memory access, and it seems bitsandbytes is causing it on older cards.

luefei commented 1 year ago

Yes, the official Docker image works. But I need to build the environment on CentOS, and that isn't working right now. Are there any other guidelines?

Narsil commented 1 year ago

No, the README should cover most things, and we also have documentation now: https://huggingface.co/docs/text-generation-inference/index

swapnil3597 commented 8 months ago

@Narsil I'm facing a similar issue while running with the official Docker image. How do I set `--shm-size 1g` in that case? Please help with this. Thanks. Issue description: #1458