huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
9.11k stars · 1.07k forks

local installation fail #2322

Open ragesh2000 opened 3 months ago

ragesh2000 commented 3 months ago

System Info

I am referring to https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#local-install to install TGI locally, but I keep getting an error related to vllm:

RuntimeError: Cannot find CMake executable
make[1]: *** [Makefile-vllm:5: build-vllm-cuda] Error 1
make[1]: Leaving directory '/home/gpu/path/llm/text-generation-inference/server'
make: *** [Makefile:2: install-server] Error 2

Information

Tasks

Reproduction

git clone https://github.com/huggingface/text-generation-inference.git

cd text-generation-inference/
BUILD_EXTENSIONS=True make install

Expected behavior

The build completes successfully.

ErikKaum commented 3 months ago

Hi @ragesh2000 👋

Thanks for reporting this! Could you give a bit more info on your system? It seems like you might not have cmake installed?
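For anyone hitting the same issue: a quick sketch for gathering the kind of system info being asked about here (the tool list is my own choice, assuming a typical Linux setup):

```shell
# Report versions of the tools the TGI local build depends on.
# Missing tools are flagged instead of aborting the script.
for tool in cmake gcc g++ nvcc python; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%-7s %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
  else
    printf '%-7s not found\n' "$tool"
  fi
done
```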

ragesh2000 commented 3 months ago

@ErikKaum I do have cmake on my system (cmake version 3.22.1). Here is the complete traceback:

-- Enabling C extension.
-- Enabling moe extension.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311
gmake[2]: Entering directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[3]: Entering directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[4]: Entering directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[5]: Entering directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[5]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[5]: Entering directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
[ 33%] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/moe_ops.cpp.o
[ 66%] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
gmake[5]: *** [CMakeFiles/_moe_C.dir/build.make:90: CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o] Error 1
gmake[5]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[4]: *** [CMakeFiles/Makefile2:112: CMakeFiles/_moe_C.dir/all] Error 2
gmake[4]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[3]: *** [CMakeFiles/Makefile2:119: CMakeFiles/_moe_C.dir/rule] Error 2
gmake[3]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
gmake[2]: *** [Makefile:182: _moe_C] Error 2
gmake[2]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/build/temp.linux-x86_64-cpython-311'
Traceback (most recent call last):
  File "/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/setup.py", line 383, in <module>
    setup(
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/__init__.py", line 108, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
    dist.run_commands()
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 970, in run_commands
    self.run_command(cmd)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/dist.py", line 956, in run_command
    super().run_command(command)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
    cmd_obj.run()
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/dist.py", line 956, in run_command
    super().run_command(command)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
    cmd_obj.run()
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 93, in run
    _build_ext.run(self)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
    self.build_extensions()
  File "/home/gpu/ai/llm/quantise/text-generation-inference/server/vllm/setup.py", line 188, in build_extensions
    subprocess.check_call(['cmake', *build_args], cwd=self.build_temp)
  File "/home/gpu/miniconda3/envs/tgi/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '64']' returned non-zero exit status 2.
make[1]: *** [Makefile-vllm:5: build-vllm-cuda] Error 1
make[1]: Leaving directory '/home/gpu/ai/llm/quantise/text-generation-inference/server'
make: *** [Makefile:2: install-server] Error 2

ErikKaum commented 3 months ago

Ah okay okay, gotcha. Thank you for the full stack trace 👍

So you're getting a subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '64']' returned non-zero exit status 2. Exit status 2 from cmake generally just means the build failed, so the compiler error above it is the real problem.

One way of getting a better understanding would be to run that command in the terminal directly to see why it's invalid.

ragesh2000 commented 3 months ago

@ErikKaum The result of running cmake --build . --target _moe_C -j 64 directly was: Error: could not load cache
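(Editor's note: "could not load cache" usually means cmake --build was run from a directory that does not contain CMakeCache.txt. A sketch of pointing it at the build tree explicitly, using the path from the traceback above; adjust to your checkout:)

```shell
# CMake stores its cache (CMakeCache.txt) in the build tree, so
# `cmake --build` must run from, or be pointed at, that directory.
# The path below is taken from the traceback in this thread.
cd text-generation-inference
cmake --build server/vllm/build/temp.linux-x86_64-cpython-311 \
      --target _moe_C -j 64
```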

ErikKaum commented 3 months ago

hmm, seems still like a cmake error: https://stackoverflow.com/questions/16319292/cmake-error-could-not-load-cache

btw, do you specifically need to build TGI from source? In general, the dockerized version is easier to get going with.
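For reference, the README's Docker quick start looks roughly like this (the image tag and model id are illustrative; check the README for current values):

```shell
# Run TGI from the prebuilt image instead of building from source.
# The model id below is just an example.
model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data   # share a volume to avoid re-downloading weights
docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id "$model"
```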

ragesh2000 commented 3 months ago

Actually, I am running the whole thing in a GPU Docker container already, so I think running Docker inside Docker may cause conflicts. @ErikKaum

ErikKaum commented 3 months ago

Yeah that for sure doesn't make things easier!
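(Editor's note: the underlying compiler error in the traceback, "parameter packs not expanded with '...'" in std_function.h, is a known incompatibility between older nvcc releases and GCC 11's standard library headers. A commonly suggested workaround, assuming gcc-10 is installed, is to point the build at the older host compiler before retrying:)

```shell
# Use gcc-10 as the host compiler for the CUDA build, then retry.
# Paths assume a Debian/Ubuntu-style install of gcc-10.
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
BUILD_EXTENSIONS=True make install
```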