li-plus / chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4
MIT License

Error when launching a quantized GLM4 model with openai_api.py #313

Closed gabrielpondc closed 3 weeks ago

gabrielpondc commented 3 weeks ago

I quantized the GLM4 model to q4_0 with convert.py, and loading the quantized model with openai_api.py fails.

(glm4) <glm4>ktgpu:.../chatglm.cpp/chatglm_cpp -> MODEL=../models/glm4_q4_0.bin uvicorn chatglm_cpp.openai_api:app --host 0.0.0.0 --port 9998

The error is as follows:

ggml_new_object: not enough space in the context's memory pool (needed 1073742144, available 1076736)
Segmentation fault (core dumped)

The host machine is as follows:

[image]

The system is Ubuntu 24.04 with Python 3.10.14 on x86_64, and the installed chatglm_cpp package is version 3.3.0.

gabrielpondc commented 3 weeks ago

llama-cpp-python issue #585 describes a similar situation. The author there suggested reducing max_ctx_size, but I don't know how to set that parameter in chatglm.cpp.

gabrielpondc commented 3 weeks ago

In addition, building and installing from source fails with the error below. Did I do something wrong during the build?

CMAKE_ARGS="-DGGML_CUBLAS=ON -DCUDA_ARCHITECTURES=89" pip install .
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Processing /2T/Langchain-Ch/Langchain-Chatchat/THUDM/glm-4-9b-chat-1m/glm-4-9b-chat-1m/chatglm.cpp
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: chatglm-cpp
  Building wheel for chatglm-cpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for chatglm-cpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [179 lines of output]
      running bdist_wheel
      running build
      running build_py
      copying chatglm_cpp/langchain_api.py -> build/lib.linux-x86_64-cpython-310/chatglm_cpp
      copying chatglm_cpp/__init__.py -> build/lib.linux-x86_64-cpython-310/chatglm_cpp
      copying chatglm_cpp/openai_api.py -> build/lib.linux-x86_64-cpython-310/chatglm_cpp
      copying chatglm_cpp/convert.py -> build/lib.linux-x86_64-cpython-310/chatglm_cpp
      running egg_info
      writing chatglm_cpp.egg-info/PKG-INFO
      writing dependency_links to chatglm_cpp.egg-info/dependency_links.txt
      writing requirements to chatglm_cpp.egg-info/requires.txt
      writing top-level names to chatglm_cpp.egg-info/top_level.txt
      reading manifest file 'chatglm_cpp.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file 'chatglm_cpp.egg-info/SOURCES.txt'
      copying chatglm_cpp/_C.pyi -> build/lib.linux-x86_64-cpython-310/chatglm_cpp
      running build_ext
      CMake Deprecation Warning at third_party/ggml/CMakeLists.txt:1 (cmake_minimum_required):
        Compatibility with CMake < 3.5 will be removed from a future version of
        CMake.

        Update the VERSION argument <min> value or use a ...<max> suffix to tell
        CMake that the project does not need compatibility with older versions.

      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      -- Linux detected
      -- cuBLAS found
      CMake Error at /tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:814 (message):
        Compiling the CUDA compiler identification source file
        "CMakeCUDACompilerId.cu" failed.

        Compiler: /usr/local/cuda-12.2/bin/nvcc

        Build flags:

        Id flags: --keep;--keep-dir;tmp -v

        The output was:

        1

        #$ _NVVM_BRANCH_=nvvm

        #$ _SPACE_=

        #$ _CUDART_=cudart

        #$ _HERE_=/usr/local/cuda-12.2/bin

        #$ _THERE_=/usr/local/cuda-12.2/bin

        #$ _TARGET_SIZE_=

        #$ _TARGET_DIR_=

        #$ _TARGET_DIR_=targets/x86_64-linux

        #$ TOP=/usr/local/cuda-12.2/bin/..

        #$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-12.2/bin/../nvvm/libdevice

        #$
        LD_LIBRARY_PATH=/usr/local/cuda-12.2/bin/../lib:/home/kingtang/instantclient_23_4

        #$
        PATH=/usr/local/cuda-12.2/bin/../nvvm/bin:/usr/local/cuda-12.2/bin:/tmp/pip-build-env-tfud6jo_/overlay/bin:/tmp/pip-build-env-tfud6jo_/normal/bin:/usr/lib/code-server/lib/vscode/bin/remote-cli:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/envs/glm4/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/root/anaconda3/condabin:/root/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/lib/code-server/lib/vscode/bin/remote-cli:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/root/anaconda3/condabin:/root/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/code-server/lib/vscode/bin/remote-cli:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root
/anaconda3/bin:/root/.bun/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/kingtang/instantclient_23_4:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/usr/local/cuda-12.2/bin:/root/anaconda3/bin:/root/anaconda3/condabin:/root/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

        #$ INCLUDES="-I/usr/local/cuda-12.2/bin/../targets/x86_64-linux/include"

        #$ LIBRARIES=
        "-L/usr/local/cuda-12.2/bin/../targets/x86_64-linux/lib/stubs"
        "-L/usr/local/cuda-12.2/bin/../targets/x86_64-linux/lib"

        #$ CUDAFE_FLAGS=

        #$ PTXAS_FLAGS=

        #$ rm tmp/a_dlink.reg.c

        #$ gcc -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__
        "-I/usr/local/cuda-12.2/bin/../targets/x86_64-linux/include"
        -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=2
        -D__CUDACC_VER_BUILD__=91 -D__CUDA_API_VER_MAJOR__=12
        -D__CUDA_API_VER_MINOR__=2 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include
        "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o
        "tmp/CMakeCUDACompilerId.cpp4.ii"

        In file included from
        /usr/local/cuda-12.2/bin/../targets/x86_64-linux/include/cuda_runtime.h:82,

                         from <command-line>:

        /usr/local/cuda-12.2/bin/../targets/x86_64-linux/include/crt/host_config.h:136:2:
        error: #error -- unsupported GNU version! gcc versions later than 12 are
        not supported! The nvcc flag '-allow-unsupported-compiler' can be used to
        override this version check; however, using an unsupported host compiler
        may cause compilation failure or incorrect run time execution.  Use at your
        own risk.

          136 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
              |  ^~~~~

        # --error 0x1 --

      Call Stack (most recent call first):
        /tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
        /tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
        /tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCUDACompiler.cmake:131 (CMAKE_DETERMINE_COMPILER_ID)
        third_party/ggml/src/CMakeLists.txt:203 (enable_language)

      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/root/anaconda3/envs/glm4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/root/anaconda3/envs/glm4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/root/anaconda3/envs/glm4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 410, in build_wheel
          return self._build_with_temp_dir(
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 127, in <module>
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-tfud6jo_/normal/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
          self._build_extensions_serial()
        File "/tmp/pip-build-env-tfud6jo_/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
          self.build_extension(ext)
        File "<string>", line 120, in build_extension
        File "/root/anaconda3/envs/glm4/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '/2T/Langchain-Ch/Langchain-Chatchat/THUDM/glm-4-9b-chat-1m/glm-4-9b-chat-1m/chatglm.cpp', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/2T/Langchain-Ch/Langchain-Chatchat/THUDM/glm-4-9b-chat-1m/glm-4-9b-chat-1m/chatglm.cpp/build/lib.linux-x86_64-cpython-310/chatglm_cpp/', '-DPYTHON_EXECUTABLE=/root/anaconda3/envs/glm4/bin/python', '-DCMAKE_BUILD_TYPE=Release', '-DCHATGLM_ENABLE_PYBIND=ON', '-DCHATGLM_ENABLE_EXAMPLES=OFF', '-DGGML_CUBLAS=ON']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for chatglm-cpp
Failed to build chatglm-cpp
ERROR: Could not build wheels for chatglm-cpp, which is required to install pyproject.toml-based projects

li-plus commented 3 weeks ago

        /usr/local/cuda-12.2/bin/../targets/x86_64-linux/include/crt/host_config.h:136:2:
        error: #error -- unsupported GNU version! gcc versions later than 12 are
        not supported! The nvcc flag '-allow-unsupported-compiler' can be used to
        override this version check; however, using an unsupported host compiler
        may cause compilation failure or incorrect run time execution.  Use at your
        own risk.

          136 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
              |  ^~~~~

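Your gcc is too new: as the error says, this nvcc does not support gcc versions later than 12, so you will probably need to downgrade.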
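A minimal sketch of how one might check the host compiler before rebuilding. The limit of 12 comes from the `host_config.h` message quoted above. The `/usr/bin/g++-12` path and the helper names are hypothetical. `CMAKE_CUDA_HOST_COMPILER` is a standard CMake variable, but whether this project's build forwards it unchanged has not been verified here.

```python
import shutil
import subprocess
from typing import Optional


def gcc_major(version_text: str) -> int:
    """Extract the major version from strings such as '13' or '13.2.0'."""
    return int(version_text.strip().split(".")[0])


def host_compiler_supported(version_text: str, max_major: int = 12) -> bool:
    """True if gcc is not newer than the limit reported by nvcc (12 for CUDA 12.2 per the log)."""
    return gcc_major(version_text) <= max_major


def detect_gcc_version(binary: str = "gcc") -> Optional[str]:
    """Return the output of `gcc -dumpversion`, or None if gcc is not on PATH."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, "-dumpversion"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def suggested_cmake_args(host_compiler: str = "/usr/bin/g++-12") -> str:
    """Build a CMAKE_ARGS value that points nvcc at an older host compiler.

    The compiler path is an assumption; adjust it to wherever gcc 12 is installed.
    """
    return f"-DGGML_CUBLAS=ON -DCMAKE_CUDA_HOST_COMPILER={host_compiler}"


if __name__ == "__main__":
    version = detect_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif host_compiler_supported(version):
        print(f"gcc {version} is within the supported range")
    else:
        print(f"gcc {version} is too new for this nvcc; try:")
        print(f'CMAKE_ARGS="{suggested_cmake_args()}" pip install .')
```

Installing an older gcc alongside the system default and pointing nvcc at it is usually less invasive than replacing the default compiler.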

ggml_new_object: not enough space in the context's memory pool (needed 1073742144, available 1076736)

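Is the sequence length very long?

GPU memory is currently pre-allocated. Once the project is upgraded to the latest ggml, this problem will no longer occur.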
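To put the two numbers in the error message into perspective, the arithmetic can be sketched as follows. The byte counts are copied from the log. Reading them as "one allocation of about 1 GiB against a pool of about 1 MiB" is an interpretation, not something the log states, and the helper name is hypothetical.

```python
GIB = 1 << 30  # bytes in one gibibyte
MIB = 1 << 20  # bytes in one mebibyte


def describe_shortfall(needed: int, available: int) -> dict:
    """Summarize how far a requested allocation exceeds the remaining pool."""
    return {
        "needed_gib": needed / GIB,
        "available_mib": available / MIB,
        "bytes_over_one_gib": needed - GIB,
        "shortfall_bytes": needed - available,
    }


# Values taken verbatim from the ggml_new_object error message.
report = describe_shortfall(needed=1073742144, available=1076736)
for key, value in report.items():
    print(f"{key}: {value}")
```

The request is exactly 1 GiB plus 320 bytes, which suggests a single large buffer plus a small amount of metadata, while roughly 1 MiB is left in the pre-allocated pool.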

gabrielpondc commented 3 weeks ago

OK, I'll try downgrading GCC.

ggml_new_object: not enough space in the context's memory pool (needed 1073742144, available 1076736)

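The model supports a very long token length. I tried to limit it through a parameter, but that failed. Switching to the 128k GLM4 model worked.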
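A rough sketch of why the context length matters so much when memory is pre-allocated. The model dimensions below (40 layers, 2 key/value groups, head dimension 128, 16-bit cache entries) are assumptions based on the published GLM-4-9B configuration and are not taken from this thread. The actual buffers chatglm.cpp allocates may be laid out differently.

```python
GIB = 1 << 30  # bytes in one gibibyte


def kv_cache_bytes(
    context_tokens: int,
    num_layers: int = 40,      # assumed GLM-4-9B value
    num_kv_heads: int = 2,     # assumed multi-query group count
    head_dim: int = 128,       # assumed per-head dimension
    bytes_per_value: int = 2,  # assumed fp16 cache entries
) -> int:
    """Estimate the size of a key/value cache sized for the full context."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # keys and values
    return per_token * context_tokens


# Context lengths assumed for the 1M and 128k model variants.
for label, tokens in (("1M", 1024 * 1024), ("128k", 128 * 1024)):
    print(f"{label}: {kv_cache_bytes(tokens) / GIB:.1f} GiB")
```

Under these assumptions the cache alone is eight times larger for the 1M variant than for the 128k one, which is consistent with the 128k model loading successfully where the 1M model did not.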