cikkle opened this issue 2 weeks ago

I'm using exllama through tabbyAPI, and since a week or two ago I've been getting a floating point exception when continuing existing chats. After experimenting, I noticed it only happened around the 2K mark, and I found that chunk_size defaults to 2048 in tabbyAPI's default config, so I tried uncommenting it and setting it to 4096. Sure enough, I now get the exception past 4096 context instead.
I've tried different models, playing with the cache mode, context size, and gpu split parameters, deleting and re-cloning the repos, and reinstalling the Python dependencies clean, wondering if something else factors into this, but nothing else seems to affect it.
TabbyAPI console output (command-r with 8192 max_seq_len and the default 2048 chunk size):
At the time of writing I'm using the latest commits of tabbyapi and exllama (exllama installed from source).
Ubuntu 22.04.4 LTS
Python 3.10.12
ROCm 6.1.2
(Though for what it's worth, I've encountered someone hitting the same issue with a pair of 3090s.)
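For reference, the change described above amounts to this in tabbyAPI's config.yml (a sketch, not the full file; treat the exact nesting under model as an assumption):

model:
  max_seq_len: 8192
  # chunk_size defaults to 2048 when left commented out;
  # raising it just moves the floating point exception to the new boundary
  chunk_size: 4096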
Which version were you on before updating? Since there's no stack trace, it would help to narrow down which update introduced this.
Also, my guess is that Torch SDPA is the likely culprit, so what version of PyTorch are you using? Is it a nightly build?
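Something like this, run from the same environment tabbyAPI uses, would confirm both (torch.version.hip is None on CUDA builds):

python3 -c "import torch; print(torch.__version__, torch.version.hip)"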
Unfortunately I can't say much about the exllama version; I tend to pull and reinstall it nearly every other time I restart tabbyapi. I just tried reverting to a commit of tabbyapi that was on 0.1.0 and still encountered the same problem.
Nightly PyTorch doesn't really work at all for me in any case, so I'm on the stable ROCm version, installed via the command given on their site:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
When you reverted to the earlier Tabby, did it actually revert to exllamav2==0.1.0, or did it keep the installed 0.1.5?
I uninstalled my existing version of exllama beforehand; the console logging reported 0.1.0.
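For what it's worth, the installed version can also be read straight from the package metadata, independent of Tabby's logging (assuming the package is installed under the name exllamav2):

python3 -c "from importlib.metadata import version; print(version('exllamav2'))"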
Okay, I've committed a potential fix to the dev branch. Are you able to test it?
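In case it saves a step, testing from a local clone would look roughly like this (assuming the clone tracks this repo and you want a clean reinstall):

git fetch origin && git checkout dev && git pull
pip uninstall -y exllamav2
pip install .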
The dev branch specifically isn't building for me with pip install ., and I can't tell from the error where it's getting stuck.
[29/43] /opt/rocm/bin/hipcc -I/home/o0/.local/lib/python3.10/site-packages/torch/include -I/home/o0/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/o0/.local/lib/python3.10/site-packages/torch/include/TH -I/home/o0/.local/lib/python3.10/site-packages/torch/include/THC -I/home/o0/.local/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/usr/include/python3.10 -c -c /home/o0/ai/exllamav2/exllamav2/exllamav2_ext/hip/comp_units/unit_exl2_2b.hip -o /home/o0/ai/exllamav2/build/temp.linux-x86_64-3.10/exllamav2/exllamav2_ext/hip/comp_units/unit_exl2_2b.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -lineinfo -O3 -DHIPBLAS_USE_HIP_HALF -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=exllamav2_ext -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx942 -fno-gpu-rdc -std=c++17
clang: warning: -lineinfo: 'linker' input unused [-Wunused-command-line-argument]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/o0/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/home/o0/ai/exllamav2/setup.py", line 92, in <module>
    setup(
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 68, in run
    return orig.install.run(self)
  File "/usr/lib/python3.10/distutils/command/install.py", line 619, in run
    self.run_command('build')
  File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/o0/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
    objects = self.compiler.compile(sources,
  File "/home/o0/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/o0/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/o0/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> exllamav2
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Full output: exllamav2_build.log
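For anyone reproducing this: the actual compiler error tends to get buried in the parallel ninja output, so it can help to rebuild for a single GPU arch and capture the full log, along these lines (PYTORCH_ROCM_ARCH is honored by torch's cpp_extension; substitute your own card's arch from rocminfo):

PYTORCH_ROCM_ARCH=gfx1100 pip install . -v 2>&1 | tee exllamav2_build.log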
Apparently I used a couple of intrinsics not supported by HIP. I pushed a commit with fallback definitions, so it should compile again.