johnsmith0031 / alpaca_lora_4bit

MIT License
533 stars 84 forks

Unable to Build Wheels #144

Closed · VegaStarlake closed this issue 1 year ago

VegaStarlake commented 1 year ago

I can't get the monkeypatch to install no matter what I try. It seems to be a CUDA issue during the wheel-building process. I'm on Win10 with an NVIDIA GPU and an AMD CPU. The following error might be from an attempted Docker image build, but it's almost identical to the others I've gotten, so I believe it's the same underlying problem. The main error I'm getting is:

3 errors detected in the compilation of "src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu".
130.1 error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
130.1 [end of output]
130.1
130.1 note: This error originates from a subprocess, and is likely not a problem with pip.
130.1 ERROR: Failed building wheel for alpaca-lora-4bit
130.1 Running setup.py clean for alpaca-lora-4bit
132.6 Failed to build alpaca-lora-4bit
132.6 ERROR: Could not build wheels for alpaca-lora-4bit, which is required to install pyproject.toml-based projects

At first I struggled to pip install the monkeypatch. I read that it was a pip3 issue and tried a different setup.py file, but that didn't work.

Went back to the https://github.com/johnsmith0031/alpaca_lora_4bit/tree/winglian-setup_pip version and followed all the steps; it failed at "pip install .". I tried it both in the /repositories/ folder and in the miniconda environment (installer files). I tried the Docker image instructions, but the Dockerfile references a CUDA version that doesn't exist on Docker Hub, so I changed it to the most similar tag available there (FROM nvidia/cuda:11.7.0-devel-ubuntu22.04 AS builder -> nvidia/cuda:11.7.1-devel-ubuntu22.04, repeated for all three instances of 11.7.0 in the file; see the sketch below). I have CUDA 12.x, but I also tried installing 11.7 and pointing both PATH and CUDA_PATH at the 11.7 folder/bin folder.
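For reference, a minimal sketch of the Dockerfile edit I mean (only the image tag changes; the rest of the line is as quoted from the original file):

```dockerfile
# Before (tag not available on Docker Hub):
FROM nvidia/cuda:11.7.0-devel-ubuntu22.04 AS builder
# After (closest tag Docker Hub actually lists; repeat for all three FROM lines):
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04 AS builder
```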

I should also say that when I used a different repo that claimed to fix the monkeypatch install problem, I got a different error:

  File "H:\...\modules\training.py", line 275, in do_train
    from monkeypatch.peft_tuners_lora_monkey_patch import (
  ModuleNotFoundError: No module named 'monkeypatch.peft_tuners_lora_monkey_patch'; 'monkeypatch' is not a package

I don't remember whether it built the wheels correctly, but I remember being happy that it seemed to get past that step. With this error I also tried adding the monkeypatch folder to the path variables, and even to one of the webui files so it could be found, but nothing worked, so I gave up and went back to the johnsmith version.

I do have multiple Python environments split across different drives (one is in the default Windows location), but I believe they're all added to the path variables properly, and nothing is broken with regular webui; it's just this monkeypatch/lora install step. I've also tried moving the 11.7 entries to the top of the PATH and CUDA_PATH variables, but that didn't change the error. I also tried adjusting the system (virtual) memory setting, because that was a problem with webui being on a different drive; I don't know if that's related to this at all. A Reddit comment mentioned precompiled wheels being available, but I can't find them anywhere.

I tried searching the kernel .cu file for "/usr/local/cuda/bin/nvcc", because that isn't the nvcc path I added to PATH, but that string isn't in that file, and it's not in setup.py either, so I have no idea where it comes from or whether finding it would even help. I have more/longer error messages, but this post is already very long.

I appreciate any help anyone can give. I'm doing my best, but I've never formally learned Python. If more information is needed, I can try to provide as much as possible. Thank you.

johnsmith0031 commented 1 year ago

Maybe you can try WSL2 on Win10, which works fine. Also, I think you're using the wrong version of finetune.py; in the pip-installable version the imports would be:

from alpaca_lora_4bit.monkeypatch.... import ....
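For example (a sketch of the path change only; the submodule name is taken from the webui traceback quoted above):

```python
# Old in-tree layout (fails once the project is pip-installed):
#   from monkeypatch import peft_tuners_lora_monkey_patch
# New layout: the monkeypatch package lives under the alpaca_lora_4bit namespace.
from alpaca_lora_4bit.monkeypatch import peft_tuners_lora_monkey_patch
```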

You can follow a tutorial like this one: https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/

And install CUDA for WSL2 from here:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
https://developer.nvidia.com/cuda-11-7-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
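The deb (local) route on that archive page boils down to commands along these lines (a sketch only; copy the exact installer file name and URLs from the page itself, as they may differ):

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Installer file name below is approximate; take the exact URL from the archive page.
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-wsl-ubuntu-11-7-local_11.7.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-7-local_11.7.1-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
```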

If cuDNN is needed, you can also install it, just as on normal Ubuntu.

VegaStarlake commented 1 year ago

Thank you for the help. Is WSL2 just for compiling with the Docker image? I ran into more problems with WSL2, but here are the other errors I got when trying to build the wheels without Docker, truncated for space. I tried to include the beginning and end of the error blocks:

Building wheel for alpaca-lora-4bit (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully. 
│ exit code: 1 
╰─> [1438 lines of output] 
running bdist_wheel 
running build 
running build_py 
running build_ext 
V:\...\env\lib\site-packages\torch\utils\cpp_extension.py:359: 
UserWarning: Error checking compiler version for cl: [WinError 2] 
The system cannot find the file specified 
warnings.warn(f'Error checking compiler version for {compiler}: {error}') 
building 'alpaca_lora_4bit.quant_cuda' extension 
creating V:\...\repositories\alpaca_lora_4bit\build\temp.win-amd64-cpython-310 
creating V:\...\repositories\alpaca_lora_4bit\build\temp.win-amd64-cpython-310\Release 

…

[2/2] V:\...\env\bin\nvcc --generate-dependencies-with-compile --dependency-output 
V:\...\repositories\alpaca_lora_4bit\build\temp.win-amd64-cpython-310\Release\src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc 
-Xcudafe 
--diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface 
-Xcudafe 

…

3 errors detected in the compilation of "V:/.../repositories/alpaca_lora_4bit/src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu". quant_cuda_kernel.cu 
ninja: build stopped: subcommand failed. 
Traceback (most recent call last): 
File "V:\...\env\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build subprocess.run( 
File "V:\...\env\lib\subprocess.py", line 526, in run 
raise CalledProcessError(retcode, process.args, 
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception: 
Traceback (most recent call last): 
File "<string>", line 2, in <module> 
File "<pip-setuptools-caller>", line 34, in <module> 
File "V:\...\repositories\alpaca_lora_4bit\setup.py", line 19, in <module> 
setup( 
File "V:\...\env\lib\site-packages\setuptools\__init__.py", line 107, in setup 
return distutils.core.setup(**attrs) 
File "V:\...\env\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup 

…

 File "V:\...\env\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip. 
ERROR: Failed building wheel for alpaca-lora-4bit
Running setup.py clean for alpaca-lora-4bit
Failed to build alpaca-lora-4bit 
ERROR: Could not build wheels for alpaca-lora-4bit, which is required to install pyproject.toml-based projects 

(V:\...\env) V:\...\repositories\alpaca_lora_4bit> 

I'm confused because it seems to just work for everyone else, while nothing I try does.

EDIT: I may have fixed the WSL2 problem, so I'll continue with the rest of the installation steps, but if these errors have different/easier fixes, please let me know!

VegaStarlake commented 1 year ago

from alpaca_lora_4bit.monkeypatch.... import

Where do I get the correct finetune.py file? When I checked, the file I have seems to be exactly the same as the one listed on the main page here on GitHub.

After researching the errors some more, I felt it was an issue between PyTorch and CUDA, so I created a new environment to install PyTorch and CUDA, activated it using "conda activate", changed to the alpaca directory, and tried setup again. I read that combining conda and pip can cause problems, but the guides told me to use conda to activate the environment, so I don't know if that's an issue.

Then I got an error saying ninja was missing, so I pip installed it. I also updated wheel.

Then I got an error about C++, so I reinstalled the Visual Studio desktop C++ workload with all the boxes checked, added the folder to the PATH system variable, then reactivated the env and tried setup again. Now the errors are harder to understand, so I don't even know what to post.

I believe PyTorch may have been compiled with 11.8 while CUDA 11.7 was installed (I don't know how that's possible; maybe I need to add the venv CUDA path as well? see the check below). Doing these steps fixed a different install for the TTS webui, so I was hoping it would fix this too, but I'm still lost. Maybe I could activate the textgen (or alpaca_lora?) environment and reinstall ninja and the rest? Are there specific CUDA, Python, PyTorch, C++, etc. versions that are necessary to build the wheels? I feel like those would be installed during setup, but maybe my path variables mismatch what the setup is looking for. I really don't know.
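A minimal check for that mismatch (run inside the activated environment; comparing the output against nvcc --version shows whether the wheel and the installed toolkit disagree):

```python
import torch

print(torch.__version__)         # e.g. "2.0.1+cu118" -> wheel built for CUDA 11.8
print(torch.version.cuda)        # CUDA version the PyTorch wheel was compiled against
print(torch.cuda.is_available())
print(torch.cuda.get_device_capability(0))  # compute capability of GPU 0, e.g. (8, 6)
```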

I haven't tried Docker again, because if there are more errors there, I know even less about *nix systems, so I'd probably be even worse off. If that's my only option, I'll try it. Is there any way to get pre-compiled wheels, and would they even work?

I read that I could run python setup.py build_ext --inplace and then ninja -v in the build folder to get "more detailed" errors (see below), but the error output is already 5,000 words long. Would this be helpful, or would it cause problems?
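For concreteness, this is how I understand those commands would be run from the repo root (the Release path is taken from the build log above):

```bash
python setup.py build_ext --inplace
# Then, to replay just the failing compile step with its full command line:
cd build/temp.win-amd64-cpython-310/Release
ninja -v
```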

johnsmith0031 commented 1 year ago

The file is here: https://github.com/johnsmith0031/alpaca_lora_4bit/blob/winglian-setup_pip/finetune.py

And I did not have those compilation issues on Windows. I'm not sure what is causing your problem, but I believe using WSL2 would work.

VegaStarlake commented 1 year ago

I got Miniconda3 and all the requirements set up in WSL2, but it's still not working. I notice atomicAdd() is in there. My GPU's compute capability is 8.6, which according to NVIDIA means it can't do "atomic addition operating on float2 and float4 floating point vectors in global memory". No clue if this is related, but here's a short description of the error I got in WSL2:

  /home/.../lora/alpaca_lora_4bit/src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu(974): error: no instance of overloaded function "atomicAdd" matches the argument list
              argument types are: (__half *, c10::Half)
            detected during instantiation of "void VecQuant4MatMulKernelFaster(const half2 *, const int *, scalar_t *, const scalar_t *, const int *, const int *, int, int, int, int, int) [with scalar_t=c10::Half]"
  (895): here

  /home/.../lora/alpaca_lora_4bit/src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu(1301): error: no instance of overloaded function "atomicAdd" matches the argument list
              argument types are: (__half *, c10::Half)
            detected during instantiation of "void VecQuant4MatMulV1KernelFaster(const half2 *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=c10::Half]"
  (1323): here

  /home/.../lora/alpaca_lora_4bit/src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu(1566): error: no instance of overloaded function "atomicAdd" matches the argument list
              argument types are: (__half *, c10::Half)
            detected during instantiation of "void VecQuant4MatMulKernel_G(const half2 *, const int *, scalar_t *, const scalar_t *, const int *, const int *, int, int, int, int, int) [with scalar_t=c10::Half]"
  (1473): here

  3 errors detected in the compilation of "/home/.../lora/alpaca_lora_4bit/src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu".

It seems to be related to line 974: atomicAdd(&mul2[b * width + w], res);

This was a proposed fix:

    __device__ __forceinline__ void atomicAdd(__half* address, __half val) {
        unsigned int *address_as_ui = reinterpret_cast<unsigned int *>(
            reinterpret_cast<char *>(address) - (reinterpret_cast<size_t>(address) & 2));
        unsigned int old = *address_as_ui;
        unsigned int assumed, new_val;
        do {
            assumed = old;
            new_val = (assumed & 0xffff0000) | (__half_as_ushort(val) + (assumed & 0xffff));
            old = atomicCAS(address_as_ui, assumed, new_val);
        } while (assumed != old);
    }
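One caveat I noticed about that snippet (my own reading, not something confirmed anywhere): it adds the raw 16-bit patterns as integers instead of doing a real half-precision float add, and it always treats the target as the low half-word even though the address is aligned down. A CAS loop closer to the fallback PyTorch itself uses would look roughly like this (a sketch; atomicAddHalf is a hypothetical name chosen to avoid clashing with the built-in overload on newer architectures):

```cuda
#include <cuda_fp16.h>

// Software atomic add for __half on GPUs without a native
// atomicAdd(__half*, __half) overload (pre-sm_70). Performs a genuine
// fp16 add via a compare-and-swap loop on the containing 32-bit word.
__device__ __forceinline__ void atomicAddHalf(__half* address, __half val) {
    unsigned int* base = reinterpret_cast<unsigned int*>(
        reinterpret_cast<char*>(address) - (reinterpret_cast<size_t>(address) & 2));
    const bool high = reinterpret_cast<size_t>(address) & 2;  // which half-word holds our value
    unsigned int old = *base, assumed;
    do {
        assumed = old;
        __half_raw raw;
        raw.x = high ? (unsigned short)(assumed >> 16)
                     : (unsigned short)(assumed & 0xffffu);
        const __half sum = __hadd(__half(raw), val);  // real fp16 addition
        const __half_raw out = __half_raw(sum);
        const unsigned int updated =
            high ? (assumed & 0x0000ffffu) | ((unsigned int)out.x << 16)
                 : (assumed & 0xffff0000u) | out.x;
        old = atomicCAS(base, assumed, updated);
    } while (assumed != old);
}
```

Guarding something like this with #if __CUDA_ARCH__ < 700 would keep the native overload on newer cards, if I understand the compilation model correctly.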

I do have a second GPU in my system, but it's not GPU 0. I can try taking it out and running the build again as well. Any thoughts?

johnsmith0031 commented 1 year ago

It's a compatibility issue with the old video card and driver. You can remove the VecQuant4MatMulKernel_G function from quant_cuda.cpp and quant_cuda_kernel.cu and then try again; this function is only used for inference.
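Another angle that might be worth a try (an assumption on my part, not something verified in this thread): torch.utils.cpp_extension picks its CUDA target architectures from the GPUs it detects, so pinning the build to the newer card's architecture may stop nvcc from generating code for the old Pascal card at all:

```bash
# Before running "pip install .":
export TORCH_CUDA_ARCH_LIST="8.6"    # WSL2 / Linux
# or on Windows (cmd):  set TORCH_CUDA_ARCH_LIST=8.6
```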

VegaStarlake commented 1 year ago

After removing the old Pascal card, the wheels built and installed successfully on both WSL2 and Win10! Thank you very much for your help. I haven't tested the changes you suggested with the old GPU in the system; would that be helpful to confirm your suggested changes?

I only had the old card in the system for extra VRAM, and I don't even know if it was helping with any of the applications.

johnsmith0031 commented 1 year ago

I think if you don't need the additional VRAM, what you have is actually enough for now.