rmivdc opened this issue 1 year ago
I ran into this issue as well with torch==2.0. When I uninstalled it and re-installed as torch==1.13.1, then it seemed to fix the issue.
Thanks! This version fixed it. EDIT: at least when running on CPU; running on GPU still throws that error.
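For anyone scripting this workaround: the fix above amounts to pinning torch below 2.0. A minimal guard (a sketch; the helper is mine, not from the thread) that checks a version string without needing torch importable:

```python
# Hypothetical helper: confirm a torch version string is pre-2.0, as the
# downgrade above requires. Pure string parsing, so it runs even in an
# environment where torch itself is not installed.
def is_pre_2(version: str) -> bool:
    # version strings look like "1.13.1" or "2.0.0+cu117"
    major = version.split(".")[0]
    return int(major) < 2

print(is_pre_2("1.13.1"))        # True  - the version that fixed the issue
print(is_pre_2("2.0.0+cu117"))   # False - the version that triggers it
```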
The error went away for me on GPU
May I know what CUDA version / NVIDIA driver version you are using, and the versions of these pip packages (if not the latest)?
accelerate appdirs bitsandbytes black black[jupyter] datasets fire gradio
Thanks!
CUDA 11.7. Also, I used conda to install PyTorch with CUDA (conda install pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia).
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
$ nvidia-smi
Mon Mar 27 20:19:20 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:05:00.0 Off | 0 |
| N/A 29C P0 63W / 400W| 7429MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:06:00.0 Off | 0 |
| N/A 26C P0 63W / 400W| 7717MiB / 81920MiB | 0% Default |
$ pip list
Package Version Editable project location
------------------- ----------- ------------------------------
accelerate 0.18.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 4.2.2
anyio 3.6.2
appdirs 1.4.4
async-timeout 4.0.2
attrs 22.2.0
bitsandbytes 0.37.2
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
contourpy 1.0.7
cycler 0.11.0
datasets 2.10.1
deepspeed 0.8.3
defusedxml 0.7.1
dill 0.3.6
entrypoints 0.4
fastapi 0.95.0
ffmpy 0.3.0
filelock 3.10.6
fire 0.5.0
flit_core 3.8.0
fonttools 4.39.2
frozenlist 1.3.3
fsspec 2023.3.0
Glances 3.3.1.1
gradio 3.23.0
h11 0.14.0
hjson 3.1.0
httpcore 0.16.3
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.13.3
idna 3.4
importlib-resources 5.12.0
Jinja2 3.1.2
jmespath 1.0.1
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.0
loralib 0.1.1
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
multidict 6.0.4
multiprocess 0.70.14
ninja 1.11.1
numpy 1.24.2
openai 0.27.2
orjson 3.8.8
packaging 23.0
pandas 1.5.3
peft 0.3.0.dev0 /home/fsuser/peft
Pillow 9.4.0
pip 23.0.1
psutil 5.9.4
py-cpuinfo 9.0.0
pyarrow 11.0.0
pydantic 1.10.7
pydub 0.25.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-dotenv 1.0.0
python-multipart 0.0.6
pytz 2023.2
PyYAML 6.0
regex 2023.3.23
requests 2.28.2
responses 0.18.0
rfc3986 1.5.0
semantic-version 2.10.0
sentencepiece 0.1.97
setuptools 65.6.3
six 1.16.0
sniffio 1.3.0
starlette 0.26.1
termcolor 2.2.0
tokenizers 0.13.2
toolz 0.12.0
torch 1.13.1
tqdm 4.65.0
transformers 4.28.0.dev0 /home/fsuser/transformers_main
typing_extensions 4.4.0
uc-micro-py 1.0.1
ujson 5.7.0
urllib3 1.26.15
uvicorn 0.21.1
uvloop 0.17.0
watchfiles 0.18.1
websockets 10.4
wheel 0.38.4
xxhash 3.2.0
yarl 1.8.2
zipp 3.15.0
CUDA 12 is not compatible with PyTorch 2.0.
https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix
Following is the Release Compatibility Matrix for PyTorch releases:
| PyTorch version | Python | Stable CUDA | Experimental CUDA |
| -- | -- | -- | -- |
| 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
| 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
| 1.12 | >=3.7, <=3.10 | CUDA 11.3, CUDNN 8.3.2.44 | CUDA 11.6, CUDNN 8.3.2.44 |

Also, Python 3.11 is not compatible either; the max version is 3.10.
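As a sanity check, the compatibility matrix above can be encoded and queried directly. This is a sketch (the dict and helper names are mine), matching a CUDA version only against the two columns listed per release:

```python
# Stable / experimental CUDA per PyTorch release, from the RELEASE.md
# excerpt above.
COMPAT = {
    "2.0":  ("11.7", "11.8"),
    "1.13": ("11.6", "11.7"),
    "1.12": ("11.3", "11.6"),
}

def cuda_supported(torch_release: str, cuda_version: str) -> bool:
    pair = COMPAT.get(torch_release)
    return pair is not None and cuda_version in pair

print(cuda_supported("2.0", "12.1"))   # False - CUDA 12 is in neither column
print(cuda_supported("1.13", "11.7"))  # True  - 11.7 is 1.13's experimental CUDA
```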
Getting the same issue here trying to run inference on the google t5-xl model.
Error:
cuBLAS API failed with status 15
A: torch.Size([1, 2048]), B: torch.Size([2048, 2048]), C: (1, 2048); (lda, ldb, ldc): (c_int(32), c_int(65536), c_int(32)); (m, n, k): (c_int(1), c_int(2048), c_int(2048))
...
File "/home/mau/.conda/envs/test/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/mau/.conda/envs/test/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
I've tried all the fixes proposed here but no luck.
Environment packages:
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 0.18.0 pypi_0 pypi
bitsandbytes 0.37.2 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.01.10 h06a4308_0 anaconda
certifi 2022.12.7 py39h06a4308_0 anaconda
cffi 1.15.1 py39h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
cryptography 39.0.1 py39h9ce1e76_0
cuda-cudart 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cudatoolkit 11.3.1 h2bc3f7f_2 anaconda
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.10.7 pypi_0 pypi
flit-core 3.8.0 py39h06a4308_0
freetype 2.12.1 h4a9f257_0
giflib 5.2.1 h5eee18b_3
gmp 6.2.1 h295c915_3
gnutls 3.6.15 he1e5248_0
huggingface-hub 0.13.3 pypi_0 pypi
idna 3.4 py39h06a4308_0
intel-openmp 2021.4.0 h06a4308_3561
jpeg 9e h5eee18b_1
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 11.10.3.66 0 nvidia
libcufft 10.7.2.124 h4fbf590_0 nvidia
libcufile 1.6.0.25 0 nvidia
libcurand 10.3.2.56 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libdeflate 1.17 h5eee18b_0
libffi 3.4.2 h6a678d5_6
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.2 h7f8727e_0
libnpp 11.7.4.75 0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.16.0 h27cfd23_0
libtiff 4.5.0 h6a678d5_2
libunistring 0.9.10 h27cfd23_0
libwebp 1.2.4 h11a3e52_1
libwebp-base 1.2.4 h5eee18b_1
lz4-c 1.9.4 h6a678d5_0
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py39h7f8727e_0
mkl_fft 1.3.1 py39hd3c417c_0
mkl_random 1.2.2 py39h51133e4_0
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
numpy 1.23.5 py39h14f4228_0
numpy-base 1.23.5 py39h31eccc5_0
openh264 2.1.1 h4ff587b_0
openssl 1.1.1t h7f8727e_0
packaging 23.0 pypi_0 pypi
pillow 9.4.0 py39h6a678d5_0
pip 23.0.1 py39h06a4308_0
psutil 5.9.4 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.0.0 py39h06a4308_0
pysocks 1.7.1 py39h06a4308_0
python 3.9.16 h7a1cb2a_2
pytorch 1.13.1 py3.9_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_3 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.3.23 pypi_0 pypi
requests 2.28.1 py39h06a4308_1
sentencepiece 0.1.97 pypi_0 pypi
setuptools 65.6.3 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.41.1 h5eee18b_0
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.2 pypi_0 pypi
torchaudio 0.13.1 py39_cu117 pytorch
torchvision 0.14.1 py39_cu117 pytorch
tqdm 4.65.0 pypi_0 pypi
transformers 4.28.0.dev0 pypi_0 pypi
typing_extensions 4.4.0 py39h06a4308_0
tzdata 2022g h04d1e81_0
urllib3 1.26.15 py39h06a4308_0
wheel 0.38.4 py39h06a4308_0
xz 5.2.10 h5eee18b_1
zlib 1.2.13 h5eee18b_0
zstd 1.5.4 hc292b87_0
@mudomau Do you have the same issue with "decapoda-research/llama-7b-hf"?
I'm encountering another error now but the last Dockerfile install uploaded 3 days ago fixed that cuBLAS error for me.
Same problem here.
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
A: torch.Size([5120, 4096]), B: torch.Size([4096, 4096]), C: (5120, 4096); (lda, ldb, ldc): (c_int(163840), c_int(131072), c_int(163840)); (m, n, k): (c_int(5120), c_int(4096), c_int(4096))
cuBLAS API failed with status 15
error detected
$ nvidia-smi
Tue Apr 11 21:25:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03 Driver Version: 510.108.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:08:00.0 On | N/A |
| 0% 53C P8 18W / 220W | 1020MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1292 G /usr/lib/xorg/Xorg 460MiB |
| 0 N/A N/A 1577 G /usr/bin/gnome-shell 172MiB |
| 0 N/A N/A 3884 G ...RendererForSitePerProcess 86MiB |
| 0 N/A N/A 5441 G ...983706979455292193,131072 249MiB |
+-----------------------------------------------------------------------------+
$ /usr/local/cuda-11.6/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
$ pip list
Package Version
------------------------ -------------
accelerate 0.18.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 4.2.2
anyio 3.6.2
appdirs 1.4.4
apturl 0.5.2
asttokens 2.2.1
async-timeout 4.0.2
attrs 22.2.0
backcall 0.2.0
bitsandbytes 0.37.2
black 23.3.0
blinker 1.4
Brlapi 0.8.3
certifi 2020.6.20
chardet 4.0.0
charset-normalizer 3.1.0
click 8.0.3
cmake 3.26.3
colorama 0.4.4
command-not-found 0.3
contourpy 1.0.7
cryptography 3.4.8
cupshelpers 1.0
cycler 0.11.0
datasets 2.11.0
dbus-python 1.2.18
decorator 5.1.1
defer 1.0.6
dill 0.3.6
distro 1.7.0
distro-info 1.1build1
entrypoints 0.4
executing 1.2.0
fastapi 0.95.0
ffmpy 0.3.0
filelock 3.11.0
fire 0.5.0
fonttools 4.39.3
frozenlist 1.3.3
fsspec 2023.4.0
GPUtil 1.4.0
gradio 3.25.0
gradio_client 0.0.10
h11 0.14.0
httpcore 0.17.0
httplib2 0.20.2
httpx 0.24.0
huggingface-hub 0.13.4
idna 3.3
importlib-metadata 4.6.4
ipython 8.12.0
jedi 0.18.2
jeepney 0.7.1
Jinja2 3.1.2
jsonschema 4.17.3
keyring 23.5.0
kiwisolver 1.4.4
language-selector 0.1
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
linkify-it-py 2.0.0
lit 16.0.1
llvmlite 0.39.1
loralib 0.1.1
louis 3.20.0
macaroonbakery 1.3.1
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 8.10.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
mypy-extensions 1.0.0
netifaces 0.11.0
networkx 3.1
numba 0.56.4
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.0
olefile 0.46
orjson 3.8.10
packaging 23.0
pandas 2.0.0
parso 0.8.3
pathspec 0.11.1
peft 0.3.0.dev0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.0.1
pip 22.0.2
platformdirs 3.2.0
prompt-toolkit 3.0.38
protobuf 3.12.4
psutil 5.9.4
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 11.0.0
pycairo 1.20.1
pycups 2.0.1
pydantic 1.10.7
pydub 0.25.1
Pygments 2.15.0
PyGObject 3.42.1
PyJWT 2.3.0
pymacaroons 0.13.0
PyNaCl 1.5.0
pynvml 11.5.0
pyparsing 2.4.7
pyRFC3339 1.1
pyrsistent 0.19.3
python-apt 2.4.0+ubuntu1
python-dateutil 2.8.2
python-debian 0.1.43ubuntu1
python-multipart 0.0.6
pytz 2022.1
pyxdg 0.27
PyYAML 5.4.1
regex 2023.3.23
reportlab 3.6.8
requests 2.25.1
responses 0.18.0
rich 13.3.3
screen-resolution-extra 0.0.0
SecretStorage 3.3.1
semantic-version 2.10.0
sentencepiece 0.1.97
setuptools 59.6.0
six 1.16.0
sniffio 1.3.0
stack-data 0.6.2
starlette 0.26.1
sympy 1.11.1
systemd-python 234
termcolor 2.2.0
tokenize-rt 5.0.0
tokenizers 0.13.3
tomli 2.0.1
toolz 0.12.0
torch 1.13.1+cu116
torchaudio 0.13.1+cu116
torchvision 0.14.1+cu116
tqdm 4.65.0
traitlets 5.9.0
transformers 4.28.0.dev0
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
ubuntu-advantage-tools 8001
ubuntu-drivers-common 0.0.0
uc-micro-py 1.0.1
ufw 0.36.1
unattended-upgrades 0.1
urllib3 1.26.5
uvicorn 0.21.1
wadllib 1.3.6
wcwidth 0.2.6
websockets 11.0.1
wheel 0.37.1
xdg 5
xkit 0.0.0
xxhash 3.2.0
yarl 1.8.2
zipp 1.0.0
I am running into the same issue as well on a H100:
torch 1.13.1, bitsandbytes==0.38.1, cuda 11.8, python 3.10, cublas 11.11.3.6
result = super().forward(x)
File "/home/arvind/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 320, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/arvind/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 500, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/arvind/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/arvind/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1436, in igemmlt
raise Exception('cublasLt ran into an error!')
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
I get the same issue when finetuning the 30B and 65B models, even on different clouds.
For the 65B model it occurs randomly, with a probability of about 70%; for the 30B model it occurs every time.
> I am running into the same issue as well on a H100: torch 1.13.1, bitsandbytes==0.38.1, cuda 11.8, python 3.10, cublas 11.11.3.6
@arvindsun Have you fixed this? I'm also running into this issue when using an H100 on Lambda Labs.
Getting the same error on an H100 on Lambda Labs
Getting the same error on an H100 on Lambda Labs too
Try to run it w/o 8-bit mode since you are on H100
> Getting the same error on an H100 on Lambda Labs too
> Try to run it w/o 8-bit mode since you are on H100

I tried it.
Lambda's H100 instances have CUDA 11.8, but PyTorch 2.0.1 is compiled against 11.7, which is not compatible. The bitsandbytes version also has a problem, and you need to rename the CUDA version you are using.
I tried to install CUDA 12 as well, to use the latest version of torch, but strangely the installation aborts every time, so I gave up on testing it on the H100; I had already spent 3 hours of my time trying to configure it. I'll try it on another RunPod instance. Locally I could successfully train with 3 epochs, but I needed more compute to train with 10; my RTX 4090 would take weeks for it.
Facing the same error on a Lambda Labs H100 instance trying to load Falcon-40B in 8-bit. What's the solution?
Export these variables:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Install a compatible CUDA (11.7 has no support for the H100):
sudo apt install cuda-nvcc-11-8 libcusparse-11-8 libcusparse-dev-11-8 libcublas-dev-11-8 libcublas-11-8 libcusolver-dev-11-8 libcusolver-11-8
Remove the old CUDA:
apt remove cuda-nvcc-11-7
Install a compatible PyTorch:
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu118
pip install pytorch-lightning==1.9.0
If you will use deepspeed for CPU offload (it makes training faster) you need:
pip install deepspeed==0.7.0
Then edit these files (using vim, nano, or SFTP), changing the import of inf from torch._six to an import from math:
/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/utils.py
/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py
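The torch._six edit in the last step is a one-line substitution; sketched here as a string transform (the helper name is mine) so the exact change is unambiguous:

```python
# deepspeed 0.7.0 imports inf from torch._six, which newer torch versions
# removed. The fix described above swaps it for math's inf, which behaves
# identically for deepspeed's norm checks.
def patch_six_import(source: str) -> str:
    return source.replace(
        "from torch._six import inf",
        "from math import inf",
    )

before = "from torch._six import inf"
print(patch_six_import(before))  # from math import inf
```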
> Facing the same error on a Lambda Labs H100 instance trying to load Falcon-40B in 8-bit. What's the solution?

Ended up moving back to an A100 😅
Has anyone else tried and confirmed the efficacy of @jonataslaw's solution two comments above? Will test myself over the weekend.
I was able to solve this error with the conda install approach found here: https://github.com/TimDettmers/bitsandbytes/issues/85
# jupyter setup
wget http://repo.continuum.io/archive/Anaconda3-2023.03-1-Linux-x86_64.sh
bash Anaconda3-2023.03-1-Linux-x86_64.sh
source ~/.bashrc
conda create --name cap
conda activate cap
conda install pip
conda install cudatoolkit
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x
python setup.py install
pip install scipy
python -m bitsandbytes
# should be a successful build
I met this issue on an H100 GPU, and fixed it by changing load_in_8bit=True to load_in_8bit=False on line 114 of finetune.py.
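In finetune.py terms, that change amounts to flipping one keyword argument passed to from_pretrained. A sketch of the diff as data (the helper and the device_map kwarg are illustrative, not quoted from the repo):

```python
# The 8-bit path (bitsandbytes' igemmlt) is what fails on H100, so the
# workaround is simply not requesting int8 weights at load time.
def from_pretrained_kwargs(load_in_8bit: bool) -> dict:
    kwargs = {"device_map": "auto"}
    if load_in_8bit:
        kwargs["load_in_8bit"] = True  # the original finetune.py setting
    return kwargs

print(from_pretrained_kwargs(False))  # {'device_map': 'auto'}
```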
@daniel-furman
> I was able to solve this error with the conda install approach found here: TimDettmers/bitsandbytes#85
Sadly it gave me the error below:
Downloading (…)fetensors.index.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36.2k/36.2k [00:00<00:00, 10.6MB/s]
Downloading (…)of-00004.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.96G/9.96G [03:00<00:00, 55.3MB/s]
Downloading (…)of-00004.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.86G/9.86G [02:57<00:00, 55.4MB/s]
Downloading (…)of-00004.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.86G/9.86G [02:57<00:00, 55.4MB/s]
Downloading (…)of-00004.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.36G/1.36G [00:24<00:00, 55.2MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [09:22<00:00, 140.63s/it]
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/ubuntu/miniconda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/ubuntu/miniconda/envs/starchat/lib/libcudart.so'), PosixPath('/home/ubuntu/miniconda/envs/starchat/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/ubuntu/miniconda/envs/starchat/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 9.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/miniconda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Error named symbol not found at line 528 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
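The log above hints at the root cause: bitsandbytes loaded its cuda117 binary while the detected compute capability is 9.0 (Hopper), which CUDA 11.7 kernels do not target. A sketch of that consistency check (helper name and the 11.8 threshold are mine, inferred from the log and the H100 comments in this thread):

```python
# H100 (sm_90) needs a CUDA >= 11.8 build; the CUDA SETUP lines above show
# a libbitsandbytes_cuda117.so binary paired with capability 9.0.
def binary_ok_for_gpu(binary_name: str, compute_capability: float) -> bool:
    # e.g. "libbitsandbytes_cuda117.so" -> 117
    cuda_build = int(binary_name.split("cuda")[1].split(".")[0])
    if compute_capability >= 9.0:
        return cuda_build >= 118
    return True

print(binary_ok_for_gpu("libbitsandbytes_cuda117.so", 9.0))  # False
print(binary_ok_for_gpu("libbitsandbytes_cuda118.so", 9.0))  # True
```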
Got this issue on an H100 on RunPod.
Same, got this on an H100 with 8-bit. The H100 works with 16-bit.
Got this error on an H100 using 8-bit Llama. Has anyone made it work on an H100?
> Got this error on H100 using 8bit Llama. If anyone can make it on H100?

You can avoid using 8-bit; 4-bit and 16-bit are fine.
Hi, when launching the finetune.py command I'm encountering the error titled above. I'm using Fedora 36 with CUDA 12 and Python 3.10.10. Initialization seems to begin like so:
and then later, after loading some files:
Am I using the wrong library versions? Thanks for your help.