I had 30 GB of RAM, and I added a ~26 GB swapfile (13000 blocks of 2 MB) with the following command:
sudo dd if=/dev/zero of=/swapfile bs=2M count=13000 status=progress
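Note that dd only allocates the file; actually enabling it as swap needs a few more steps, roughly like this (a minimal sketch, assuming a standard Linux swapfile setup):

sudo chmod 600 /swapfile   # restrict permissions, required by swapon
sudo mkswap /swapfile      # format the file as swap space
sudo swapon /swapfile      # enable it

The run then got this far before failing: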
Allocating transformer on host
Loading checkpoint 0
Loading checkpoint 1
Loaded in 2590.17 seconds with 13.19 GiB
cuBLAS API failed with status 15
A: torch.Size([72, 5120]), B: torch.Size([5120, 5120]), C: (72, 5120); (lda, ldb, ldc): (c_int(2304), c_int(163840), c_int(2304)); (m, n, k): (c_int(72), c_int(5120), c_int(5120))
error detected
Traceback (most recent call last):
File "/home/jupyter/llama-int8/example.py", line 117, in <module>
fire.Fire(main)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/jupyter/llama-int8/example.py", line 107, in main
results = generator.generate(
File "/home/jupyter/llama-int8/llama/generation.py", line 42, in generate
logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/jupyter/llama-int8/llama/model.py", line 281, in forward
h = layer(h, start_pos, freqs_cis, mask)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jupyter/llama-int8/llama/model.py", line 221, in forward
h = x + self.attention.forward(
File "/home/jupyter/llama-int8/llama/model.py", line 142, in forward
xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
Any clues?