oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

CUDA out of memory #4557

Closed PGTBoos closed 9 months ago

PGTBoos commented 11 months ago

Describe the bug

In the console you get this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.47 GiB is allocated by PyTorch, and 759.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The strange thing is that a conversation works for a while and then this appears. Also note the unallocated space; I don't know enough about PyTorch to fix it. I only run 7B models with the 4-bit and 8-bit settings, so they should fit (and they do at startup, but eventually I get this error). This happens with every model; it doesn't matter which one.
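The error text itself points at PYTORCH_CUDA_ALLOC_CONF as the first thing to try. A minimal sketch of setting it (the 128 MiB split size is an arbitrary starting point, not something tested in this thread); the variable has to be in the environment before PyTorch initializes CUDA, so set it in the shell or at the very top of the launcher:

import os

# Must be set before torch initializes CUDA, i.e. before the webui
# imports torch. max_split_size_mb caps how large the allocator's
# split blocks can get, which can reduce the "reserved but
# unallocated" fragmentation the error message mentions.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch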

Is there an existing issue for this?

Reproduction

I just talk for a while and it happens at random; it is not after x characters or x conversations. It can happen after the second reply or after the twentieth.

Screenshot

The chat produces empty replies (and gets stuck repeating them).

Logs

2023-11-10 23:29:26 INFO:Loading the extension "gallery"...
2023-11-10 23:29:26 INFO:Loading the extension "character_bias"...
2023-11-10 23:29:26 INFO:Loading the extension "perplexity_colors"...
2023-11-10 23:29:26 INFO:Loading the extension "silero_tts"...
2023-11-10 23:29:26 ERROR:Failed to load the extension "silero_tts".
Traceback (most recent call last):
  File "C:\web\text-generation-webui\modules\extensions.py", line 36, in load_extensions
    exec(f"import extensions.{name}.script")
  File "<string>", line 1, in <module>
  File "C:\web\text-generation-webui\extensions\silero_tts\script.py", line 10, in <module>
    from extensions.silero_tts import tts_preprocessor
  File "C:\web\text-generation-webui\extensions\silero_tts\tts_preprocessor.py", line 3, in <module>
    from num2words import num2words
ModuleNotFoundError: No module named 'num2words'
2023-11-10 23:29:26 INFO:Loading the extension "elevenlabs_tts"...
2023-11-10 23:29:26 ERROR:Failed to load the extension "elevenlabs_tts".
Traceback (most recent call last):
  File "C:\web\text-generation-webui\modules\extensions.py", line 36, in load_extensions
    exec(f"import extensions.{name}.script")
  File "<string>", line 1, in <module>
  File "C:\web\text-generation-webui\extensions\elevenlabs_tts\script.py", line 5, in <module>
    import elevenlabs
ModuleNotFoundError: No module named 'elevenlabs'
C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include:  *I am so happy* or set allow_custom_value=True.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: Dr Wanda or set allow_custom_value=True.
  warnings.warn(
C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: QA or set allow_custom_value=True.
  warnings.warn(
2023-11-10 23:29:29 ERROR:Could not find the character "Dr Wanda" inside characters/. No character has been loaded.
Traceback (most recent call last):
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\gradio\utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "C:\web\text-generation-webui\modules\chat.py", line 561, in load_character
    raise ValueError
ValueError
Traceback (most recent call last):
  File "C:\web\text-generation-webui\modules\callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "C:\web\text-generation-webui\modules\text_generation.py", line 352, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1789, in generate
    return self.beam_sample(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 3417, in beam_sample
    outputs = self(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 672, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 406, in forward
    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\functional.py", line 1858, in softmax
    ret = input.softmax(dim, dtype=dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.93 GiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 8.25 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 1.75 seconds (0.00 tokens/s, 0 tokens, context 2009, seed 789703909)
Output generated in 6.21 seconds (5.47 tokens/s, 34 tokens, context 395, seed 631154972)
Average perplexity: 341.7047
Output generated in 3.56 seconds (5.62 tokens/s, 20 tokens, context 445, seed 427322575)
Average perplexity: 7074.2677
Output generated in 3.50 seconds (5.99 tokens/s, 21 tokens, context 484, seed 771192712)
Average perplexity: 9165.1697
Output generated in 3.84 seconds (5.99 tokens/s, 23 tokens, context 519, seed 1748670314)
Average perplexity: 212004.7341
Output generated in 1.77 seconds (2.26 tokens/s, 4 tokens, context 555, seed 1944057807)
Average perplexity: 16.8753
Output generated in 5.82 seconds (6.71 tokens/s, 39 tokens, context 578, seed 1310045468)
Average perplexity: 69287.8133
Output generated in 5.45 seconds (6.42 tokens/s, 35 tokens, context 639, seed 24082548)
Average perplexity: 2668.7489
Output generated in 5.65 seconds (6.37 tokens/s, 36 tokens, context 689, seed 2008447567)
Average perplexity: 8195.2908
Output generated in 5.11 seconds (6.07 tokens/s, 31 tokens, context 739, seed 1913098486)
Average perplexity: 1657.7791
Output generated in 6.76 seconds (6.66 tokens/s, 45 tokens, context 791, seed 1634271033)
Average perplexity: 2852.6099
Output generated in 5.52 seconds (5.98 tokens/s, 33 tokens, context 860, seed 1186808611)
Average perplexity: 11396.6981
Traceback (most recent call last):
  File "C:\web\text-generation-webui\modules\callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "C:\web\text-generation-webui\modules\text_generation.py", line 352, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1789, in generate
    return self.beam_sample(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 3417, in beam_sample
    outputs = self(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 686, in forward
    hidden_states = self.mlp(hidden_states)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 258, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\accelerate\hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\bitsandbytes\nn\modules.py", line 248, in forward
    out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\bitsandbytes\autograd\_functions.py", line 579, in matmul_4bit
    return MatMul4Bit.apply(A, B, out, bias, quant_state)
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\web\text-generation-webui\installer_files\env\lib\site-packages\bitsandbytes\autograd\_functions.py", line 516, in forward
    output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 9.74 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 10.13 seconds (4.34 tokens/s, 44 tokens, context 935, seed 2012836910)
Average perplexity: 15356.2058
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.29 GiB is allocated by PyTorch, and 929.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 2.59 seconds (0.39 tokens/s, 1 tokens, context 1009, seed 2129792095)
C:\web\text-generation-webui\installer_files\env\lib\site-packages\numpy\core\fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\web\text-generation-webui\installer_files\env\lib\site-packages\numpy\core\_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Average perplexity: nan
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.38 GiB is allocated by PyTorch, and 834.35 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 2.20 seconds (0.45 tokens/s, 1 tokens, context 1025, seed 1640949644)
Average perplexity: nan
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.43 GiB is allocated by PyTorch, and 833.19 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 2.21 seconds (0.45 tokens/s, 1 tokens, context 1044, seed 979927275)
Average perplexity: nan
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.48 GiB is allocated by PyTorch, and 792.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 2.22 seconds (0.45 tokens/s, 1 tokens, context 1059, seed 1526304355)
Average perplexity: nan
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.47 GiB is allocated by PyTorch, and 791.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 2.30 seconds (0.44 tokens/s, 1 tokens, context 1057, seed 2017802898)
Traceback (most recent call last):
  [identical stack trace omitted -- same frames as the traceback above]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.47 GiB is allocated by PyTorch, and 759.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
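As an aside, the silero_tts and elevenlabs_tts failures at the top of the log are unrelated to the OOM: both are plain ModuleNotFoundErrors. Installing the missing packages inside the webui's bundled environment (the one under installer_files\env) should clear those two tracebacks; the package names below simply match the missing modules:

pip install num2words elevenlabs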

System Info

Nvidia RTX80Ti, 12 GB VRAM
32 GB system RAM
berkut1 commented 11 months ago

The context needs a lot of VRAM too. P.S. I usually keep ~3-4 GB of VRAM free for the context (if it is less than 8k).
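For a rough sense of why the context needs that much, here is a back-of-the-envelope KV-cache estimate for a Llama-7B-shaped model; the shapes (32 layers, 32 heads, head dim 128, fp16 cache) are assumptions for illustration, not measurements from this thread:

# K and V, per layer, per head, per token, in fp16
layers, heads, head_dim, bytes_fp16 = 32, 32, 128, 2
per_token = 2 * layers * heads * head_dim * bytes_fp16
print(per_token)                 # 524288 bytes = 0.5 MiB per token
print(2048 * per_token / 2**30)  # 1.0 GiB at a 2048-token context

# Beam search keeps one cache per beam, so num_beams=4 would roughly
# quadruple this -- note the beam_sample frames in the tracebacks above.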

PGTBoos commented 11 months ago

Agreed. My system has 32 GB of RAM (plus a pagefile), so it might get slow, but it shouldn't crash, and the video card's 12 GB should be enough. The error also suggests PyTorch has options to handle memory better, though my knowledge of programming these kinds of LLMs with PyTorch is limited; maybe recent PyTorch versions have something new that isn't being used yet. With these crashes a model keeps working for a while but eventually fails.
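One detail from the logs that fits this pattern: every failing traceback goes through transformers' beam_sample, and beam search multiplies per-reply memory by the number of beams, so the crash may only show up once the context has grown. A minimal way to watch the allocator between replies (standard torch.cuda calls, nothing webui-specific):

import torch

# allocated = live tensors; reserved = what the caching allocator holds
# from the driver. A big gap between them is exactly the "reserved by
# PyTorch but unallocated" fragmentation the error message mentions.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

# Returns cached-but-unused blocks to the driver. It cannot free live
# tensors, but it can relieve fragmentation between generations.
torch.cuda.empty_cache()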

phr00t commented 11 months ago

I'm getting this same error. I only have 8 GB of VRAM, which I presume isn't enough to load much of the model, but I was hoping it would at least run slowly via system RAM (32 GB) and CPU assist...
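If it helps, the webui has flags for capping GPU usage and spilling the rest to system RAM. The flag names here are from the 2023-era releases and worth double-checking against python server.py --help on your install; the 7/24 GiB limits are illustrative:

python server.py --load-in-4bit --gpu-memory 7 --cpu-memory 24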

berkut1 commented 11 months ago

@phr00t if you want it slow, just install the latest Nvidia driver and don't apply this recommendation: https://github.com/oobabooga/text-generation-webui/discussions/4484 (or, if you already did, revert it to the defaults).

But keep in mind that you will lose so much performance that there is simply no point in using the GPU, and it is better to switch to CPU only.
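For reference, CPU-only mode is a single flag (same caveat: confirm against --help for your version):

python server.py --cpu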

github-actions[bot] commented 9 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.