sabetAI / BLoRA

batched loras

`OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB` during output phase with winddude/wizardLM-LlaMA-LoRA-7B: #4

Open fritol opened 10 months ago

fritol commented 10 months ago

I ran it on free Colab, using the code as-is. Got this at the last cell, after it reached winddude/wizardLM-LlaMA-LoRA-7B:

winddude/wizardLM-LlaMA-LoRA-7B:
Develop an eight sentence short story about a character who can bring their dreams into reality, but only for a limited time.

10. 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
[<ipython-input-6-a0c28315b38f>](https://localhost:8080/#) in <cell line: 3>()
      1 outputs = []
      2 
----> 3 for out in model.generate(
      4     **batch,
      5     max_length=200,

10 frames
[/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py](https://localhost:8080/#) in forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache)
    325             # reuse k, v, self_attention
    326             key_states = torch.cat([past_key_value[0], key_states], dim=2)
--> 327             value_states = torch.cat([past_key_value[1], value_states], dim=2)
    328 
    329         past_key_value = (key_states, value_states) if use_cache else None

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.50 GiB already allocated; 16.81 MiB free; 13.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
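The frame in the traceback shows where the memory goes: `modeling_llama.py` concatenates past key/value states along the sequence dimension on every generation step, so the KV cache grows linearly with output length, on top of the ~13.5 GiB already taken by the weights. A rough back-of-the-envelope estimate, assuming standard LLaMA-7B shapes (32 layers, 32 heads, head dim 128, fp16):

```python
# Estimate KV-cache size for LLaMA-7B (shapes assumed, not taken from the repo):
# each layer stores 2 tensors (key + value) of shape [batch, heads, seq, head_dim].
def kv_cache_bytes(batch_size, seq_len, n_layers=32, n_heads=32,
                   head_dim=128, bytes_per_elem=2):
    per_tensor = batch_size * n_heads * seq_len * head_dim * bytes_per_elem
    return 2 * n_layers * per_tensor

# One sequence at the failing call's max_length=200:
mib = kv_cache_bytes(batch_size=1, seq_len=200) / 2**20
print(f"{mib:.0f} MiB")  # → 100 MiB
```

So each sequence at `max_length=200` adds roughly 100 MiB of cache per model instance; with several LoRA-batched sequences and only ~16 MiB free, the 20 MiB concat allocation fails. The error message's own suggestion (setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`) only helps with fragmentation; reducing `max_length`, batch size, or the number of instances shrinks the cache itself.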
fritol commented 10 months ago

Well, it's simply too much for the VRAM in free Colab (the traceback shows 14.75 GiB total capacity).

So I removed the two extra instances.

But the only LoRA with any decent output is jondurbin/airoboros-7b-gpt4-1.2-peft; every other one outputs nonsense such as `10. 10 10 10 10 10 10 10 10 10 10 ...` for an 8-sentence story ;D
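A possible cause worth checking for degenerate repeated-token output in batched generation with decoder-only models like LLaMA is right-padding: if shorter prompts in the batch are padded on the right, the model generates conditioned on trailing pad tokens. This is speculation since the notebook's tokenizer setup isn't shown here; a minimal illustration of the two layouts, with a hypothetical pad id of 0:

```python
def pad_batch(seqs, pad_id=0, side="left"):
    # Pad variable-length token-id sequences to a uniform length.
    # Decoder-only models should be padded on the LEFT, so the last
    # position of every row is a real token rather than padding.
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        out.append(pad + s if side == "left" else s + pad)
    return out

batch = [[5, 6, 7], [8, 9]]
print(pad_batch(batch, side="left"))   # [[5, 6, 7], [0, 8, 9]]
print(pad_batch(batch, side="right"))  # [[5, 6, 7], [8, 9, 0]]
```

With a Hugging Face tokenizer the equivalent check would be `tokenizer.padding_side = "left"` before building the batch.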