sabetAI / BLoRA

batched loras

`OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB` during output phase with winddude/wizardLM-LlaMA-LoRA-7B: #4

Open fritol opened 10 months ago

fritol commented 10 months ago

I ran it on free Colab, using the code as-is. Got this at the last cell, after it reached winddude/wizardLM-LlaMA-LoRA-7B:

winddude/wizardLM-LlaMA-LoRA-7B:
Develop an eight sentence short story about a character who can bring their dreams into reality, but only for a limited time.

10. 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
[<ipython-input-6-a0c28315b38f>](https://localhost:8080/#) in <cell line: 3>()
      1 outputs = []
      2 
----> 3 for out in model.generate(
      4     **batch,
      5     max_length=200,

10 frames
[/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py](https://localhost:8080/#) in forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache)
    325             # reuse k, v, self_attention
    326             key_states = torch.cat([past_key_value[0], key_states], dim=2)
--> 327             value_states = torch.cat([past_key_value[1], value_states], dim=2)
    328 
    329         past_key_value = (key_states, value_states) if use_cache else None

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.50 GiB already allocated; 16.81 MiB free; 13.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
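The frame in the traceback shows where the memory goes: `modeling_llama.py` concatenates past key/value states along the sequence dimension on every generation step, so the KV cache grows linearly with output length, on top of the ~13.5 GiB already taken by the weights. A rough back-of-the-envelope estimate, assuming standard LLaMA-7B shapes (32 layers, 32 heads, head dim 128, fp16):

```python
# Estimate KV-cache size for LLaMA-7B (shapes assumed, not taken from the repo):
# each layer stores 2 tensors (key + value) of shape [batch, heads, seq, head_dim].
def kv_cache_bytes(batch_size, seq_len, n_layers=32, n_heads=32,
                   head_dim=128, bytes_per_elem=2):
    per_tensor = batch_size * n_heads * seq_len * head_dim * bytes_per_elem
    return 2 * n_layers * per_tensor

# One sequence at the failing call's max_length=200:
mib = kv_cache_bytes(batch_size=1, seq_len=200) / 2**20
print(f"{mib:.0f} MiB")  # → 100 MiB
```

So each sequence at `max_length=200` adds roughly 100 MiB of cache per model instance; with several LoRA-batched sequences and only ~16 MiB free, the 20 MiB concat allocation fails. The error message's own suggestion (setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`) only helps with fragmentation; reducing `max_length`, batch size, or the number of instances shrinks the cache itself.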
fritol commented 10 months ago

Well, it's simply too much for the VRAM in free Colab (the traceback shows 14.75 GiB total capacity).

So I removed the two extra instances.

But the only LoRA with any decent output is jondurbin/airoboros-7b-gpt4-1.2-peft; every other one outputs nonsense such as `10. 10 10 10 10 10 10 10 10 10 10 ...` for an 8-sentence story ;D
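A possible cause worth checking for degenerate repeated-token output in batched generation with decoder-only models like LLaMA is right-padding: if shorter prompts in the batch are padded on the right, the model generates conditioned on trailing pad tokens. This is speculation since the notebook's tokenizer setup isn't shown here; a minimal illustration of the two layouts, with a hypothetical pad id of 0:

```python
def pad_batch(seqs, pad_id=0, side="left"):
    # Pad variable-length token-id sequences to a uniform length.
    # Decoder-only models should be padded on the LEFT, so the last
    # position of every row is a real token rather than padding.
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        out.append(pad + s if side == "left" else s + pad)
    return out

batch = [[5, 6, 7], [8, 9]]
print(pad_batch(batch, side="left"))   # [[5, 6, 7], [0, 8, 9]]
print(pad_batch(batch, side="right"))  # [[5, 6, 7], [8, 9, 0]]
```

With a Hugging Face tokenizer the equivalent check would be `tokenizer.padding_side = "left"` before building the batch.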