unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
12.34k stars 798 forks

Google Colab breaks #243

Open medhasreenivasan opened 3 months ago

medhasreenivasan commented 3 months ago

I am getting the error below while trying to import FastLanguageModel from unsloth, using an A100 GPU on Colab.

Failed to import transformers.integrations.peft because of the following error (look up to see its traceback): cannot import name 'set_guard_fail_hook' from 'torch._dynamo.eval_frame'
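
For reference, nothing unusual is needed to trigger it - it happens on the very first import in the notebook (minimal repro below):

from unsloth import FastLanguageModel  # fails with the set_guard_fail_hook error above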

Is there a solution available for this issue? Thank you!

tolas92 commented 3 months ago

Yes, I am also facing the same issue as of today.

danielhanchen commented 3 months ago

@tolas92 @medhasreenivasan Hey! Sorry for the delay! I can reproduce this error - Colab seems to have updated some of its packages, which breaks the install - working on a fix now! Thanks again and apologies!
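
(For anyone checking whether their runtime is affected, the quickest way is to print the versions Colab currently ships:)

import torch, transformers
print(torch.__version__)          # Colab now ships torch 2.2.1, where torch._dynamo.eval_frame no longer has set_guard_fail_hook
print(transformers.__version__)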

danielhanchen commented 3 months ago

A temporary fix: instead of installing unsloth[colab] @ git+https://github.com/unslothai/unsloth.git, run the following:

!pip install "https://download.pytorch.org/whl/cu121/xformers-0.0.24-cp310-cp310-manylinux2014_x86_64.whl" --no-deps
!pip install --upgrade transformers datasets sentencepiece tyro
!pip install --upgrade bitsandbytes accelerate trl peft --no-deps
!pip install git+https://github.com/unslothai/unsloth.git

And for Flash Attention, also add the following at the end:

!pip install flash-attn einops ninja --no-deps

I'm working on a more elegant fix.
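
After running the installs you may need to restart the runtime so the new versions are picked up; a quick sanity check (just a suggestion, not part of the fix itself):

import torch, xformers
print(torch.__version__, xformers.__version__)
from unsloth import FastLanguageModel  # should now import without the set_guard_fail_hook error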

480 commented 3 months ago

I am also facing the same issue, but @danielhanchen's temporary fix works for me.

My modifications:

%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    # !pip install "unsloth[colab-ampere] @ git+https://github.com/unslothai/unsloth.git"
    !pip install "https://download.pytorch.org/whl/cu121/xformers-0.0.24-cp310-cp310-manylinux2014_x86_64.whl" --no-deps
    !pip install --upgrade transformers datasets sentencepiece tyro
    !pip install --upgrade bitsandbytes accelerate trl peft --no-deps
    !pip install git+https://github.com/unslothai/unsloth.git
    !pip install flash-attn einops ninja --no-deps

Thank you.

danielhanchen commented 3 months ago

Ok looks like I fixed it! The new section at the top will be:

%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

So still ugly, but I'll handle these issues later - I updated all notebooks to use this new approach.
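
For context, once that cell has run the rest of the notebook works as before; a sketch of the first load step (the 4-bit Mistral name is just the example the notebooks use - any supported model works):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # example model from the notebooks
    max_seq_length = 2048,
    load_in_4bit = True,
)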

tolas92 commented 3 months ago

Thanks @danielhanchen for the quick fix!

480 commented 3 months ago

It works perfectly. Thanks @danielhanchen

koleshjr commented 3 months ago

Hey @danielhanchen I am facing this issue during inference:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
     query       : shape=(1, 2327, 8, 4, 128) (torch.float16)
     key         : shape=(1, 2327, 8, 4, 128) (torch.float16)
     value       : shape=(1, 2327, 8, 4, 128) (torch.float16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0
flshattF@0.0.0 is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    operator wasn't built - see python -m xformers.info for more info
tritonflashattF is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    operator wasn't built - see python -m xformers.info for more info
    operator does not support BMGHK format
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
cutlassF is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see python -m xformers.info for more info
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
    operator wasn't built - see python -m xformers.info for more info
    operator does not support BMGHK format
    unsupported embed per head: 128
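
(Side note for anyone hitting this: the deciding line is the capability check - you can confirm what GPU the runtime gave you with:)

import torch
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability())  # a T4 reports (7, 5); the flash attention kernels want (8, 0) or newer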

danielhanchen commented 3 months ago

@koleshjr Oh no - how are you doing inference? On Colab? Did you manage to use the new install instructions in our Colab notebooks?

koleshjr commented 3 months ago

Yes I did. It's failing on the free-tier T4 when you call model.generate, but on the V100 it's passing.

koleshjr commented 3 months ago

@danielhanchen These are the new install commands you suggested in this thread:

import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

This is how I am calling the model for inference:

outputs = model.generate(**inputs, max_new_tokens = 1012, use_cache = True)
result = tokenizer.batch_decode(outputs)
result

It's only failing on the Google Colab T4.

danielhanchen commented 3 months ago

@koleshjr Would you happen to have a screenshot of the error?

koleshjr commented 3 months ago

@danielhanchen Apparently, for some reason it's now fixed. Sorry for this. I appreciate your feedback though. Thanks!

danielhanchen commented 3 months ago

No problems at all!

reneric commented 3 months ago

[Screenshot attached: 2024-03-19 at 10:15 PM]

I am seeing this after using your update as well, but for training. Hoping it fixes itself like it did for @danielhanchen but thought I'd share a screenshot.

reneric commented 3 months ago

Ah, restarting the session did not work, but killing the runtime and starting fresh did.

danielhanchen commented 3 months ago

@reneric Oh great, you solved the issue!