unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

pip install --upgrade --no-cache-dir unsloth BROKE CUDA packages. Inference slower. #1187

Open pusapatiakhilraju opened 4 weeks ago

pusapatiakhilraju commented 4 weeks ago

I had previously installed unsloth in an environment using pip install unsloth. It was working fine, with inference for the code below taking around 1 min 10 s. I then learned about the new unsloth_trainer and ran pip install --upgrade unsloth. Now running the same code gives the warning below, and inference on the same data takes about 2 min instead of roughly 1 min.

[Screenshot, 2024-10-25 6:40:49 PM: the warning emitted after the upgrade]

I'm pretty sure model loading is also taking a lot longer. How can I fix this?

max_seq_length = 2048
dtype = None          # None = auto-detect (float16 / bfloat16)
load_in_4bit = True   # 4-bit quantization to reduce memory use

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}
{}

### Response:
{}"""
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/Meta-Llama-3.1-8B",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model)

# Create inputs for batch inference: fill the prompt template for each row
# and collect the formatted prompts into a list.
# (`df` and `instruction` are defined earlier and omitted from this snippet.)
instructs = []
for index, row in df.iterrows():
    formatted_text = alpaca_prompt.format(
        instruction,
        row['text'],
        "<|eot_id|>",
        "",
    )
    instructs.append(formatted_text)

# Batch inference: left-pad so generation continues from the end of each prompt.
tokenizer.pad_token = "<|end_of_text|>"
tokenizer.padding_side = "left"

inputs = tokenizer(instructs, return_tensors = "pt", padding = True).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 1024, do_sample = False, use_cache = True)

danielhanchen commented 4 weeks ago

Apologies for the issue - it's possible torch got randomly updated. Instead of updating all of Unsloth's dependencies, another way is to do pip install --no-deps --upgrade --no-cache-dir unsloth
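
To double-check whether the torch build itself changed during the upgrade, something like this works (a quick sketch using only standard torch attributes):

# Report the active torch build and the CUDA toolkit it was compiled against.
import torch

print(torch.__version__)          # e.g. "2.4.0+cu121" - the +cuXXX suffix is the wheel's CUDA build
print(torch.version.cuda)         # CUDA version the wheel was built for
print(torch.cuda.is_available())  # False often indicates a broken or mismatched CUDA runtime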

pusapatiakhilraju commented 4 weeks ago

Thank you for the quick response :)

I've run pip install --no-cache-dir unsloth in a new environment, which results in:

[Screenshot, 2024-10-25 7:56:22 PM: output of the fresh install]

But it still has the issue below - and it's still slow.

[Screenshot, 2024-10-25 7:56:47 PM: the remaining warning]

I assume what has happened with the upgrade is that unsloth is now expecting CUDA 12 libraries (libnvrtc.so.12), but I have CUDA 11.8.
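
One way to confirm that this is the problem (a minimal check, assuming Linux, since the loader is looking for a .so):

# Minimal check (assumes Linux): try to load the CUDA 12 NVRTC library
# that the warning complains about.
import ctypes
try:
    ctypes.CDLL("libnvrtc.so.12")
    print("libnvrtc.so.12 loads fine")
except OSError as err:
    print("libnvrtc.so.12 is missing:", err)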

pusapatiakhilraju commented 4 weeks ago

Cool - the fix for that was to manually downgrade torch to a build that works with CUDA 11.8, and do the same for xformers.
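
Something along these lines should do it (a sketch only - the exact versions depend on your environment; cu118 is PyTorch's official CUDA 11.8 wheel index, which also hosts xformers):

pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu118
pip install --force-reinstall xformers --index-url https://download.pytorch.org/whl/cu118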

Thanks @danielhanchen