unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

pip install --upgrade --no-cache-dir unsloth BROKE CUDA packages. Inference slower. #1187

Open pusapatiakhilraju opened 4 weeks ago

pusapatiakhilraju commented 4 weeks ago

I had previously installed unsloth in an environment using pip install unsloth. It was working fine, with inference for the code below taking around 1 min 10 s. I then learned about the new unsloth_trainer and ran pip install --upgrade unsloth. Now running the same code gives the warning below, and inference on the same data takes about 2 min instead of roughly 1 min.

[Screenshot, 2024-10-25 6:40:49 PM: the warning emitted after the upgrade]

I'm pretty sure model loading is also taking a lot longer. How can I fix this?

max_seq_length = 2048
dtype = None          # None = auto-detect (float16 / bfloat16)
load_in_4bit = True   # 4-bit quantization to reduce memory use

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}
{}

### Response:
{}"""
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/Meta-Llama-3.1-8B",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model)

# Create inputs for batch inference: fill the prompt template for each row
# and collect the formatted prompts into a list.
# (`df` and `instruction` are defined earlier and omitted from this snippet.)
instructs = []
for index, row in df.iterrows():
    formatted_text = alpaca_prompt.format(
        instruction,
        row['text'],
        "<|eot_id|>",
        "",
    )
    instructs.append(formatted_text)

# Batch inference: left-pad so generation continues from the end of each prompt.
tokenizer.pad_token = "<|end_of_text|>"
tokenizer.padding_side = "left"

inputs = tokenizer(instructs, return_tensors = "pt", padding = True).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 1024, do_sample = False, use_cache = True)

danielhanchen commented 4 weeks ago

Apologies for the issue - it's possible torch got randomly updated. Instead of updating all of Unsloth's dependencies, another way is to do pip install --no-deps --upgrade --no-cache-dir unsloth
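
To double-check whether the torch build itself changed during the upgrade, something like this works (a quick sketch using only standard torch attributes):

# Report the active torch build and the CUDA toolkit it was compiled against.
import torch

print(torch.__version__)          # e.g. "2.4.0+cu121" - the +cuXXX suffix is the wheel's CUDA build
print(torch.version.cuda)         # CUDA version the wheel was built for
print(torch.cuda.is_available())  # False often indicates a broken or mismatched CUDA runtime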

pusapatiakhilraju commented 4 weeks ago

Thank you for the quick response :)

I've run pip install --no-cache-dir unsloth in a new environment, which results in:

[Screenshot, 2024-10-25 7:56:22 PM: output of the fresh install]

But it still has the issue below - and it's still slow.

[Screenshot, 2024-10-25 7:56:47 PM: the remaining warning]

I assume what has happened with the upgrade is that unsloth is now expecting CUDA 12 libraries (libnvrtc.so.12), but I have CUDA 11.8.
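
One way to confirm that this is the problem (a minimal check, assuming Linux, since the loader is looking for a .so):

# Minimal check (assumes Linux): try to load the CUDA 12 NVRTC library
# that the warning complains about.
import ctypes
try:
    ctypes.CDLL("libnvrtc.so.12")
    print("libnvrtc.so.12 loads fine")
except OSError as err:
    print("libnvrtc.so.12 is missing:", err)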

pusapatiakhilraju commented 4 weeks ago

Cool - the fix for that was to manually downgrade torch to a build that works with CUDA 11.8, and do the same for xformers.
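
Something along these lines should do it (a sketch only - the exact versions depend on your environment; cu118 is PyTorch's official CUDA 11.8 wheel index, which also hosts xformers):

pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu118
pip install --force-reinstall xformers --index-url https://download.pytorch.org/whl/cu118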

Thanks @danielhanchen