unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Issue with phi-3 on Long Sequences with Batches > 1 #481

Open Samoed opened 2 months ago

Samoed commented 2 months ago

Hi! I'm encountering an issue while running Phi-3 on long sequences with batch sizes greater than 1. Below is the code to reproduce the problem:

Working Code:

tokenized = tokenizer(
    ["Very long prompt\n" * 3000],  # batch of 1 -- works
    max_length=3000,
    return_tensors="pt",
    truncation=True,
).to("cuda")

res = model.generate(
    **tokenized,
    max_length=4096,
)

Code with Error:

tokenized = tokenizer(
    ["Very long prompt\n" * 3000] * 2,  # batch of 2 -- fails
    max_length=3000,
    return_tensors="pt",
    truncation=True,
).to("cuda")

res = model.generate(
    **tokenized,
    max_length=4096,
)

RuntimeError: The expanded size of the tensor (2047) must match the existing size (3001) at non-singleton dimension 3. Target sizes: [2, 32, 1, 2047]. Tensor sizes: [2, 1, 1, 3001]

Notebook with example.

Any insights on how to resolve this issue would be greatly appreciated!
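Until the batched path is fixed, one possible workaround (my assumption, not a confirmed fix) is to fall back to batch-size-1 generation, since the single-prompt snippet above works. A minimal sketch, where `generate_one` is a placeholder standing in for the per-prompt tokenize + `model.generate` call from the working code:

```python
# Hypothetical workaround sketch: loop over prompts with batch size 1,
# since single-prompt generation succeeds above. `generate_one` is a
# placeholder for the per-prompt tokenize + model.generate call.

def generate_per_prompt(prompts, generate_one):
    """Run generation one prompt at a time and collect the outputs."""
    return [generate_one(prompt) for prompt in prompts]

# In this issue's setting, generate_one would look roughly like:
#
# def generate_one(prompt):
#     tokenized = tokenizer(
#         [prompt], max_length=3000, return_tensors="pt", truncation=True
#     ).to("cuda")
#     return model.generate(**tokenized, max_length=4096)
```

This trades throughput for correctness by never exercising the batched attention-mask path that raises the size-mismatch error.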

danielhanchen commented 2 months ago

Oh interesting I'll check this and get back to you - sorry!

xinxin-chen commented 1 week ago

The problem still exists. It seems there is a 2048-token limit for the Phi-3 mini/medium models in Unsloth, but not for other models.

danielhanchen commented 1 week ago

Apologies! I'll escalate this to a higher priority and try to get a fix out.