unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'dataset_text_field' #1264

Status: Closed (officialsahyaboutorabi closed 1 week ago)

officialsahyaboutorabi commented 2 weeks ago

Hello there. While running the Google Colab notebook, I reached this step:

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = 'text',
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

However, I get the following error: TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'dataset_text_field'. Is there a way to fix this?
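
A TypeError of this kind means the called constructor does not declare that parameter. As a minimal sketch, the standard-library inspect module can report whether a callable accepts a given keyword before it is called. The helper name accepts_keyword and the two stand-in classes are hypothetical; they are not part of trl or Unsloth.

```python
import inspect

def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# Hypothetical stand-ins for an older and a newer trainer signature.
class OldStyleTrainer:
    def __init__(self, model=None, dataset_text_field=None):
        self.dataset_text_field = dataset_text_field

class NewStyleTrainer:
    def __init__(self, model=None, args=None):
        self.args = args

print(accepts_keyword(OldStyleTrainer, "dataset_text_field"))
print(accepts_keyword(NewStyleTrainer, "dataset_text_field"))
```

In a real notebook, the same check could be run against the imported SFTTrainer class to see whether the installed version still takes dataset_text_field directly.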

aniketh-s commented 1 week ago

I faced the same error, and restarting the notebook solved it. The first time, I had upgraded transformers; after the restart I chose not to upgrade and used the default version that was installed along with the unsloth library. This solved the issue for me.
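
Since the behaviour depends on which package versions are installed, it can help to print them. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Report the packages involved in this issue.
for name in ("unsloth", "trl", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Note that this reports what is installed on disk. Modules that were already imported before an upgrade keep running the old code until the runtime is restarted, which may be why a restart changes the outcome.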

stepetal commented 1 week ago

Hello @officialsahyaboutorabi, I fixed the issue by importing SFTConfig from trl, replacing TrainingArguments with it, and moving the relevant parameters (those listed in https://github.com/huggingface/trl/blob/main/trl/trainer/sft_config.py) out of SFTTrainer and into SFTConfig.

So, now my cell with SFTTrainer looks like:

from trl import SFTTrainer, SFTConfig
from transformers import DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    args = SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps = 5,
        num_train_epochs = 3, # Train for 3 full passes over the dataset.
        #max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "model_traning_outputs",
        report_to = "none",
        max_seq_length = 2048,
        dataset_num_proc = 4,
        packing = False, # Can make training 5x faster for short sequences.
    ),
)
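
The move described above can also be done mechanically. The sketch below assumes the config class is a dataclass (as TrainingArguments is) and splits a flat dictionary of options into config fields and trainer arguments. The names split_options and DemoConfig are hypothetical; DemoConfig is a stand-in, not the real SFTConfig.

```python
from dataclasses import dataclass, fields

def split_options(options, config_cls):
    """Split `options` into (config_kwargs, trainer_kwargs).

    Keys that are init fields of the dataclass `config_cls` go to the
    config; everything else stays with the trainer.
    """
    config_keys = {f.name for f in fields(config_cls) if f.init}
    config_kwargs = {k: v for k, v in options.items() if k in config_keys}
    trainer_kwargs = {k: v for k, v in options.items() if k not in config_keys}
    return config_kwargs, trainer_kwargs

# Hypothetical stand-in for a config dataclass; not the real SFTConfig.
@dataclass
class DemoConfig:
    max_seq_length: int = 1024
    dataset_num_proc: int = 1
    packing: bool = False
    dataset_text_field: str = "text"

options = {
    "train_dataset": "demo-dataset",
    "dataset_text_field": "text",
    "max_seq_length": 2048,
    "packing": False,
}

config_kwargs, trainer_kwargs = split_options(options, DemoConfig)
config = DemoConfig(**config_kwargs)
print(config)
print(trainer_kwargs)
```

With the real classes, the idea would be to call split_options(options, SFTConfig) and then pass args = SFTConfig(**config_kwargs) together with the remaining trainer_kwargs to SFTTrainer.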

officialsahyaboutorabi commented 1 week ago

@stepetal This fixed all of my issues, thank you so much!

officialsahyaboutorabi commented 1 week ago

> I faced the same error, and restarting the notebook solved it. The first time, I had upgraded transformers; after the restart I chose not to upgrade and used the default version that was installed along with the unsloth library. This solved the issue for me.

I still got the same error when doing that. However, I resolved it with @stepetal's solution.

stepetal commented 1 week ago

> @stepetal This fixed all of my issues, thank you so much!

@officialsahyaboutorabi That's great! You're welcome :)

danielhanchen commented 1 week ago

I'll update Unsloth to add a dataset_text_field option to allow for backwards compatibility. Sorry about the issue, everyone!
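
For readers curious what a backwards-compatibility layer of this kind could look like, here is one possible sketch. It is not Unsloth's actual implementation: the decorator accept_legacy_kwargs, the MOVED_TO_CONFIG list, and the Demo classes are all hypothetical. The idea is to intercept keywords that used to be passed to the trainer and set them on the config object instead.

```python
import functools
import warnings

# Assumed list of options that moved from the trainer to its config.
MOVED_TO_CONFIG = ("dataset_text_field", "max_seq_length", "dataset_num_proc", "packing")

def accept_legacy_kwargs(init):
    """Wrap an __init__ so moved keywords are set on `args` instead."""
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        config = kwargs.get("args")
        for key in MOVED_TO_CONFIG:
            if key in kwargs:
                value = kwargs.pop(key)
                warnings.warn(
                    f"Passing {key!r} to the trainer is deprecated; set it on the config.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                # If no config was passed by keyword, the value is dropped.
                if config is not None:
                    setattr(config, key, value)
        return init(self, *args, **kwargs)
    return wrapper

# Hypothetical stand-ins; not the real trl classes.
class DemoConfig:
    def __init__(self, dataset_text_field="text"):
        self.dataset_text_field = dataset_text_field

class DemoTrainer:
    @accept_legacy_kwargs
    def __init__(self, model=None, args=None):
        self.model = model
        self.args = args

trainer = DemoTrainer(model="demo", args=DemoConfig(), dataset_text_field="prompt")
print(trainer.args.dataset_text_field)  # prints "prompt"
```

A wrapper like this keeps older notebooks working while nudging users toward the config-based call through the deprecation warning.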