unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Random Training #617

Open x6p2n9q8a4 opened 3 months ago

x6p2n9q8a4 commented 3 months ago

Hi authors,

In the SFTTrainer we set seed = 3407, but I find the training procedure is still random: the performance on the test dataset and the change in loss differ across runs with the same config.

Thanks,

danielhanchen commented 3 months ago

Oh did you set random_state = 3407 in FastLanguageModel.get_peft_model?

x6p2n9q8a4 commented 3 months ago

> Oh did you set random_state = 3407 in FastLanguageModel.get_peft_model?

Yes! I set the seed in several places:

1st:

```python
from transformers import set_seed as transformers_set_seed

transformers_set_seed(3407)
```

2nd:

```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
```

3rd:

```python
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = val_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 1500,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        ...
    ),
)
```

danielhanchen commented 3 months ago

Oh apologies - do you know if it works fine now? If it's purely random training, that's not good.

EirikVinje commented 2 months ago

Hi, is this fixed? I have the same issue. I set the random seed in FastLanguageModel.get_peft_model and in the SFTTrainer training arguments.

danielhanchen commented 1 month ago

@EirikVinje Apologies on the delay - it should hopefully be fine now - how different are the training runs? If they're on different GPUs / setups, then yes, you will get different results.
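(For context, seeding alone does not always pin down every source of nondeterminism on a GPU. Below is a minimal sketch - not something prescribed in this thread - of the general PyTorch/CUDA determinism settings one can try on top of the seeds already shown; whether each knob matters depends on the setup.)

```python
# Minimal sketch of general PyTorch determinism settings (assumption: single fixed GPU).
import os
import random

import numpy as np
import torch
from transformers import set_seed

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by some deterministic CUDA ops

set_seed(3407)                    # seeds Python, NumPy and torch (CPU + CUDA)
random.seed(3407)
np.random.seed(3407)
torch.manual_seed(3407)
torch.cuda.manual_seed_all(3407)

torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable autotuning, which can vary per run
# torch.use_deterministic_algorithms(True)  # strictest option; raises on nondeterministic ops
```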

EirikVinje commented 1 month ago

[Screenshot: training results, 2024-07-26 13:07]

@danielhanchen the training runs are on the same GPU (RTX 2080 Ti). These results are generated by the shell script.

TobiasBrambo commented 1 month ago

@danielhanchen Having the same issue as Eirik, using same hyperparameters but getting different results.

Skageb commented 1 month ago

@danielhanchen I am using unsloth for a research project and have the same issue of differing results across identical runs with the same seeding and on the same GPU. The inability to reproduce results is a large downside given all the great features of unsloth. I hope you manage to fix this as soon as possible, as I am otherwise very happy with the library!

danielhanchen commented 1 month ago

@EirikVinje @TobiasBrambo @Skageb Apologies on the delay and the issue - hmm I just can't seem to repro it - are you saying the finetuning results are different or the generations are different? Generations need temperature = 0 to retain the same outputs.

@EirikVinje Ok that is a bit interesting - I'm just confused since every test I've done shows it's reproducible (I run Colab like every day to check the losses, and they match), so I'm stumped :(

danielhanchen commented 1 month ago

I'll reopen this so I can investigate this more

EirikVinje commented 1 month ago

@danielhanchen for some models you cannot set temperature = 0.0, e.g. "Qwen/Qwen2-0.5B-Instruct". Initially the model was evaluated like this:

```python
outputs = self.model.generate(**inputs, max_new_tokens=100, use_cache=True)
```

But I also tried with this:

```python
outputs = self.model.generate(**inputs, max_new_tokens=100, use_cache=True, temperature=0.0)
```

Could it be a problem that I'm running this as a .py script instead of a .ipynb notebook?

danielhanchen commented 1 month ago

Oh set do_sample = False
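(For reference, a minimal sketch of what greedy, i.e. deterministic, decoding looks like with the Hugging Face generate API; it assumes model and tokenizer are the objects loaded earlier in the thread and an example prompt. With do_sample = False, temperature is ignored, so outputs should be repeatable for a fixed model and prompt.)

```python
# Minimal sketch: greedy decoding so repeated calls give identical outputs.
# Assumes `model` and `tokenizer` come from FastLanguageModel as in the snippets above.
inputs = tokenizer(["Hello, my name is"], return_tensors = "pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens = 100,
    use_cache = True,
    do_sample = False,  # greedy decoding; temperature / top_p are not used
)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True)[0])
```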