huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

hyperparameter_search() does not consider LoRA parameters like r to be tuned. #29391

Open rajumishra8107 opened 9 months ago

rajumishra8107 commented 9 months ago

Feature request

Hyperparameter search is an essential step for finding the hyperparameter values that give the best machine learning or deep learning model. I was using the hyperparameter_search() method to find optimal hyperparameter values, and I also wanted to tune LoRA configuration parameters such as the rank r and alpha. However, I found that it is not able to tune LoRA configuration parameters.

Motivation

The following code illustrates the issue:

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# LoRA configuration whose rank r and alpha I would also like to tune
# (values here are placeholders; the real config is defined elsewhere in my script)
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

def model_init():
    # put the model on the current GPU if one is available
    device_map = {"": torch.cuda.current_device()} if torch.cuda.is_available() else None

    model_kwargs_dict = dict(
        # set this to True if your GPU supports it
        # (Flash Attention drastically speeds up model computations)
        # attn_implementation="flash_attention_2",
        torch_dtype="auto",
        # set to False as we're going to use gradient checkpointing
        use_cache=False,
        device_map=device_map,
    )

    # 4-bit quantization settings for bitsandbytes
    bnb_config_args = dict(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
    )
    bnb_config = BitsAndBytesConfig(**bnb_config_args)
    model_kwargs_dict["quantization_config"] = bnb_config

    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m",
        return_dict=True,
        # **model_kwargs_dict
    )
    print(peft_config)
    model = get_peft_model(model, peft_config=peft_config)
    return model

dataset = load_dataset("imdb", split="train")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

dataset1 = dataset.select([0, 10, 20, 30, 40, 50])
dataset2 = dataset.select([0, 10, 20, 30, 40, 50])

# training arguments were defined elsewhere in my script; a minimal placeholder:
training_args = TrainingArguments(output_dir="opt-125m-lora-hpo", evaluation_strategy="epoch")

trainer = SFTTrainer(
    model=None,
    args=training_args,
    model_init=model_init,
    tokenizer=tokenizer,
    train_dataset=dataset1,
    eval_dataset=dataset2,
    dataset_text_field="text",
    max_seq_length=512,
)

def optuna_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
        # LoRA rank -- this is the parameter that hyperparameter_search() does not pick up
        "r": trial.suggest_int("r", 2, 4, log=True),
    }

trainer.hyperparameter_search(
    direction="minimize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=2,
)

Running the code above produces output like:

[I 2024-02-29 09:42:36,869] A new study created in memory with name: no-name-331fbdff-6465-42f8-9c97-ad5c6c8c4703
Trying to set r in the hyperparameter search but there is no corresponding field in TrainingArguments.
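As far as I can tell, the warning comes from Trainer._hp_search_setup(), which only applies hp_space values to fields of TrainingArguments, so LoRA parameters such as r are silently skipped. Until this is supported natively, one possible workaround (a minimal sketch, assuming the Optuna backend and relying on the fact that Trainer passes the active trial to model_init when model_init takes one argument) is to suggest the LoRA parameters inside model_init instead of in hp_space:

# Sketch of a workaround, assuming the Optuna backend: Trainer passes the active
# trial to model_init when model_init accepts a single argument, so the LoRA rank
# can be suggested there directly instead of via hp_space.
def model_init(trial):
    # fall back to a fixed rank when called outside a hyperparameter search (trial is None)
    r = trial.suggest_int("r", 2, 16, log=True) if trial is not None else 8
    peft_config = LoraConfig(r=r, lora_alpha=2 * r, task_type="CAUSAL_LM")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", return_dict=True)
    return get_peft_model(model, peft_config=peft_config)

With this, "r" is dropped from optuna_hp_space so the warning disappears, and Optuna still records r on each trial. It would still be nicer if hyperparameter_search() could handle PEFT/LoRA parameters directly.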

Your contribution

NA

ArthurZucker commented 8 months ago

FYI @younesbelkada