huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Unable to reproduce hyper param results #482

Closed dex426 closed 9 months ago

dex426 commented 9 months ago

Hello, I am attempting to hyperparameter-tune a SetFit model for a simple classification problem. However, I cannot seem to reproduce the results I get during hyperparameter tuning.

I'm unsure if I'm misunderstanding something, but I want to optimise for the F1 metric and use the "best" model as my fine-tuned version.

model_id = "sentence-transformers/paraphrase-mpnet-base-v2"

def make_model():
    # set seed for reproducible output 
    set_seed(420)
    return SetFitModel.from_pretrained(model_id)

trainer = SetFitTrainer(
    model_init=make_model,
    train_dataset=X_train_dataset,
    eval_dataset=X_validation_dataset,
    loss_class=CosineSimilarityLoss,
    column_mapping={"combined_text" : "text", "primary_topic_int" : "label"},
    seed=42,
    metric="f1",
    metric_kwargs={"average": "weighted"}
)

def hyperparameter_search_function(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [4, 8, 16]),
        "num_epochs": trial.suggest_int("num_epochs", 1, 1),
        #"seed": trial.suggest_int("seed", 1, 40),
        "num_iterations": trial.suggest_categorical("num_iterations", [5, 10,12,15, 20, 25]),
        "max_iter": trial.suggest_int("max_iter", 50, 300),
    }

best = trainer.hyperparameter_search(hyperparameter_search_function, n_trials=2)
best

Once all the trials have completed, I get the following message:

"BestRun(run_id='0', objective=0.5698441204897445, hyperparameters={'learning_rate': 7.333152383799594e-06, 'batch_size': 16, 'num_epochs': 1, 'num_iterations': 10, 'max_iter': 267}"

When I try to replicate this outside of the hyperparameter search by loading in the best hyperparameters, I get a different result: my F1 score is now lower, at 0.55866 compared to 0.56984. Am I correct in thinking that the hyperparameter tuner is using my eval metrics when it reports the objective?

trainer.apply_hyperparameters(best.hyperparameters, final_model=True)
trainer.train()
metrics = trainer.evaluate()
metrics

tomaarsen commented 9 months ago

Hello!

Out of curiosity, are you getting the different results when you train a fresh model with those hyperparameters, or when you use the

trainer.apply_hyperparameters(best.hyperparameters, final_model=True)
trainer.train()
metrics = trainer.evaluate()
metrics

that you included? I assume the latter, based on your message, but I'm just making sure - it's a bit surprising that the final model results differ from your hyperparameter search ones.
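
To be explicit about what I mean by the "fresh model" route, it would be something along these lines (just a rough sketch, reusing the same datasets, column mapping and model_init from your snippet; note that head parameters such as max_iter aren't covered here and would still need to be set on the model itself):

fresh_trainer = SetFitTrainer(
    model_init=make_model,
    train_dataset=X_train_dataset,
    eval_dataset=X_validation_dataset,
    loss_class=CosineSimilarityLoss,
    column_mapping={"combined_text": "text", "primary_topic_int": "label"},
    metric="f1",
    metric_kwargs={"average": "weighted"},
    # trainer-level hyperparameters taken from the best run
    learning_rate=best.hyperparameters["learning_rate"],
    batch_size=best.hyperparameters["batch_size"],
    num_epochs=best.hyperparameters["num_epochs"],
    num_iterations=best.hyperparameters["num_iterations"],
)
fresh_trainer.train()
fresh_metrics = fresh_trainer.evaluate()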

And yes, the objective does indeed refer to the evaluation results, so it should be the 56.9 F1 that you received.

Edit: This is a bit of a weird question, but do you also experience this behaviour if you use set_seed(12)? I think I might have found a bug.
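
(Concretely, I just mean changing the seed value inside your model_init, e.g.:

def make_model():
    # identical to before, only the seed value is different
    set_seed(12)
    return SetFitModel.from_pretrained(model_id)

and then re-running the training and evaluation.)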

dex426 commented 9 months ago

Hi Tom,

Thanks for getting back to me so fast! Yeah, I get that behaviour with both methods and can't seem to see where I'm going wrong.

I'll give it a try with the seed set to 12 to see if that changes anything.

Edit: I've updated set_seed to 12, but I'm still getting different results.

Thanks

Dex

dex426 commented 9 months ago

Whoops! Just realised I was running an outdated version of SetFit (0.0.7). After upgrading, I am no longer having these issues. Apologies.

tomaarsen commented 9 months ago

Glad to hear it! And no worries 😄 I'm glad you figured it out - I hadn't yet found the time to chase this down any further.