huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

HP search on contrastive learning or classification head #411

Open Ioannis-Pikoulis opened 10 months ago

Ioannis-Pikoulis commented 10 months ago

Hello.

I'm trying to figure out (based on the source code of the train method of SetFitTrainer) whether it is possible to perform a hyperparameter search on the (first) contrastive learning finetuning stage independently of the (second) classification head training stage. Based on your implementation, I believe this is not possible at the moment. Correct me if I'm wrong.
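My rough mental model of the coupling, as a hypothetical paraphrase (the method names below are illustrative, not the actual API):

# Hypothetical paraphrase of the pre-1.0 SetFitTrainer.train flow: both stages
# run back-to-back inside one call, so hyperparameter_search can only score the
# combined pipeline, never the contrastive stage alone.
def train(self):
    self.finetune_embeddings_with_contrastive_pairs()  # stage 1: sentence transformer body
    self.fit_classification_head()                     # stage 2: e.g. logistic regression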

tomaarsen commented 7 months ago

Hello!

Apologies for the delayed response. This is indeed not currently possible, and supporting it would require some fairly notable changes. With the upcoming v1.0.0 it becomes slightly easier, but it still requires a bit of hacking. Consider the following snippet as an end-to-end example of how it could work in v1.0.0. Most of it follows the WIP documentation; the custom parts are marked with comments in the code.

from setfit import SetFitModel
from setfit.training_args import TrainingArguments

def model_init(params) -> SetFitModel:
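    # `params` receives the sampled values for the current trial; only the head
    # parameters are forwarded to the model here, while the body hyperparameters
    # (learning rate, epochs, ...) are consumed by the Trainer itself.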
    params = params or {}
    max_iter = params.get("max_iter", 100)
    solver = params.get("solver", "liblinear")
    params = {
        "head_params": {
            "max_iter": max_iter,
            "solver": solver,
        }
    }
    return SetFitModel.from_pretrained("BAAI/bge-small-en-v1.5", **params)

from optuna import Trial
from typing import Dict, List, Optional, Union

def hp_space(trial: Trial) -> Dict[str, Union[float, int, str]]:
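    # Mixed search space: the first four values steer the embedding finetuning,
    # while max_iter and solver end up in the logistic regression head via model_init.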
    return {
        "body_learning_rate": trial.suggest_float("body_learning_rate", 1e-6, 1e-3, log=True),
        "num_epochs": trial.suggest_int("num_epochs", 1, 3),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
        "seed": trial.suggest_int("seed", 1, 40),
        "max_iter": trial.suggest_int("max_iter", 50, 300),
        "solver": trial.suggest_categorical("solver", ["newton-cg", "lbfgs", "liblinear"]),
    }

from datasets import Dataset, load_dataset
from setfit import Trainer, sample_dataset

dataset = load_dataset("SetFit/emotion")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
test_dataset = dataset["test"]

# Custom for #411: We create a trainer that does not train a classifier head, and we override the evaluation
# that is used to guide the HPO with one that returns the last `eval_embedding_loss`.
class CustomTrainer(Trainer):
    def train_classifier(self, x_train: List[str], y_train: Union[List[int], List[List[int]]], args: Optional[TrainingArguments] = None) -> None:
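        # Skip stage 2 entirely: the head is never fitted, so each trial only
        # measures the contrastive finetuning of the body.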
        pass

    def evaluate(self, dataset: Optional[Dataset] = None) -> Dict[str, float]:
        if dataset is not None:
            raise ValueError("This custom trainer does not support evaluating on a separate dataset.")
        # Return the most recent eval_embedding_loss as a metrics dict, matching the
        # return annotation and the dict that compute_objective expects during HPO.
        eval_loss = [log["eval_embedding_loss"] for log in self.state.log_history if "eval_embedding_loss" in log][-1]
        return {"eval_embedding_loss": eval_loss}

# Set the training arguments to evaluate the embedding loss at regular step intervals
args = TrainingArguments(
    evaluation_strategy="steps",
    eval_steps=20,
)

trainer = CustomTrainer(
    args=args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset.select(range(100)), # <- Only take a small evaluation set
    model_init=model_init,
)
# Note: use direction="minimize" here, because we are now tuning on the embedding evaluation loss, not on accuracy
best_run = trainer.hyperparameter_search(direction="minimize", hp_space=hp_space, n_trials=10)
print(best_run)

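# Re-initialize the model and training arguments with the best trial's
# hyperparameters before the final training run.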
trainer.apply_hyperparameters(best_run.hyperparameters, final_model=True)
trainer.train()

metrics = trainer.evaluate()
print(metrics)

However, the logistic regression head is lightning fast to train, and it gives a fairly good measure of how well the finetuned sentence transformer aligns with your task, so I'm wondering why you would like to change the HPO objective. Hope this helps a bit.
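For comparison, the default end-to-end HPO keeps the head in the loop and maximizes the classification metric; a minimal sketch, reusing model_init, hp_space, and the datasets from above:

# Default end-to-end HPO for contrast: the stock Trainer trains both stages per
# trial and scores the full pipeline with the default metric (accuracy).
default_trainer = Trainer(
    train_dataset=train_dataset,
    eval_dataset=test_dataset.select(range(100)),
    model_init=model_init,
)
best_run = default_trainer.hyperparameter_search(direction="maximize", hp_space=hp_space, n_trials=10)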

If you want, you can already try out v1.0.0 using:

pip install git+https://github.com/huggingface/setfit.git@v1.0.0-pre

And otherwise the release should be out next week!