huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Different number of unique pairs for SetFitTrainer.train and Trainer.hyperparameter_search with the same args #545

Open HexadimensionalerAlp opened 3 months ago

HexadimensionalerAlp commented 3 months ago

Hi, I trained a model with SetFitTrainer and afterwards ran a hyperparameter search with the same parameters as a test. I expected both runs to have the same number of unique pairs and therefore to take roughly the same time. In reality, the direct training approach had 64240 unique pairs and 4015 optimization steps and took 30 minutes per epoch, while the hyperparameter search had 2039350 unique pairs and 127460 optimization steps and was about to take 19 hours.
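
For reference, the reported step counts line up with the pair counts divided by the batch size (a quick sanity check using only the numbers from the logs above):

import math

batch_size = 16

# Direct training: 64240 unique pairs -> 4015 steps per epoch
print(math.ceil(64240 / batch_size))    # 4015

# Hyperparameter search: 2039350 unique pairs -> 127460 steps per epoch
print(math.ceil(2039350 / batch_size))  # 127460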

Training task:

from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

model = SetFitModel.from_pretrained(
    'sentence-transformers/paraphrase-mpnet-base-v2',
    multi_target_strategy='multi-output'
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=datasets['train'],
    eval_dataset=datasets['validation'],
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,
    num_epochs=1
)

trainer.train()
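
If I understand the pair sampling correctly, num_iterations=20 generates one positive and one negative pair per sample per iteration, which would match the logged pair count (this formula is my assumption about the sampler, not something I verified in the source):

# Assumption: unique_pairs = 2 * num_iterations * len(train_dataset)
num_iterations = 20
print(64240 // (2 * num_iterations))  # 1606, which would be the size of my training set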

Optimization task:

from typing import Any, Dict, Union

from optuna import Trial
from setfit import SetFitModel, Trainer

def model_init(params: Dict[str, Any]) -> SetFitModel:
    params = params or {}
    max_iter = params.get('max_iter', 100)
    solver = params.get('solver', 'liblinear')
    params = {
        'head_params': {
            'max_iter': max_iter,
            'solver': solver
        }
    }

    # Pass the head parameters through; originally they were built but silently dropped.
    return SetFitModel.from_pretrained(
        'sentence-transformers/paraphrase-mpnet-base-v2',
        multi_target_strategy='multi-output',
        **params
    )

def hp_space(trial: Trial) -> Dict[str, Union[float, int, str]]:
    # Every range is collapsed to a single value so the trial mirrors the direct training run.
    return {
        "body_learning_rate": trial.suggest_float("body_learning_rate", 1e-5, 1e-5, log=True),
        "num_epochs": trial.suggest_int("num_epochs", 1, 1),
        "batch_size": trial.suggest_categorical("batch_size", [16]),
        "seed": trial.suggest_int("seed", 42, 42),
        "max_iter": trial.suggest_int("max_iter", 20, 20),
        "solver": trial.suggest_categorical("solver", ["liblinear"]),
    }

trainer = Trainer(
    train_dataset=datasets['train'],
    eval_dataset=datasets['validation'],
    model_init=model_init
)

best_run = trainer.hyperparameter_search(direction="maximize", hp_space=hp_space, n_trials=1)
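
For completeness, after the search I would carry the best trial over like this (following the hyperparameter search example in the setfit docs):

trainer.apply_hyperparameters(best_run.hyperparameters, final_model=True)
trainer.train()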

The data consists of datasets with the columns 'text' and 'label', where 'text' is a string and 'label' is a tensor of the following format: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], although that should not be relevant to this issue.
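
To illustrate, a miniature version of that format looks like this (the sample texts are hypothetical; the real data is of course larger):

from datasets import Dataset

# 'text' is a string, 'label' is a one-hot float vector over 11 classes.
datasets = {
    'train': Dataset.from_dict({
        'text': ['first example sentence', 'second example sentence'],
        'label': [
            [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        ],
    }),
    # 'validation' is built the same way
}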

In my understanding, both runs should be comparable in training complexity, as the parameters used are the same. What is the explanation for this behaviour, and is there a way to reproduce the setup of the direct training run within the hyperparameter search?

Thank you in advance!