huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

TrainingArgs not recognised in Trainer #559

Open PrithivirajDamodaran opened 2 months ago

PrithivirajDamodaran commented 2 months ago

v1.1.0

The warnings below are thrown by the snippet:

2024-09-22 19:09:50,723 - No TrainingArguments passed, using output_dir=tmp_trainer.
2024-09-22 19:09:50,735 - No loss passed, using losses.CoSENTLoss as a default option.


from setfit import SetFitModel, Trainer, TrainingArguments
from sentence_transformers.losses import CosineSimilarityLoss

# ...
training_args = TrainingArguments(
    output_dir=output_dir,
    eval_strategy=save_strategy,
    save_strategy=save_strategy,
    batch_size=batch_size,
    num_epochs=epochs,
    body_learning_rate=lr,
    warmup_proportion=warmup_proportion,
    logging_dir=f"{output_dir}/logs",
    load_best_model_at_end=True,
    show_progress_bar=True,
    use_amp=use_amp,
    samples_per_label=min_samples,
    loss=CosineSimilarityLoss,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    metric="accuracy",
    column_mapping={"text": "text", "label": "label"},
)
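
One possible first check (my suggestion, not something confirmed in this thread): identically named Trainer and TrainingArguments classes also exist in transformers and sentence_transformers, so it may be worth verifying that the imported names actually resolve to the setfit classes. A minimal sketch:

# Assumption: a namespace mix-up could explain the warnings; this only
# confirms which package the classes were imported from.
from setfit import Trainer, TrainingArguments

print(Trainer.__module__)            # expected: setfit.trainer
print(TrainingArguments.__module__)  # expected: setfit.training_args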
cjuracek-tess commented 1 month ago

I think you need to be more specific about how you're defining your variables: the following example does not raise the loss warning you are describing. I commented out the parameters that are variables, since we don't know their values:

from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset
from sentence_transformers.losses import CosineSimilarityLoss

dataset = load_dataset("sst2")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
test_dataset = dataset["validation"]
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

training_args = TrainingArguments(
    # output_dir=output_dir,
    # eval_strategy=save_strategy,
    # save_strategy=save_strategy,
    # batch_size=batch_size,
    # num_epochs=epochs,
    # body_learning_rate=lr,
    # warmup_proportion=warmup_proportion,
    # logging_dir=f"{output_dir}/logs",
    # load_best_model_at_end=True,
    show_progress_bar=True,
    # use_amp=use_amp,
    # samples_per_label=min_samples,
    loss=CosineSimilarityLoss,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    metric="accuracy",
    column_mapping={"sentence": "text", "label": "label"},
)

Output:

FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
Applying column mapping to the training dataset
Applying column mapping to the evaluation dataset
Map: 100%|██████████| 16/16 [00:00<00:00, 5759.43 examples/s]

Process finished with exit code 0

So the problem is likely related to the values of those variables.

seanfarr788 commented 1 day ago

I think it is just an erroneous warning message; printing out trainer.args shows the args being set correctly.
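
For anyone who wants to reproduce that check, a minimal sketch (reusing the trainer built in the snippets above):

# Inspect the resolved arguments after constructing the Trainer.
# Per the comment above, these show the values that were passed in,
# despite the warnings printed at construction time.
print(trainer.args)
print(trainer.args.num_epochs, trainer.args.batch_size)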