That's a pretty large number of examples! Can you take a stratified sample of your dataset, say, $k=30$ examples per class, then try finetuning on that and see how it does? Then you can increase $k$ as needed. If you have a lot of classes, you may want to start from $k=5$ to keep things manageable.
The contrastive approach takes a dataset of size $n$ and creates a new dataset of up to $n(n-1)/2$ pairs, i.e. $O(n^2)$, so if $n$ is large, the contrastive dataset will be huge.
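To get a feel for how fast that grows, here is a quick back-of-the-envelope check (plain Python, independent of any library):

```python
# Number of unique pairs, n * (n - 1) / 2, for a few dataset sizes
for n in (100, 1_000, 100_000):
    print(f"n = {n:>7,} -> {n * (n - 1) // 2:>13,} pairs")
# n =     100 ->         4,950 pairs
# n =   1,000 ->       499,500 pairs
# n = 100,000 -> 4,999,950,000 pairs
```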
Hello!
@kgourgou is exactly right, and I second his recommendation. You can use this:
```python
from datasets import load_dataset
from setfit import sample_dataset

# Load your dataset
dataset = load_dataset(...)

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
```
This also automatically gives you an even distribution of classes, should you be interested in that.
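If you then want to grow $k$ gradually as suggested above, a minimal sweep could look like this (a sketch, assuming your dataset has a `"validation"` split and columns matching SetFit's defaults):

```python
from setfit import SetFitModel, SetFitTrainer, sample_dataset

for k in (5, 10, 30):
    # Resample k examples per class and train a fresh model each time
    train_k = sample_dataset(dataset["train"], label_column="label", num_samples=k)
    model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
    trainer = SetFitTrainer(
        model=model,
        train_dataset=train_k,
        eval_dataset=dataset["validation"],  # assumes a validation split exists
        metric="accuracy",
    )
    trainer.train()
    print(k, trainer.evaluate())
```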
With sufficient data, finetuning a model using 🤗 Transformers tends to outperform SetFit. See for example this image:

[figure: accuracy of SetFit vs. a model finetuned on the full dataset]

The finetuned model on the full dataset outperformed SetFit here. In other words, you may want to consider that option.
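For completeness, a minimal 🤗 Transformers finetuning setup for text classification might look like the sketch below (the checkpoint, dataset, and hyperparameters are illustrative placeholders, not recommendations):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Illustrative dataset; substitute your own text/label columns
dataset = load_dataset("sst2")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="finetuned-classifier",
    num_train_epochs=3,
    per_device_train_batch_size=32,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```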
Yep, you are both absolutely right! Thank you for the suggestions.
Good luck!
Does anyone have a rule of thumb for what size local GPU can be used to fine-tune with proprietary data? Or is there another way to speed up the training that I'm not aware of? I'm fine-tuning a multiclass text classifier on an NVIDIA GeForce GTX 1660 SUPER 6GB. Obviously not a powerful GPU, but I'm not against upgrading to a more powerful unit. It just takes a while to fine-tune.
My training params are as follows:

```
Num examples = 15365720
Num epochs = 1
Total optimization steps = 320120
Total train batch size = 48
```
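For context on where the 15.3M figure comes from: SetFit generates contrastive pairs from your original examples (assuming the usual scheme of `2 * num_iterations` pairs per example), which matches the logged numbers exactly:

```python
# Back out the original dataset size from the logged figures
# (assumes SetFit's 2 * num_iterations pairs per original example)
num_iterations = 20
pairs_per_example = 2 * num_iterations       # 40
num_pair_examples = 15_365_720               # "Num examples" from the log
batch_size = 48

print(num_pair_examples // pairs_per_example)  # 384143 original examples
print(-(-num_pair_examples // batch_size))     # 320120 optimization steps (ceil)
```

So the most direct lever on training time is `num_iterations`: lowering it shrinks the contrastive dataset, and hence the step count, proportionally.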
Code:

```python
from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

# Load a SetFit model from the Hub
model_id = "sentence-transformers/paraphrase-mpnet-base-v2"
model = SetFitModel.from_pretrained(model_id)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=48,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"line_text": "text", "label": "label"},
)

# Train
trainer.train()
```