huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.25k stars 223 forks

[FR] add Dropout to Dense SetFitHead #151

Open PhilipMay opened 2 years ago

PhilipMay commented 2 years ago

Hi, maybe a dropout layer in the dense SetFitHead would be nice to have. The implementation could follow the Transformers BertForSequenceClassification head:

https://github.com/huggingface/transformers/blob/d447c460b16626c656e4d7a9425f648fe69517b3/src/transformers/models/bert/modeling_bert.py#L1506-L1517

What do you think? @blakechi @lewtun
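For illustration, a minimal sketch of what such a head could look like (the class and argument names here are illustrative, not SetFit's actual API; it just mirrors the dropout-then-linear pattern of the Bert head linked above):

```python
import torch
from torch import nn


class DenseHeadWithDropout(nn.Module):
    """Illustrative dense classification head: dropout applied to the
    sentence embedding before a single linear classifier."""

    def __init__(self, embedding_dim: int, num_classes: int, dropout_p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)  # active only in train() mode
        self.classifier = nn.Linear(embedding_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.dropout(embeddings))


head = DenseHeadWithDropout(embedding_dim=768, num_classes=6)
logits = head(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 6])
```

Note that `nn.Dropout` is a no-op in `eval()` mode, so it only affects training.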

blakechi commented 2 years ago

Hi @PhilipMay, sorry for the late reply. That's an interesting idea!

I tried adding a dropout layer, and unfortunately performance dropped significantly with a 0.05 dropout rate (I only tried the emotion dataset; performance dropped from 48.XX to 2X.XX ~ 3X.XX). I also tried adding a layer norm, but the results were the same.

So I think maybe these two are not suitable for few-shot learning. By the way, here is the command I used for testing:

python scripts/setfit/run_fewshot.py --classifier pytorch --keep_body_frozen --lr 0.01 --is_test_set true
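A head variant with both dropout and layer norm, along the lines described above, might look like the following (illustrative only, not the code actually used in the experiment):

```python
import torch
from torch import nn


class DenseHeadWithNormAndDropout(nn.Module):
    """Illustrative head combining layer norm and dropout (p=0.05,
    matching the rate mentioned above) before the linear classifier."""

    def __init__(self, embedding_dim: int, num_classes: int, dropout_p: float = 0.05):
        super().__init__()
        self.norm = nn.LayerNorm(embedding_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.classifier = nn.Linear(embedding_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.dropout(self.norm(embeddings)))
```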

tomaarsen commented 1 year ago

I'm personally in favor of refactoring to an implementation where it's relatively simple for users to provide their own heads. These kinds of head features (dropout, pooling, multiple layers, etc.) could then all be implemented to the user's liking.
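One possible shape for such a refactor, sketched below, is to accept any `nn.Module` as the head (purely illustrative; `SetFitModel` does not currently take an arbitrary head like this):

```python
import torch
from torch import nn


class CustomHead(nn.Module):
    """Example of a user-supplied head: the user decides on dropout,
    hidden layers, activation, etc., instead of the library."""

    def __init__(self, embedding_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(embedding_dim, embedding_dim),
            nn.ReLU(),
            nn.Linear(embedding_dim, num_classes),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.net(embeddings)


# Hypothetical usage after such a refactor:
# model = SetFitModel(body=sentence_transformer, head=CustomHead(768, 2))
```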