webis-de / small-text

Active Learning for Text Classification in Python
https://small-text.readthedocs.io/
MIT License

What are the best query strategies to use as a baseline approach? #20

Closed: renebidart closed this issue 1 year ago

renebidart commented 1 year ago

I'm not sure where to start to get a good baseline result with active learning for text classification. Which query strategies should I try first? Is there something like this survey https://arxiv.org/abs/2203.13450 implemented for text classification?

chschroeder commented 1 year ago

Hi @renebidart, I am not aware of any comprehensive benchmark of this kind. An exhaustive benchmark is unlikely to exist (except perhaps from one of the larger, well-known organizations), because active learning experiments quickly become computationally expensive, and a benchmark of that size would have to cover a very large number of combinations of query strategies, models, and datasets.

I would advise trying uncertainty-based methods (such as BreakingTies) first. They are computationally cheap and usually provide a strong baseline: https://aclanthology.org/2022.findings-acl.172.pdf
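Breaking ties (also known as margin sampling) scores each unlabeled sample by the difference between its two highest predicted class probabilities and queries the samples with the smallest difference, i.e. those the classifier is most torn about. Below is a minimal, library-independent sketch of that idea in plain Python. It assumes you already have class-probability predictions for the unlabeled pool; the function names `breaking_ties_scores` and `breaking_ties_query` are hypothetical and are not part of small-text's API (small-text provides its own `BreakingTies` query strategy).

```python
from typing import List, Sequence


def breaking_ties_scores(proba: Sequence[Sequence[float]]) -> List[float]:
    """Return, for each sample, the margin between its two most probable classes.

    A small margin means the classifier can barely decide between its top two
    classes, so the sample is considered uncertain.
    """
    scores = []
    for row in proba:
        if len(row) < 2:
            raise ValueError("each sample needs probabilities for at least two classes")
        top_two = sorted(row, reverse=True)[:2]
        scores.append(top_two[0] - top_two[1])
    return scores


def breaking_ties_query(proba: Sequence[Sequence[float]], n: int) -> List[int]:
    """Return the indices of the n samples with the smallest margin (most uncertain)."""
    scores = breaking_ties_scores(proba)
    # Rank indices by ascending margin; equal margins fall back to index order
    # so the result is deterministic.
    ranked = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for 4 unlabeled samples and 3 classes.
    proba = [
        [0.90, 0.05, 0.05],  # confident: margin 0.85
        [0.40, 0.38, 0.22],  # very uncertain: margin 0.02
        [0.50, 0.40, 0.10],  # uncertain: margin 0.10
        [0.60, 0.30, 0.10],  # fairly confident: margin 0.30
    ]
    print(breaking_ties_query(proba, 2))  # -> [1, 2]
```

In an actual active learning loop, the queried samples would be labeled, the classifier retrained on the enlarged labeled set, and the probabilities recomputed before the next query. The only extra cost per iteration is one prediction pass over the pool, which is part of why this strategy is cheap.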

renebidart commented 1 year ago

Thanks for the quick reply @chschroeder! That's a great paper; I'll try out that method.