webis-de / small-text

Active Learning for Text Classification in Python
https://small-text.readthedocs.io/
MIT License

LightweightCoreset should be batched #23

Closed by chschroeder 1 year ago

chschroeder commented 1 year ago

Feature description

The lightweight_coreset function should compute distances in batches, similar to greedy_coreset. To do this, a batch_size keyword argument needs to be added and integrated into the function in the same manner. The keyword argument must also be added to the LightweightCoreset query strategy and passed on in its function call (similar to GreedyCoreset).
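
To illustrate the idea, here is a minimal sketch of batched distance computation for lightweight coreset sampling. It is not the small-text implementation: the function names (batched_sq_dists_to_mean, lightweight_coreset_sketch), the seed parameter, the default batch size, and the use of squared Euclidean distance to the dataset mean are all assumptions made for this example. Only the batch_size keyword comes from the proposal above.

```python
import numpy as np


def batched_sq_dists_to_mean(x, batch_size=100):
    """Squared Euclidean distance of each row of x to the mean row.

    Hypothetical helper: distances are computed batch by batch so that
    only a (batch_size, dim) temporary is alive at any time.
    """
    mean = x.mean(axis=0, keepdims=True)
    dists = np.empty(x.shape[0], dtype=np.float64)
    for start in range(0, x.shape[0], batch_size):
        batch = x[start:start + batch_size]
        dists[start:start + batch_size] = ((batch - mean) ** 2).sum(axis=1)
    return dists


def lightweight_coreset_sketch(x, n, batch_size=100, seed=None):
    """Sample n row indices of x using a lightweight-coreset style distribution.

    Assumed sampling distribution: an equal mix of a uniform term and a
    term proportional to the squared distance to the mean.
    """
    rng = np.random.default_rng(seed)
    num_rows = x.shape[0]

    dists = batched_sq_dists_to_mean(x, batch_size=batch_size)
    uniform = np.full(num_rows, 1.0 / num_rows)

    total = dists.sum()
    if total == 0:
        # All points coincide with the mean: fall back to uniform sampling.
        q = uniform
    else:
        q = 0.5 * uniform + 0.5 * dists / total

    return rng.choice(num_rows, size=n, replace=False, p=q)


# Usage: the selected indices do not depend on the batch size,
# only the size of the temporary arrays does.
x = np.random.default_rng(0).normal(size=(250, 8))
indices = lightweight_coreset_sketch(x, n=10, batch_size=64, seed=42)
```

In this sketch the batch size only bounds the size of the intermediate array; the resulting distances, and therefore the sampling distribution, are the same as in the unbatched computation.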

Motivation

This will reduce peak memory usage and align the lightweight and greedy coreset implementations.

Additional comments

Everything that needs to be adapted is currently located under small_text.query_strategies.coresets.

chschroeder commented 1 year ago

@RaghavPrabhakar66 PR is merged. Thanks for your effort!

I have added you to the list of contributors and updated the changelog.