harmonydata / harmony

The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.
https://harmonydata.ac.uk
MIT License

Add batching in `matcher.py` #63

Open: woodthom2 opened this issue 2 weeks ago

woodthom2 commented 2 weeks ago

Related to issue https://github.com/harmonydata/harmony/issues/56

There is another entry point where the LLM is used: `matcher.py` (https://github.com/harmonydata/harmony/blob/main/src/harmony/matching/matcher.py#L131).

Please allow the batch size to be set via an environment variable, `BATCH_SIZE`. If `BATCH_SIZE` is not set or is empty, default to 50.
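A minimal sketch of how the setting could be read. The function name `get_batch_size` is hypothetical, and the fallback to 50 for non-numeric or non-positive values is an assumption; the issue only specifies the default for an unset or empty variable.

```python
import os

DEFAULT_BATCH_SIZE = 50


def get_batch_size() -> int:
    """Hypothetical helper: read the batch size from the BATCH_SIZE environment variable.

    Returns DEFAULT_BATCH_SIZE when BATCH_SIZE is unset or empty (as requested in the issue).
    Assumption: non-numeric or non-positive values also fall back to the default.
    """
    raw = os.environ.get("BATCH_SIZE", "").strip()
    if not raw:
        return DEFAULT_BATCH_SIZE
    try:
        value = int(raw)
    except ValueError:
        # Assumption: ignore values that are not integers rather than raising.
        return DEFAULT_BATCH_SIZE
    return value if value > 0 else DEFAULT_BATCH_SIZE
```

Raising a `ValueError` on an invalid value would be an equally reasonable choice; silently falling back keeps a misconfigured environment from stopping a run.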

Please add unit tests. For example, set the batch size to 5, send 10 items to Harmony, and check that they are divided into 2 batches of 5 items each. If you could add the new unit tests to the tests folder, that would be great.
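A self-contained sketch of the suggested test. It assumes a hypothetical wrapper, `vectorise_in_batches`, that takes the list of texts and a vectorisation callable; the real code at the linked line in `matcher.py` may use a different name and signature. A fake vectoriser records the size of each batch it receives, so no model needs to be loaded, and `unittest.mock.patch.dict` sets `BATCH_SIZE` only for the duration of each test.

```python
import os
import unittest
from typing import Callable, List
from unittest import mock

DEFAULT_BATCH_SIZE = 50


def vectorise_in_batches(
    texts: List[str], vectorise: Callable[[List[str]], List[list]]
) -> List[list]:
    """Hypothetical wrapper: call `vectorise` on chunks of at most BATCH_SIZE texts."""
    # An unset or empty BATCH_SIZE falls back to the default of 50.
    batch_size = int(os.environ.get("BATCH_SIZE") or DEFAULT_BATCH_SIZE)
    vectors: List[list] = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(vectorise(texts[start:start + batch_size]))
    return vectors


class TestBatching(unittest.TestCase):
    def _batch_sizes(self, n_items: int, env: dict) -> List[int]:
        """Run the wrapper with a fake vectoriser and return the size of each batch."""
        sizes: List[int] = []

        def fake_vectorise(batch: List[str]) -> List[list]:
            # Record what the "LLM" would receive; return one dummy vector per text.
            sizes.append(len(batch))
            return [[0.0] for _ in batch]

        texts = [f"item {i}" for i in range(n_items)]
        with mock.patch.dict(os.environ, env):
            vectors = vectorise_in_batches(texts, fake_vectorise)
        # Every input text should still get exactly one vector back.
        self.assertEqual(len(vectors), n_items)
        return sizes

    def test_ten_items_with_batch_size_five_gives_two_batches(self):
        self.assertEqual(self._batch_sizes(10, {"BATCH_SIZE": "5"}), [5, 5])

    def test_empty_batch_size_defaults_to_50(self):
        self.assertEqual(self._batch_sizes(120, {"BATCH_SIZE": ""}), [50, 50, 20])
```

Saved in the tests folder, this can be run with `python -m unittest`.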