Status: closed by woodthom2.
@woodthom2 Hello. If this issue is still open, I would love to work on it and contribute to your project.
Hi @makrianast, please feel free to take this on! Thanks so much! Would you like to have a quick chat with me about it on Discord or Google Meet?
Just FYI, the server running the Harmony web tool has 16 GB of memory. I have not tested at what size a request crashes the server, but I am fairly certain the critical number is somewhere between 50 and 2,000 questionnaire items. Of course, we also have to allow for users' machines having different specs.
Hello @woodthom2. Yes, of course. My Discord is anastasiamakrii. Feel free to contact me there if you'd like!
Thanks!
Also related to https://github.com/harmonydata/harmony/issues/63
Description
Can we modify `convert_texts_to_vector` in https://github.com/harmonydata/harmony/blob/main/src/harmony/matching/default_matcher.py to allow items to be batched when sent to the LLM? The batch size should be variable.
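A minimal sketch of what configurable batching could look like, assuming the model call can be wrapped as a function that takes a list of texts and returns one vector per text. `convert_texts_to_vector_batched`, `embed_fn` and `fake_embed` are hypothetical names for illustration only, not Harmony's actual API; `fake_embed` is a toy stand-in for the LLM.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def convert_texts_to_vector_batched(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], Sequence[Vector]],  # hypothetical model wrapper
    batch_size: int = 20,
) -> List[Vector]:
    """Embed `texts` in chunks of at most `batch_size` items.

    Only one batch is passed to the model at a time, so memory used inside
    the model call is bounded by the batch size. The output vectors still
    accumulate in the returned list.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_fn(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input text")
        vectors.extend(batch_vectors)
    return vectors


# Toy demo: record how many texts each model call receives.
calls: List[int] = []


def fake_embed(batch: List[str]) -> List[Vector]:
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]  # 1-dimensional "embedding"


items = [f"item {i}" for i in range(45)]
vectors = convert_texts_to_vector_batched(items, fake_embed, batch_size=20)
print(calls, len(vectors))  # [20, 20, 5] 45
```

Exposing `batch_size` as a parameter lets a small laptop use a low value while a larger server uses a higher one, at the cost of more model calls when the value is small. On Python 3.12+, `itertools.batched` could replace the manual slicing.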
Rationale
If a user wants to harmonise 10,000 items, this will not fit in memory even on a high-performance machine. Small laptops can probably handle only about 20 items per batch. However, smaller batches will slow things down, so the batch size should be configurable, perhaps as a parameter.
People have reported that the website cannot cope with large harmonisations, e.g. the comment below from Discord (23 Oct 2024):