Currently we send one request per unique word to the embedding server to get that word's embedding vector.
The server supports sending multiple words at a time and getting back the results, so we should chunk the requests into batches to make fewer API calls, which should make embedding fetching noticeably quicker.
https://github.com/tsdataclinic/smooshr/blob/8b11ccba820434de75a62da5e00e0e336ef3414e/src/utils/calc_embedings.js#L1-L20
This is the function that will need to be modified to run the queries in batches and then correctly assign the results once each batch has completed.
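As a rough illustration, here is a minimal batching sketch. The endpoint name, request body, and response shape are assumptions rather than the repo's actual API, so they would need to be matched to what `calc_embedings.js` and `server.py` really use:

```js
const BATCH_SIZE = 50; // assumption: pick a size the server handles comfortably

// Split a list of words into fixed-size batches.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Fetch embeddings for all words in batches instead of one request per word.
async function fetchEmbeddings(words, serverURL) {
  const result = {};
  for (const batch of chunk(words, BATCH_SIZE)) {
    const response = await fetch(`${serverURL}/embeddings`, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({words: batch}),
    });
    const data = await response.json();
    // Assign each returned vector back to its word in the combined result.
    Object.assign(result, data.embeddings);
  }
  return result;
}
```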
Things to consider:
1) The server might fail if one or more of the words does not have a representation in the corpus. We would need to fix that on the server side here: https://github.com/tsdataclinic/smooshr/blob/8b11ccba820434de75a62da5e00e0e336ef3414e/server/server.py#L66-L80 (a defensive client-side fallback is also shown in the sketch after this list).
2) It would also be good to give feedback on this process in the classification interface, so a user can see how much of the embedding data has loaded (see the progress-callback sketch below).
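A hedged sketch covering both points from the client side: `onProgress` is a hypothetical callback the classification UI could supply (e.g. to drive a progress bar), and words missing from the response are skipped rather than failing the whole fetch. The server would still need the fix above so that it omits out-of-vocabulary words instead of erroring:

```js
// Sketch only: assumes the same hypothetical {words} -> {embeddings} API as above.
async function fetchEmbeddingsWithProgress(words, serverURL, onProgress) {
  const BATCH_SIZE = 50; // tune to whatever the server comfortably handles
  const result = {};
  let done = 0;
  for (let i = 0; i < words.length; i += BATCH_SIZE) {
    const batch = words.slice(i, i + BATCH_SIZE);
    const response = await fetch(`${serverURL}/embeddings`, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({words: batch}),
    });
    const data = await response.json();
    for (const word of batch) {
      // Tolerate out-of-vocabulary words: only assign vectors the server returned.
      if (data.embeddings[word]) {
        result[word] = data.embeddings[word];
      }
    }
    done += batch.length;
    // Report fractional progress so the UI can render "X% of embeddings loaded".
    if (onProgress) onProgress(done / words.length);
  }
  return result;
}
```

In a React component, `onProgress` could simply call a state setter and render the fraction as a percentage while the embeddings load.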