iryna-kondr / scikit-llm

Seamlessly integrate LLMs into scikit-learn.
https://beastbyte.ai/
MIT License

[Feature Request]: Batched Async Prediction #84

Open · WindChimeRan opened this issue 4 months ago

WindChimeRan commented 4 months ago

Hi,

The current scikit-llm is implemented synchronously: the prompts are sent to the API one by one.

This is not ideal when we have a large dataset and a high-tier (high TPM/RPM) account. Is it possible to incorporate a batched async feature?

Reference:

oaib

OKUA1 commented 4 months ago

Hi @WindChimeRan,

Unfortunately, the OpenAI API does not support batched requests, so there is going to be one request per sample anyway.

The only possibility of speedup is to send multiple requests in parallel. Adding an async API could be nice (e.g. an a_predict method), but this is not really compliant with the scikit-learn API and might be confusing for some users. The more straightforward way would be to support synchronous parallel processing and allow specifying an n_jobs hyperparameter. This is something we have had in mind since day 1 but never prioritised, as until relatively recently the rate limits would not allow for sufficient parallelisation anyway.
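For illustration, here is a minimal sketch of what such an async path could look like outside of scikit-llm, using the OpenAI Python client directly. The a_predict name, the prompt, and the model choice are assumptions for this sketch, not part of scikit-llm:

```python
# Hypothetical sketch of an async predict helper; scikit-llm does not
# expose a_predict, and the prompt/model below are illustrative only.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def classify_one(text: str, labels: list[str]) -> str:
    resp = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Classify the text as one of {labels}. Text: {text}",
        }],
    )
    return resp.choices[0].message.content.strip()

async def a_predict(texts: list[str], labels: list[str]) -> list[str]:
    # Still one request per sample, just issued concurrently.
    return await asyncio.gather(*(classify_one(t, labels) for t in texts))

# preds = asyncio.run(a_predict(["great!", "awful."], ["positive", "negative"]))
```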

For now, you can simply split your dataset and run predict on each chunk in a thread pool, e.g.:
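A minimal sketch of that workaround, assuming an already-fitted scikit-llm classifier clf (e.g. ZeroShotGPTClassifier) and a list of input texts X; the helper name and chunk count are arbitrary:

```python
# Sketch of the suggested workaround: chunk the data and call predict
# on each chunk in a thread pool. `clf` is an already-fitted classifier.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def parallel_predict(clf, X, n_chunks=8):
    # Split X into roughly equal chunks; requests within a chunk remain
    # sequential, but the chunks run in parallel threads.
    chunks = np.array_split(np.asarray(X, dtype=object), n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = pool.map(lambda chunk: list(clf.predict(chunk)), chunks)
    # pool.map preserves order, so predictions line up with the input.
    return [label for chunk in results for label in chunk]
```

Keep n_chunks in line with your account's RPM limit, since each thread issues requests independently.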

AndreasKarasenko commented 3 weeks ago

@WindChimeRan see #101. Edit: I did some experimenting with the FewShotClassifier and quickly ran into rate limits. To be fair, it is for sentiment classification and some of the sampled reviews are VERY long (which I do not monitor), so potentially there is no real speed-up.