Workers are limited to one API call at a time, which increases latency. We can make the majority of worker API calls async/await and also allow worker tools to accept a list of requests to execute. The worker can then execute the list in parallel and aggregate the responses into a single output to return to the top-level LLM. This is especially impactful on latency when the video-text-generation worker needs to call generate for multiple videos.
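A minimal sketch of what this could look like, assuming a Python asyncio worker; `call_api`, `handle_batch`, and the request/response shapes are hypothetical placeholders rather than the actual worker interface:

```python
import asyncio
from typing import Any


async def call_api(request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical single API call (e.g. one video-text generation)."""
    await asyncio.sleep(0.1)  # stand-in for network I/O
    return {"request": request, "result": "..."}


async def handle_batch(requests: list[dict[str, Any]]) -> dict[str, Any]:
    """Run a list of requests concurrently and aggregate the responses."""
    responses = await asyncio.gather(*(call_api(r) for r in requests))
    # Aggregate into a single output to hand back to the top-level LLM.
    return {"responses": list(responses)}


if __name__ == "__main__":
    # Example: a worker asked to generate text for several videos at once.
    batch = [{"video_id": i} for i in range(4)]
    print(asyncio.run(handle_batch(batch)))
```

With this shape, the latency of a batch is roughly that of the slowest single call rather than the sum of all calls, which is where the gain for multi-video generation would come from.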