twelvelabs-io / tl-jockey

Jockey is a conversational video agent.

Parallelize workers #44

Open TravisCouture opened 4 months ago

TravisCouture commented 4 months ago

Workers are currently limited to one API call at a time, which increases latency. We can make the majority of worker API calls async/await and also allow worker tools to accept a list of requests to execute. That list can then be executed in parallel within the worker and the responses aggregated into a single output to return to the top-level LLM. This is especially impactful on latency when the video-text-generation worker needs to call generate for multiple/many videos.
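A minimal sketch of the fan-out/aggregate pattern described above, using `asyncio.gather`. The function names (`generate_for_video`, `generate_batch`) and the simulated call body are hypothetical stand-ins, not Jockey's actual worker API; a real implementation would await the async client call for the video-text-generation worker instead of `asyncio.sleep`:

```python
import asyncio

async def generate_for_video(video_id: str) -> dict:
    # Hypothetical single worker API call (e.g. one "generate"
    # request from the video-text-generation worker). A real
    # implementation would await an async HTTP/SDK client here.
    await asyncio.sleep(0.01)  # simulate network latency
    return {"video_id": video_id, "text": f"summary of {video_id}"}

async def generate_batch(video_ids: list[str]) -> list[dict]:
    # Fan out one request per video, run them concurrently, and
    # aggregate the responses into a single output for the top-level
    # LLM. gather() preserves the order of the input list.
    tasks = [generate_for_video(vid) for vid in video_ids]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    results = asyncio.run(generate_batch(["vid-1", "vid-2", "vid-3"]))
    print([r["video_id"] for r in results])
```

With N videos, total wall-clock time is roughly that of the slowest single call rather than the sum of all calls, which is the latency win this issue is after.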