Workers are limited to one API call at a time, which increases latency. We can make the majority of worker API calls async/await and also allow worker tools to accept a list of requests to execute. The worker can then execute the list in parallel and aggregate the responses into a single output to return to the top-level LLM. This is especially impactful on latency when the video-text-generation worker needs to call generate for multiple videos.
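A minimal sketch of what this could look like, assuming a Python asyncio worker; `call_api`, `handle_batch`, and the request/response shapes are hypothetical placeholders rather than the actual worker interface:

```python
import asyncio
from typing import Any


async def call_api(request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical single API call (e.g. one video-text generation)."""
    await asyncio.sleep(0.1)  # stand-in for network I/O
    return {"request": request, "result": "..."}


async def handle_batch(requests: list[dict[str, Any]]) -> dict[str, Any]:
    """Run a list of requests concurrently and aggregate the responses."""
    responses = await asyncio.gather(*(call_api(r) for r in requests))
    # Aggregate into a single output to hand back to the top-level LLM.
    return {"responses": list(responses)}


if __name__ == "__main__":
    # Example: a worker asked to generate text for several videos at once.
    batch = [{"video_id": i} for i in range(4)]
    print(asyncio.run(handle_batch(batch)))
```

With this shape, the latency of a batch is roughly that of the slowest single call rather than the sum of all calls, which is where the gain for multi-video generation would come from.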