When using a model endpoint for `precheck` and the sdg-svc for `generate`, we should be able to issue multiple concurrent requests to these endpoints to help scale this out. Some thoughts on how to do this ...
First we should do some manual / quick script testing of concurrent requests to these endpoints to see if we get any errors when doing 3, 5, 10 (or whatever) at a time. Let's find some number to start with that seems reliable for now.
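A minimal probe along these lines could be a standalone Go program that fires N requests at once and reports how many fail. This is only a sketch: the URL and request body below are placeholders, not the real endpoint contract, and should be adjusted to whichever endpoint (precheck model or sdg-svc) is being tested.

```go
// concurrency_probe.go -- rough sketch for probing how many concurrent
// requests an endpoint tolerates. URL and payload are placeholders.
package main

import (
	"bytes"
	"flag"
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	url := flag.String("url", "http://localhost:8080/v1/completions", "endpoint to probe (placeholder)")
	n := flag.Int("n", 5, "number of concurrent requests")
	flag.Parse()

	payload := []byte(`{"prompt": "hello", "max_tokens": 16}`) // placeholder body

	var wg sync.WaitGroup
	errs := make(chan error, *n)
	start := time.Now()

	for i := 0; i < *n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			resp, err := http.Post(*url, "application/json", bytes.NewReader(payload))
			if err != nil {
				errs <- fmt.Errorf("request %d: %w", id, err)
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				errs <- fmt.Errorf("request %d: status %s", id, resp.Status)
			}
		}(i)
	}
	wg.Wait()
	close(errs)

	failed := 0
	for err := range errs {
		failed++
		fmt.Println(err)
	}
	fmt.Printf("%d/%d requests failed at concurrency %d (took %s)\n", failed, *n, *n, time.Since(start))
}
```

Running this with `-n 3`, `-n 5`, `-n 10`, etc. should give a quick read on where errors start showing up.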
We can't just add multiple goroutines to the same worker. Each worker routine assumes it owns some resources on disk (the taxonomy git clone in particular). There are a few choices here:

1. Cleanest -- when running in this mode, we don't need local GPUs. We can move to a pool of cheaper VMs, each running its own worker instance, and scale that pool of nodes according to the concurrency desired.
2. Run multiple worker instances on each node, but give each its own working directory with its own taxonomy repo to work with.
3. Allow multiple goroutines in one worker, but as above, each needs its own working directory with a taxonomy repo that it owns (a rough sketch of this follows below).
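For option 3 (and, with minor tweaks, option 2) the key point is that each worker goroutine owns an isolated working directory with its own taxonomy clone, so git operations don't collide. A minimal sketch, assuming a taxonomy repo URL and a placeholder in place of the actual job loop (neither reflects the real worker code):

```go
// Sketch of option 3: N worker goroutines in one process, each owning its
// own working directory with a separate taxonomy clone.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"sync"
)

// Assumed repo URL for illustration only.
const taxonomyURL = "https://github.com/instructlab/taxonomy.git"

// runWorker gives each goroutine an isolated workdir so clones don't collide.
func runWorker(id int, baseDir string, wg *sync.WaitGroup) {
	defer wg.Done()

	workDir := filepath.Join(baseDir, fmt.Sprintf("worker-%d", id))
	if err := os.MkdirAll(workDir, 0o755); err != nil {
		fmt.Printf("worker %d: mkdir: %v\n", id, err)
		return
	}

	// Each worker clones its own copy of the taxonomy repo.
	clone := exec.Command("git", "clone", "--depth", "1", taxonomyURL, filepath.Join(workDir, "taxonomy"))
	if out, err := clone.CombinedOutput(); err != nil {
		fmt.Printf("worker %d: clone failed: %v\n%s", id, err, out)
		return
	}

	// Placeholder for the actual job loop (precheck/generate calls against the endpoints).
	fmt.Printf("worker %d ready in %s\n", id, workDir)
}

func main() {
	baseDir, err := os.MkdirTemp("", "ilab-workers-")
	if err != nil {
		panic(err)
	}

	const numWorkers = 3 // start small; tune based on the concurrency testing above
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go runWorker(i, baseDir, &wg)
	}
	wg.Wait()
}
```

The same isolation idea applies to option 2, except the working directory is set up once per worker process instead of per goroutine.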
So, we need some testing, some design decisions, and some deployment automation.