bertsky opened this issue 3 years ago
Hi, I think you should solve the issue at the load-balancer/proxy level, because once the request reaches the uWSGI socket it is too late :)
I think that a least-connection load balancer (like the one exposed by nginx) should do the trick. You can map different uWSGI workers to different sockets (and allow only one worker to manage the GPU), or spawn completely different uWSGI instances on different sockets (with one of them binding the GPU), as sketched below.
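To make that concrete, here is a minimal sketch (socket paths, module name and worker counts are placeholders, not taken from your setup): one nginx `upstream` doing least-connection balancing over two uWSGI sockets, backed by two separate uWSGI instances.

```nginx
# nginx: send each request to the backend with the fewest active connections
upstream myapp {
    least_conn;
    server unix:/tmp/uwsgi-gpu.sock;   # instance whose single worker owns the GPU
    server unix:/tmp/uwsgi-cpu.sock;   # CPU-only instance with more workers
}

server {
    listen 80;
    location / {
        include uwsgi_params;
        uwsgi_pass myapp;
    }
}
```

```ini
; uwsgi-gpu.ini - one worker, the only process allowed to touch the GPU
[uwsgi]
socket = /tmp/uwsgi-gpu.sock
processes = 1
module = myapp:application
```

```ini
; uwsgi-cpu.ini - plain CPU workers
[uwsgi]
socket = /tmp/uwsgi-cpu.sock
processes = 4
module = myapp:application
```

Giving the GPU server a higher `weight` in the upstream block nudges selection further in its favour when both backends are idle, though this is still load balancing rather than a strict priority.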
I have to run a service which is computationally heavy and benefits from GPUs when available. But GPU resources (especially GPU RAM) cannot be shared as easily as other resources: there is usually no paging, and processes often try to allocate all of the available memory (which is TensorFlow's default). Even if workers behave cooperatively, there are only so many parallel tasks you can fit on a single GPU, while there is often excess capacity for CPU-only computation.

It so happens for me that waiting for a free GPU worker takes longer than running a CPU worker instead (while GPU workers are of course still faster than CPU workers). So I am in a situation where I need to prioritize one kind of worker over the other: I don't want the server to randomly pick a free worker, but to always pick a GPU worker if one is available. (So that's different from cheaping workers, especially since initializing new workers takes too much time.)
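(As an aside, the allocate-everything default can at least be relaxed; a minimal sketch, assuming TensorFlow 2.x, which is not the actual problem here:)

```python
import tensorflow as tf

# Grow GPU memory on demand instead of reserving all of it at start-up;
# this must run before any GPU operation is executed.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```

That only softens the memory problem; it does not change how requests get dispatched to workers.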
Having sifted through uwsgi's documentation a lot, and having tried to get something workable with signals and with switching `uwsgi.accepting` (which did not work at all), I wonder what the correct way to do this kind of thing is. Is there something like nginx's `worker_priority` in uwsgi?
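For context, binding the GPU to a single worker is not the hard part; here is a rough sketch of one way to do it (using the `postfork` hook from `uwsgidecorators` and `CUDA_VISIBLE_DEVICES`; the details are illustrative, not my actual setup):

```python
import os
import uwsgi
from uwsgidecorators import postfork

@postfork
def assign_device():
    # Expose the GPU only to worker 1; every other worker sees no GPU
    # and therefore falls back to CPU-only computation.
    if uwsgi.worker_id() == 1:
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    else:
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
    # The ML framework must be imported after this point, otherwise it
    # will already have enumerated the visible devices.
```

What I am missing is the other half: making uwsgi hand a request to that GPU worker whenever it is free, and only otherwise to a CPU worker.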