Open mossaab0 opened 2 years ago
@mossaab0 What version of TS are you using? Can you try building TS from source and let me know if it still fails? I suspect this is the same issue for which I pushed fix #1552 (it will be included in the next release).
@maaquib This is based on torchserve-nightly:gpu-2022.04.13, which already includes the #1552 fix. Before the fix, even 20 QPS was failing.
@mossaab0 If you can provide some reproduction steps, I can try to root-cause this.
@maaquib It is a bit difficult to provide more reproduction steps, as that would basically mean sharing the models. But here is something you can try (which I haven't tried myself): figure out the maximum QPS that a GPU node can handle for the cat/dog classifier alone over a couple of hours. Then run a perf test at half of that QPS using the sequential workflow (i.e., including the dog breeds model) for a couple of hours. I expect the second perf test to fail.
Hi @mossaab0, we've discussed this internally. We're in the process of redesigning how workflows work to make it possible to define a DAG within your handler file in Python.
It should be possible to take an existing sequential or parallel workflow and refactor it into a new nn.Module or handler.py.
Please ping me if you need any advice on how to do this.
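For anyone looking for a concrete starting point, here is a minimal sketch of what that refactor could look like. The class name, model arguments, and the idea of feeding the same batch to both stages are assumptions for illustration, not something specified in this thread; the point is only that both stages end up inside one nn.Module that TorchServe can serve as a single model (or instantiate inside a custom handler.py).

```python
import torch
import torch.nn as nn


class SequentialPipeline(nn.Module):
    """Hypothetical wrapper: runs two existing models back to back inside
    one nn.Module so TorchServe sees a single model/handler instead of a
    workflow chaining two separate models."""

    def __init__(self, first_stage: nn.Module, second_stage: nn.Module):
        super().__init__()
        self.first_stage = first_stage    # e.g. a cat/dog classifier
        self.second_stage = second_stage  # e.g. a dog-breed classifier

    @torch.no_grad()
    def forward(self, images: torch.Tensor):
        first_out = self.first_stage(images)
        # Feed the same batch (or the first stage's output, depending on how
        # your workflow wired the two models together) into the second stage.
        second_out = self.second_stage(images)
        return first_out, second_out
```

You could then load both stages in a custom handler's initialize(), wrap them in this module, and call it from inference(), so the two stages share one worker and one GPU context rather than being chained through the workflow API.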
I'm also running into this. Any pointers to what the refactor would look like?
We have 2 ONNX models deployed on a GPU machine built on top of the nightly Docker image.
I suspect there is a delay in releasing the resources that becomes an issue only at high QPS (these resources are eventually released later, bringing the machine back to life).