[Open] rbavery opened this issue 1 year ago
That's an awesome idea TBH, I don't think anyone on the team has bandwidth for it, but if you're interested in picking this up I'd be happy to actively advise.
EDIT: Actually I like this idea so much, lemme hack together something; will timebox though and give up if it takes too long.
OK, didn't manage to finish this today, but here's, at a high level, how this might work. Not sure I'll spend much more time on this, but it's cool enough that I'd be happy to give lots of feedback on a PR. FWIW I am convinced this should work.
The idea is you run TorchServe inside a dev container with a Python debug server started from the handler, then attach VS Code to it.
And here's, step by step, how one might set this up.
Create a project directory:

```sh
mkdir project
```

`Dockerfile`:

```dockerfile
FROM pytorch/torchserve:latest
USER root
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
RUN pip install ptvsd
WORKDIR /workspace
COPY . /workspace
```
Create the dev container config (VS Code looks for `devcontainer.json` inside the `.devcontainer` folder):

```sh
mkdir -p .devcontainer && touch .devcontainer/devcontainer.json
```

`.devcontainer/devcontainer.json`:

```json
{
    "name": "Torchserve Debugging",
    "dockerFile": "../Dockerfile",
    "settings": {
        "terminal.integrated.shell.linux": "/bin/bash"
    },
    "extensions": ["ms-python.python"],
    "forwardPorts": [8080, 8081, 6789],
    "runArgs": ["--user", "root"]
}
```
In `handler.py`, add at the very top (although `debugpy` might be a better option):

```python
import ptvsd

# Listen for a debugger on port 6789 (the port forwarded in devcontainer.json)
ptvsd.enable_attach(address=('0.0.0.0', 6789), redirect_output=True)
ptvsd.wait_for_attach()
```
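Since the comment above notes `debugpy` might be a better option (ptvsd is its deprecated predecessor), here's a minimal sketch of the equivalent debugpy setup, gated behind a hypothetical `TS_DEBUG` environment variable so the handler only blocks waiting for a client when you actually want to debug. The function name and the env-var convention are assumptions, not something from this thread:

```python
import os

def maybe_enable_debug(port: int = 6789) -> bool:
    """Start a debugpy server and wait for VS Code to attach, but only
    when the (hypothetical) TS_DEBUG=1 environment variable is set."""
    if os.environ.get("TS_DEBUG") != "1":
        return False
    import debugpy  # imported lazily so it's only required when debugging
    debugpy.listen(("0.0.0.0", port))  # same port as forwardPorts above
    debugpy.wait_for_client()          # blocks until the IDE attaches
    return True
```

You would call `maybe_enable_debug()` at the top of `handler.py` in place of the three ptvsd lines.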
`launch.json`:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 6789
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "/workspace"
                }
            ]
        }
    ]
}
```
Start TorchServe:

```sh
torchserve --start --model-store model_store --models dummy=dummy.mar --ts-config config.properties
```

Attach the debugger with the `launch.json` configuration above, then send a request:

```sh
curl -X POST http://localhost:8080/predictions/dummy -T input_data.txt
```
@msaroufim thanks so much for outlining this! I'm going to test it out. TBH I'm not sure if I have the skillset to make a VSCode extension, but it sounds like a good learning experience and I might tackle it at a later date.
I got to the point of showing output in the debug console!

```
XXXXX Initialization time: 0.15012693405151367
trying to open image
XXXXX Preprocess time: 0.01322793960571289
```
However, I'm not sure how to activate the breakpoints in the handler file on the container. I looked in /tmp and found the handler file, but breakpoints are greyed out and never hit.
On phone, but I remember this helping.
Also, in general there might be something up with your launch.json; you should be triggering debug while the handler file is open.
I triggered the debug with the handler file open and `justMyCode` set to false, but still get the same result: greyed-out breakpoints and no stopping at them.
Can you try instead adding a call to `breakpoint()` anywhere? That'll help us narrow down whether this is a VS Code config issue or a handler issue. Also, just double-checking that you have the Python extension installed.
I have the extensions installed.
When I set a breakpoint like so in the handler file under /tmp and then start torchserve and attach a debugger, it doesn't get hit:
```python
def inference(self, model_input):
    """
    Internal inference methods
    :param model_input: transformed model input data
    :return: list of inference output in NDArray
    """
    breakpoint()
    start = time()
    # Do some inference call to engine here and return output
    model_output = self.ort_session.run(
        None,
        {"images": model_input.numpy().astype(np.float32)},
    )
    print("XXXXX Inference time: ", time() - start)
    print(len(model_output))
    print(type(model_output))
    return torch.Tensor(model_output)
```
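One possible explanation for the greyed-out breakpoints (an assumption, not confirmed in this thread): TorchServe runs the copy of the handler it unpacked under /tmp, while `launch.json` maps the local source to `/workspace`, so VS Code cannot match the breakpoint's file to the running code. If that is the cause, the `pathMappings` would need to point at the unpacked location instead, along the lines of (the exact unpacked path is a placeholder here):

```json
"pathMappings": [
    {
        "localRoot": "${workspaceFolder}",
        "remoteRoot": "/tmp/models/<unpacked-model-dir>"
    }
]
```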
Hi @rbavery, would love your feedback regarding https://github.com/pytorch/serve/pull/2605. With this approach, you can use a debugger.
@agunapal thanks for working on that! The MockContext looks useful and I think I would use this for future projects.
I still think it'd be useful to test handlers as they are without needing a separate script. I have a variety of existing torchserve containers with custom handlers, and I'd prefer to test them as is since my preprocessing functions are already set up in the custom handlers.
I'm also currently running into issues in the torchserve environment that I'm not getting in my local environment, so this kind of solution wouldn't help me step through what is going on in the torchserve container in the debugger. I can't figure out why TensorRT doesn't get enabled even though it is installed and the container is run with access to GPUs.
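For reference, the mock-context idea being discussed can be sketched roughly like this. The names `MockContext`, `system_properties`, and `manifest` are illustrative assumptions here; the actual API in pytorch/serve PR #2605 may differ:

```python
# Hedged sketch: drive handler code directly with a fake context object,
# so no running server is needed and an IDE debugger (or plain
# breakpoint()) works as usual.
class MockContext:
    """Stand-in for the context object TorchServe passes to handlers."""
    def __init__(self, model_dir, gpu_id=None):
        self.system_properties = {"model_dir": model_dir, "gpu_id": gpu_id}
        self.manifest = {"model": {"serializedFile": "model.onnx"}}

def preprocess(data):
    # Stand-in for a custom handler's preprocess step: pull the raw
    # request body out of each batched request dict.
    return [row.get("body") for row in data]

ctx = MockContext(model_dir="/tmp/model")
batch = preprocess([{"body": b"hello"}])
```

The trade-off rbavery describes still holds: this tests the handler's functions in isolation, not the handler as it actually runs inside the TorchServe worker process.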
@rbavery I understand this is not the ideal solution and it doesn't address every scenario, but it can be useful in some cases. I have used this with custom handlers. What base image are you using in the docker container? I haven't tried TensorRT in a docker container yet; will let you know when I work on it. Also, FYI: if you are using multiple GPUs you will run into this issue until their next release: https://github.com/pytorch/TensorRT/pull/2325
I think I figured out what was missing. What we can do is, in `ts/model_server.py`, add this gated behind some flag so people can do something like `torchserve --start --debug`:

```python
cmd.append("--python")
cmd.append("debugpy")
cmd.append("--listen")
cmd.append("0.0.0.0:5678")
cmd.append("--wait-for-client")
cmd.append("--run")
cmd.append(sys.executable)
```
Then in a handler have a command like:

```python
debugpy.listen(('0.0.0.0', 6789))
debugpy.wait_for_client()
```
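For what it's worth, the actual debugpy CLI takes its flags before the target script (`python -m debugpy --listen HOST:PORT --wait-for-client script.py`), so the flag-gated command assembly might look something like this sketch. The function name and structure are illustrative assumptions, not TorchServe's real code in `ts/model_server.py`:

```python
import sys

def build_worker_cmd(entry_point, debug=False):
    """Build the worker launch command, optionally wrapping it in debugpy
    when a hypothetical --debug flag is passed."""
    cmd = []
    if debug:
        # Run the worker under debugpy so VS Code can attach; the target
        # script comes after debugpy's own flags.
        cmd += [sys.executable, "-m", "debugpy",
                "--listen", "0.0.0.0:5678", "--wait-for-client"]
    else:
        cmd.append(sys.executable)
    cmd.append(entry_point)
    return cmd
```

For example, `build_worker_cmd("ts/model_service_worker.py", debug=True)` yields a command that pauses the worker until a debugger attaches, while `debug=False` launches it normally.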
@rbavery are you interested in figuring this out again?
🚀 The feature
Be able to set breakpoints and step through custom handlers after sending requests to TorchServe. Something like VS Code Live Server or the JavaScript debugger, but for TorchServe, working with VS Code Remote Containers.
Motivation, pitch
Debugging containers with prints is tedious: https://github.com/pytorch/serve/issues/711
Alternatives
debugging with prints: https://github.com/pytorch/serve/issues/711
Additional context
No response