pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

Debugging extension for VSCode #2223

Open · rbavery opened 1 year ago

rbavery commented 1 year ago

🚀 The feature

Be able to set breakpoints and step through custom handlers after sending requests to TorchServe: something like VSCode Live Server or the JavaScript Debugger, but for TorchServe, working with VSCode Remote Containers.

Motivation, pitch

Debugging containers with prints is tedious: https://github.com/pytorch/serve/issues/711

Alternatives

Debugging with prints: https://github.com/pytorch/serve/issues/711

Additional context

No response

msaroufim commented 1 year ago

That's an awesome idea TBH. I don't think anyone on the team has bandwidth for it, but if you're interested in picking this up I'd be happy to actively advise.

EDIT: Actually, I like this idea so much, let me hack something together. I'll timebox it, though, and give up if it takes too long.

msaroufim commented 1 year ago

OK, I didn't manage to finish this today, but here's, at a high level, how this might work. I'm not sure I'll spend much more time on this, but it's cool enough that I'd be happy to give lots of feedback on a PR. FWIW, I am convinced this should work.

The idea is that you:

  1. Package up your mar file locally
  2. Create a docker container with all the dependencies you need
  3. Set breakpoints in your handler
  4. Send curl requests to torchserve and get the breakpoints triggered
  5. Once the workflow is figured out, we can publish a dev image that does this by default, for contributors iterating on their handlers
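(For step 1, packaging typically means torch-model-archiver, along the lines of torch-model-archiver --model-name dummy --version 1.0 --serialized-file model.pt --handler handler.py --export-path model_store, where the model and handler file names are placeholders.)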

And here's, step by step, how one might set this up:

  1. mkdir project
  2. Set up a Dockerfile with torchserve

Dockerfile

FROM pytorch/torchserve:latest
USER root

ENV JAVA_HOME /usr/lib/jvm/java-17-openjdk-amd64
RUN pip install ptvsd

WORKDIR /workspace
COPY . /workspace
  3. touch .devcontainer/devcontainer.json
{
  "name": "Torchserve Debugging",
  "dockerFile": "../Dockerfile",
  "settings": {
    "terminal.integrated.shell.linux": "/bin/bash"
  },
  "extensions": ["ms-python.python"],
  "forwardPorts": [8080, 8081, 6789],
  "runArgs": ["--user", "root"]
}
  4. In your handler.py, add at the very top (although debugpy might be a better option; see the sketch after this list)
import ptvsd

ptvsd.enable_attach(address=('0.0.0.0', 6789), redirect_output=True)
ptvsd.wait_for_attach()
  5. Install the VS Code Dev Containers extension
  6. Install the VS Code Python extension
  7. Set up a launch.json
{
  "version": "0.2.0",
  "configurations": [

      {
          "name": "Python: Attach",
          "type": "python",
          "request": "attach",
          "connect": {
              "host": "localhost",
              "port": 6789
          },
          "pathMappings": [
              {
                  "localRoot": "${workspaceFolder}",
                  "remoteRoot": "/workspace"
              }
          ]
      }
  ]
}
  8. Launch a new Dev Container
  9. From within the Dev Container: torchserve --start --model-store model_store --models dummy=dummy.mar --ts-config config.properties
  10. Click on Debug and attach Python to trigger the launch.json
  11. In a separate terminal, send a request like curl -X POST http://localhost:8080/predictions/dummy -T input_data.txt
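As noted in step 4, debugpy is likely the better option these days (ptvsd is deprecated in its favor). A sketch of the equivalent snippet at the top of handler.py, using the same port that is forwarded above:

import debugpy

# Listen on all interfaces inside the container so the IDE can attach
# through the forwarded port, then block until a debugger connects.
debugpy.listen(('0.0.0.0', 6789))
debugpy.wait_for_client()

If breakpoints stay greyed out after attaching, setting "justMyCode": false in the attach configuration is one thing to check (this comes up again later in the thread).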
rbavery commented 1 year ago

@msaroufim thanks so much for outlining this! I'm going to test it out. TBH I'm not sure I have the skill set to make a VSCode extension, but it sounds like a good learning experience, so I might tackle it at a later date.

rbavery commented 1 year ago

I got to the point of showing output in the debug console!

XXXXX  Initialization time:  0.15012693405151367
trying to open image
XXXXX  Preprocess time:  0.01322793960571289

However, I'm not sure how to activate the breakpoints in the handler file in the container. I looked in /tmp and found the handler file, but the breakpoints are greyed out and not being activated.

[Screenshot: greyed-out breakpoints in the handler file]

msaroufim commented 1 year ago

On phone, but I remember this helping: [screenshot]

Also, in general, there might be something up with your launch.json; you should be triggering the debugger while the handler file is open.

rbavery commented 1 year ago

I triggered the debugger with the handler file open and justMyCode set to false, but I still get the same result: greyed-out breakpoints and no stopping at them.

msaroufim commented 1 year ago

Can you try adding a call to breakpoint() anywhere instead? That'll help us narrow down whether this is a VS Code config issue or a handler issue. Also, just double-checking that you have the Python extension installed.

rbavery commented 1 year ago

I have the extensions installed

When I set a breakpoint like so in /tmp/handler.py and then start torchserve and attach a debugger, it doesn't get called.

    def inference(self, model_input):
        """
        Internal inference methods
        :param model_input: transformed model input data
        :return: list of inference output in NDArray
        """
        breakpoint()
        start = time()
        # Do some inference call to engine here and return output
        model_output = self.ort_session.run(
            None,
            {"images": model_input.numpy().astype(np.float32)},
        )
        print("XXXXX  Inference time: ", time() - start)
        print(len(model_output))
        print(type(model_output))
        return torch.Tensor(model_output)
agunapal commented 1 year ago

Hi @rbavery, would love your feedback regarding https://github.com/pytorch/serve/pull/2605. With this approach, you can use a debugger.
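A rough sketch of the pattern that PR enables, driving a custom handler directly in a plain Python process where any debugger works. The import path and MockContext arguments below are illustrative assumptions, not necessarily the PR's actual API:

# Sketch only; see https://github.com/pytorch/serve/pull/2605 for the
# actual helper shipped with TorchServe.
from mock_context import MockContext  # hypothetical import path
from handler import ModelHandler      # your custom handler module

context = MockContext()  # assumed defaults; point it at your model artifacts
handler = ModelHandler()
handler.initialize(context)  # breakpoints in initialize() now hit normally
result = handler.handle([{"data": open("input_data.txt", "rb").read()}], context)
print(result)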

rbavery commented 1 year ago

@agunapal thanks for working on that! The MockContext looks useful and I think I would use this for future projects.

I still think it'd be useful to test handlers as they are without needing a separate script. I have a variety of existing torchserve containers with custom handlers, and I'd prefer to test them as is since my preprocessing functions are already set up in the custom handlers.

I'm also currently running into issues in the TorchServe environment that I don't get in my local environment, so this kind of solution wouldn't let me step through what is going on inside the TorchServe container with the debugger. I can't figure out why TensorRT doesn't get enabled even though it is installed and the container is run with access to GPUs.

agunapal commented 1 year ago

@rbavery I understand this is not the ideal solution and it doesn't address every scenario, but it can be useful in some cases. I have used this with custom handlers. What base image are you using in the Docker container? I haven't tried TensorRT in a Docker container yet; I'll let you know when I work on it. Also, FYI: if you are using multiple GPUs, you will run into this issue until their next release: https://github.com/pytorch/TensorRT/pull/2325

msaroufim commented 1 year ago

I think I figured out what was missing. In ts/model_server.py, we can add the following, gated behind some flag, so people can do something like torchserve --start --debug:

cmd.append("--python")
cmd.append("debugpy")
cmd.append("--listen")
cmd.append("0.0.0.0:5678")
cmd.append("--wait-for-client")
cmd.append("--run")
cmd.append(sys.executable)
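
For instance, the gating might look roughly like this (a sketch; the --debug flag and its argparse plumbing are assumptions that would still need to be wired into ts/model_server.py):

# Sketch: only wrap the worker command in debugpy when TorchServe was
# started with a (hypothetical) --debug flag.
if args.debug:
    cmd.extend([
        "--python", "debugpy",
        "--listen", "0.0.0.0:5678",
        "--wait-for-client",
        "--run", sys.executable,
    ])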

Then in a handler, have a call like:

debugpy.listen(('0.0.0.0', 6789))
debugpy.wait_for_client()
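
Attaching then works with the same kind of launch.json attach configuration shown earlier, pointed at port 6789.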

@rbavery are you interested in figuring this out again?