triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Install Python Backend via pip locally #5813

Open MikhailKravets opened 1 year ago

MikhailKravets commented 1 year ago

Is your feature request related to a problem? Please describe.

I don't see any way to install python_backend locally (with pip, for example). This makes it hard to debug the code: each time a Python backend is added or edited, you need to run Triton Server to test the result.

Describe the solution you'd like

I understand that in order to have the backend's functionality fully working you may need to run it in a fully-fledged Triton Server. But what about creating a simplified typing library that can be installed locally? This way we would get the possibility to run the code locally.

Let me explain with an example. First, we install the package via a package manager:

pip install triton_utils

This package contains types and containers that help us write the actual backend classes. The updated example could look like:

import triton_utils

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    @staticmethod
    def auto_complete_config(auto_complete_model_config: triton_utils.ModelConfig):
        return auto_complete_model_config

    def execute(self, requests: list[triton_utils.InferenceRequest]):
        responses: list[triton_utils.InferenceResponse] = []
        for request in requests:
            # Now the IDE we use can highlight objects in each request;
            # also, we can run this method ourselves, passing the requests
            # container directly.
            ...

        return responses

Now this code can be run locally (either in a plain script or in unit tests):

import triton_utils

req = triton_utils.InferenceRequest(
    name="...",
    data_type="...",
    data=...
)

model = TritonPythonModel()
resp = model.execute([req])

I guess that this is achievable because, as I see it, ModelConfig, InferenceRequest, and InferenceResponse are just containers for data that the user can fill in themselves. Perhaps they could be based on Python's dataclasses and typing packages. Utility functions could also be added to help the user fill in these request / config / response objects.

The difficulty I see is the Tensor object. Maybe it's possible to replace it seamlessly with numpy.ndarray?
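To make the idea concrete, here is a rough sketch of what such dataclass-based containers could look like. All of these names are hypothetical (triton_utils doesn't exist today), and the Tensor stand-in simply wraps a numpy.ndarray:

# Hypothetical sketch of the proposed triton_utils containers; none of
# these names exist in Triton today.
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np


@dataclass
class Tensor:
    """Stand-in for pb_utils.Tensor, backed by a plain numpy array."""
    name: str
    data: np.ndarray

    def as_numpy(self) -> np.ndarray:
        return self.data


@dataclass
class InferenceRequest:
    """Container for the input tensors of a single request."""
    inputs: list[Tensor] = field(default_factory=list)

    def get_input_tensor_by_name(self, name: str) -> Tensor | None:
        return next((t for t in self.inputs if t.name == name), None)


@dataclass
class InferenceResponse:
    """Container for the output tensors produced by execute()."""
    output_tensors: list[Tensor] = field(default_factory=list)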

Describe alternatives you've considered

Additional context

For reference, here is a link to a typing library for Python's boto3: mypy_boto3_builder. It is related to the issue I've described.

dyastremsky commented 1 year ago

Thank you for this suggestion. @Tabrizian, what do you think?

Jack47 commented 1 year ago

We really need this to allow fast development and quick feedback by running unit tests locally. The Docker image for Triton Inference Server is very large and not friendly for this kind of testing.

What's more, I can't find the pb_utils.InferenceResponse definition in triton_python_backend_utils.py. An alternative way to work around this issue is to copy that file into the project directory to avoid installing the Python backend.
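For example, a conftest.py along these lines (just a rough sketch; the stub's contents are made up for illustration and only need to cover whatever the model code actually calls) lets model.py be imported in local unit tests without installing the backend:

# conftest.py -- rough sketch: register a stub module under the name that
# model.py imports, so local unit tests can import it without Triton.
# The stub's contents are illustrative; add only what your model uses.
import sys
import types

import numpy as np

stub = types.ModuleType("triton_python_backend_utils")


class _Tensor:
    def __init__(self, name, data):
        self.name = name
        self._data = np.asarray(data)

    def as_numpy(self):
        return self._data


stub.Tensor = _Tensor

# Any `import triton_python_backend_utils as pb_utils` executed after this
# point (e.g. inside model.py) resolves to the stub.
sys.modules["triton_python_backend_utils"] = stub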

dyastremsky commented 1 year ago

Thank you for the additional context. We have filed a feature request to look into adding this enhancement.

Tabrizian commented 1 year ago

@Jack47 I think this is a reasonable feature request. Is the main goal of this library only to help with debugging/development of Python backend models? I think once development is finished you still need to deploy it in Triton Server, but I can see how having a Python package could make it easier to develop new models.

MikhailKravets commented 1 year ago

@Tabrizian, ideally, it would be nice to have one library that automatically understands where it is running (or installed). So InferenceRequest, InferenceResponse, and the other objects from the lib would have one interface, but the internal implementation would differ depending on the environment (local or Triton). This way, we can write the code once and simply run it with Triton.
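As a rough sketch of that idea (the package layout and the _local module below are hypothetical), the package's __init__.py could try the real in-server module first and fall back to local implementations:

# triton_utils/__init__.py -- hypothetical sketch of the "one interface,
# two implementations" idea.
try:
    # Inside Triton: re-export the real in-server objects.
    import triton_python_backend_utils as _pb

    Tensor = _pb.Tensor
    InferenceRequest = _pb.InferenceRequest
    InferenceResponse = _pb.InferenceResponse
except ImportError:
    # Outside Triton: fall back to pure-Python containers (e.g. the
    # dataclass sketch above), assumed to live in a _local submodule.
    from ._local import Tensor, InferenceRequest, InferenceResponse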

Jack47 commented 1 year ago

@Tabrizian yes, it would be a great help. Could you share a link where we can follow the feature request's progress?

phofmann81 commented 7 months ago

+1, is there any progress or a documented workaround for running unit tests on model code? Since we do a lot of feature retrieval and transformation in Triton Server, we absolutely need this.

ClaytonJY commented 7 months ago

@phofmann81 I haven't tried it myself, but I think it should be possible to run tests inside a tritonserver container. You may need to install your own dependencies, but any python code running inside the container should be able to import triton_python_backend_utils, whether it's in a model repository or not.

Not a great solution by any means, but might help?

jadhosn commented 4 months ago

@Tabrizian adding my vote to this ticket. Debugging Python models is challenging, even after importing triton_python_backend_utils. For example, the new TritonError classes are not defined within the Python module, and the same goes for the Logger (e.g. pb_utils.Logger).
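A possible stopgap for local tests (just a sketch, and the stand-ins below don't replicate the real in-server behavior) is to attach placeholders for the attributes that only exist inside Triton:

# Rough sketch: patch local stand-ins onto pb_utils for attributes that are
# only defined when running inside Triton. Purely illustrative.
import logging

import triton_python_backend_utils as pb_utils

if not hasattr(pb_utils, "Logger"):
    class _Logger:
        @staticmethod
        def log_info(msg):
            logging.getLogger("triton").info(msg)

        @staticmethod
        def log_error(msg):
            logging.getLogger("triton").error(msg)

    pb_utils.Logger = _Logger

if not hasattr(pb_utils, "TritonError"):
    class TritonError(Exception):
        """Local stand-in for the in-server error type."""

    pb_utils.TritonError = TritonError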