triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Authorisation for endpoints #5558

Open okyspace opened 1 year ago

okyspace commented 1 year ago

Is your feature request related to a problem? Please describe.
For security reasons, there might be a need to secure the endpoints so that only authorised accounts are able to hit them (REST, GRPC, metrics).

Describe the solution you'd like
A means to whitelist accounts so they can access different endpoints.

dyastremsky commented 1 year ago

Thank you for requesting this feature. We have filed a ticket to investigate this enhancement.

rmccorm4 commented 1 year ago

CC @GuanLuo @whoisj

okyspace commented 5 months ago

Hi, can I check if this will be implemented? We may be purchasing an NVIDIA AI Enterprise licence; will this feature be available then?

rmccorm4 commented 5 months ago

Hi @okyspace, we have the restricted endpoint feature for both HTTP and GRPC endpoints: https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta.

You should be able to set up key/value pairs to authorize specific routes/features. Does this satisfy your needs?
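For illustration, roughly something like this (an untested sketch — the protocol/API group names come from the linked doc, and `admin-key`/`admin-value` are just placeholders):

```bash
# Start Triton with the model-repository and shared-memory groups restricted to
# callers that present the header admin-key: admin-value (placeholder values).
tritonserver --model-repository=/models \
    --grpc-restricted-protocol=model-repository,shared-memory:admin-key=admin-value \
    --http-restricted-api=model-repository,shared-memory:admin-key=admin-value

# An HTTP client then needs to send the matching header to reach a restricted API,
# e.g. the model repository index endpoint:
curl -X POST -H "admin-key: admin-value" localhost:8000/v2/repository/index
```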

okyspace commented 5 months ago

Hi @rmccorm4, thanks for your reply. I understand that we could restrict some APIs to certain groups based on a key/value made available to them. From the list of protocols/features, it seems I can only limit the inference API, but not narrow it down to specific exposed model(s). The use case would be to restrict certain models to one group of users only. Would that be possible?

rmccorm4 commented 5 months ago

Hi @okyspace, I don't believe it is currently possible to restrict specific models. The workaround would likely be to start separate tritonservers for each logical set of models you would like, and then to restrict each accordingly.
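To make the workaround concrete, a rough sketch (the repository paths, ports, and keys below are placeholders):

```bash
# Instance for group A: serves only group A's models and is restricted with group A's key.
tritonserver --model-repository=/models/group_a \
    --http-port 8000 --grpc-port 8001 --metrics-port 8002 \
    --http-restricted-api=inference:group-a-key=secret-a

# Instance for group B: its own model repository, ports, and key.
tritonserver --model-repository=/models/group_b \
    --http-port 9000 --grpc-port 9001 --metrics-port 9002 \
    --http-restricted-api=inference:group-b-key=secret-b
```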

CC @nnshah1 @GuanLuo

nnshah1 commented 5 months ago

The typical expectation is that Triton Inference Server is deployed within a larger solution with security and authorization handled outside of Triton itself (https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/deploy.md)

Restricting access to an instance or group of instances, each serving a specific set of models, would generally be a better solution. It also avoids any possibility that one user could affect another via denial of service (that is, a user could fully load a model they have access to and effectively prevent access to another model that they don't have access to).

Expanding restricted groups to cover individual models is possible, but we'd want to balance that against the use cases / value in real deployment scenarios.

okyspace commented 4 months ago

@nnshah1, thanks.

I guess it is a matter of how much segregation (i.e. physical vs. logical vs. a single instance with ACLs) we want to implement for different groups of users, when we consider resource utilisation, security, etc.

Different instances for different groups do help to limit access, but this may lead to lower resource utilisation, as not all groups have enough models to justify their own Triton instance. It may also make managing Triton instances more difficult (compared to an SA managing just a single Triton pod and scaling it accordingly).

Currently, [limit-endpoint-access-beta] can limit the higher-level APIs (e.g. health, metadata, or inference). I was wondering whether more granular control could be done at the model level. This would help to limit access within a single Triton instance, and a model owner could easily share their key/value to authorize other users if needed.

What do you think?

nnshah1 commented 4 months ago

What deployment setup / strategy are you aiming for? Have you considered using a reverse proxy as part of your solution, e.g.:

https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-http-basic-authentication/

These general purpose solutions are built to provide the kind of security and authorization you are describing (including more sophisticated things like JWT).
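As a sketch of that approach (untested; the model name `model_a`, paths, and ports are placeholders — the directives are the ones covered in the linked guide):

```bash
# Create a credentials file for basic auth (see the linked nginx guide).
htpasswd -c /etc/nginx/.htpasswd team_a_user

# Minimal reverse-proxy config: only authenticated users can reach the
# routes for a specific model on the Triton instance behind the proxy.
cat > /etc/nginx/conf.d/triton.conf <<'EOF'
server {
    listen 8080;

    # Require credentials for one model's endpoints.
    location /v2/models/model_a/ {
        auth_basic           "Triton - model_a";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://localhost:8000;
    }

    # Everything else is proxied as-is (or could be denied instead).
    location / {
        proxy_pass http://localhost:8000;
    }
}
EOF

nginx -s reload
```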

I think the main concern on our side would be adding more checks into the inference pipeline, as we want to avoid any additional latency as a rule (though in this case the check may be small and could likely be optimized in most scenarios), and also how likely these features would be to see use in real-world deployments.

nnshah1 commented 4 months ago

@okyspace You mentioned the NVIDIA AI Enterprise license - is there someone at NVIDIA you are already working with that we could coordinate with to understand your use case better?