okyspace opened this issue 1 year ago
Thank you for requesting this feature. We have filed a ticket to investigate this enhancement.
CC @GuanLuo @whoisj
Hi, can I check if this will be implemented? We may be purchasing an NVIDIA AI Enterprise license; will this feature be available?
Hi @okyspace, we have the restricted endpoint feature for both HTTP and GRPC endpoints: https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta.
You should be able to set up key/value pairs to authorize specific routes/features. Does this satisfy your needs?
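For reference, a minimal sketch of how the restricted-access flags from the linked doc can be used (the `admin-key`/`admin-value` pair and the model repository path are placeholders, not real credentials):

```shell
# Restrict the model-repository and shared-memory APIs to callers that
# present the header "admin-key: admin-value" (placeholder key/value).
tritonserver \
  --model-repository=/models \
  --http-restricted-api=model-repository,shared-memory:admin-key=admin-value \
  --grpc-restricted-protocol=model-repository,shared-memory:admin-key=admin-value

# An authorized HTTP client then passes the key/value as a request header:
curl -X POST -H "admin-key: admin-value" localhost:8000/v2/repository/index
```

Requests to the restricted APIs without the matching header are rejected, while unrestricted APIs (e.g. inference, health) remain open to everyone.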
Hi @rmccorm4, thanks for your reply. I understand that we could restrict some APIs to a certain group based on the key/value pairs made available to them. From the list of protocols/features, it seems like I can only limit the inference API, but not down to certain exposed model(s). The use case would be to restrict certain models to a specific group of users only. Would it be possible?
Hi @okyspace, I don't believe it is currently possible to restrict specific models. The workaround would likely be to start a separate tritonserver instance for each logical set of models you would like, and then restrict each instance accordingly.
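A minimal sketch of that workaround, with hypothetical model repository paths and port numbers: each group gets its own tritonserver instance serving a disjoint model repository on its own ports, and access control is applied per instance.

```shell
# Instance for group A: serves only group A's models on its own ports.
tritonserver --model-repository=/models/group-a \
  --http-port 8000 --grpc-port 8001 --metrics-port 8002 &

# Instance for group B: a separate model repository on separate ports.
tritonserver --model-repository=/models/group-b \
  --http-port 9000 --grpc-port 9001 --metrics-port 9002 &
```

Network policy, a proxy, or the restricted-access flags can then gate which users may reach each instance's ports.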
CC @nnshah1 @GuanLuo
The typical expectation is that Triton Inference Server is deployed within a larger solution with security and authorization handled outside of Triton itself (https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/deploy.md)
Restricting access at the level of an instance or group of instances, each with access to specific models, would overall be a better solution, as it also avoids any possibility that one user could affect another via denial of service (that is, a user could fully load a model they have access to and effectively prevent access to another model that they don't have access to).
Expanding restricted groups to models is possible, but we'd want to balance that against the use cases / value in deployment scenarios.
@nnshah1, thanks.
I guess it is a matter of how much segregation (i.e. physical vs. logical vs. a single instance with ACLs) we want in the implementation for different groups of users, when we consider resource utilisation, security, etc.
Different instances for different groups do help to limit access, but this may lead to lower utilisation of resources, as not all groups may have enough models to warrant their own Triton instance. It may also make management of Triton instances more difficult (compared to the SA managing just a single Triton pod and scaling it accordingly).
Currently, the [limit-endpoint-access-beta] can limit the higher-level APIs (e.g. health, metadata or inference). I was wondering whether more granular control could be applied at the model level. This would help to limit access within a single Triton instance, and a model owner could easily share their key/value pair to authorize other users if need be.
What do you think?
What is the deployment setup / strategy you are aiming for? Have you considered using a reverse proxy as part of your solution?
https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-http-basic-authentication/
These general purpose solutions are built to provide the kind of security and authorization you are describing (including more sophisticated things like JWT).
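To illustrate the reverse-proxy approach, here is a minimal sketch of an nginx `server`-block fragment in front of Triton's HTTP endpoint; the upstream address, model name, and htpasswd file path are all placeholders:

```nginx
# Require basic auth for one model's routes, proxying to a Triton
# instance on localhost:8000 (addresses and names are illustrative).
location /v2/models/secret_model/ {
    auth_basic           "Restricted model";
    auth_basic_user_file /etc/nginx/.htpasswd;  # e.g. created with htpasswd
    proxy_pass           http://localhost:8000;
}

# Leave the rest of the API open (or gate it with a different user file).
location /v2/ {
    proxy_pass http://localhost:8000;
}
```

Because Triton's HTTP inference routes are namespaced per model (`/v2/models/<name>/...`), a proxy like this can express the per-model access control discussed above without any changes to Triton itself.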
I think the main concern on our side would be adding more checks into the inference pipeline, as we want to avoid any additional latency as a rule (though in this case the check may be small and could likely be optimized in most scenarios) - and also how likely these features would be to see use in real-world deployments.
@okyspace You mentioned Nvidia AI Enterprise license - is there someone in NVIDIA you are already working with that we can work with to understand your use case better?
**Is your feature request related to a problem? Please describe.**
For security reasons, there might be a need to secure the endpoints, allowing only authorised accounts to hit them (REST, GRPC, metrics).

**Describe the solution you'd like**
A means to whitelist accounts to access different endpoints.