triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

Built-in support for (custom?) decryption of model weights #279

Open vadimkantorov opened 1 month ago

vadimkantorov commented 1 month ago

Sometimes it's useful to let the user decrypt the model weights prior to loading, or to provide a custom user hook for this purpose. This is useful for basic protection of models in some on-premises setups.

ORT supports something like this in:

Could this also be supported in ORT backend for Triton?

vadimkantorov commented 1 week ago

Here's a demonstration of adding decryption of the ONNX model weights at loading time:

But maybe a better way would be to let the user specify a path to a custom .so file in the Triton model config. Alternatively, the backend code could call stub I/O hooks which the user overrides with an LD_PRELOAD'ed custom implementation. These hooks could then load model weights from some S3 / custom FS path, do custom decryption, or something else.

Of course, this approach becomes more complicated if the model weights are accessed by mmap-ing the weight file.