vadimkantorov opened this issue 1 month ago (status: Open)
Here's a demonstration of adding decryption of the ONNX model weights at loading time:
But maybe a better way would be to let the user specify a path to a custom .so file in the Triton model config. Alternatively, the backend code could call stub I/O hooks, which the user could then override with an LD_PRELOAD'ed custom implementation of these hooks. The hooks could then load model weights from S3 or a custom FS path, do custom decryption, or something else.
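To make the hook idea concrete, here is a minimal sketch of the control flow in Python. It is only an illustration: the real backend is C++, and the names `ModelBytesHook`, `xor_hook` and `load_model_bytes` are hypothetical, not part of Triton or ORT. The XOR transform is a stand-in for whatever decryption the user would actually plug in.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical hook type: receives the raw bytes read from the model path
# and returns the bytes that should actually be handed to the runtime.
ModelBytesHook = Callable[[bytes], bytes]


def xor_hook(key: bytes) -> ModelBytesHook:
    """Build a placeholder hook that XORs data with a repeating key.

    This is NOT real cryptography; it only stands in for a user-supplied
    decryption routine.
    """
    if not key:
        raise ValueError("key must not be empty")

    def hook(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    return hook


def load_model_bytes(path: str, hook: Optional[ModelBytesHook] = None) -> bytes:
    """Read the model file and pass its contents through the hook, if any.

    With no hook installed the bytes are returned unchanged, which mirrors
    the idea of a default stub that the user may override.
    """
    raw = Path(path).read_bytes()
    return hook(raw) if hook is not None else raw
```

The decrypted bytes never touch the disk here. On the ORT side this relies on creating a session from an in-memory buffer rather than a path, which the ORT Python `InferenceSession` (bytes in place of a path) and the C++ `Ort::Session` constructor taking a data pointer and length both allow.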
Of course, this approach can become more complicated if the model weights are accessed by mmap-ing them.
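A rough sketch of why mmap complicates things: a file-backed mapping exposes the encrypted bytes as they are on disk, so a decryption hook would need somewhere else to put the plaintext. One possible workaround, shown here purely as an assumption and not as anything the backend does, is to decrypt into an anonymous (non-file-backed) mapping. The function name and the XOR placeholder are hypothetical.

```python
import mmap


def decrypt_into_anonymous_map(encrypted: bytes, key: bytes) -> mmap.mmap:
    """Decrypt into an anonymous memory map and return it rewound to offset 0.

    XOR with a repeating key is a placeholder for a real cipher.
    """
    if not encrypted or not key:
        raise ValueError("encrypted data and key must both be non-empty")
    # fileno=-1 requests anonymous memory, so plaintext is not written to a file.
    buf = mmap.mmap(-1, len(encrypted))
    buf.write(bytes(b ^ key[i % len(key)] for i, b in enumerate(encrypted)))
    buf.seek(0)
    return buf
```

The trade-off is that an anonymous mapping gives up the main benefit of mmap-ing the weights file directly: the pages are no longer backed by the file, so the whole plaintext has to be materialized in memory up front.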
Sometimes it's useful to let the user decrypt the model/weights prior to loading, or to provide a custom user hook for this purpose. This is useful for basic foolproof protection of models in some on-premises setups.
ORT supports something like this in:
Could this also be supported in the ORT backend for Triton?