taneem-ibrahim opened this issue 1 year ago
An OOD-enabled model will produce, in a single output tensor:
1. the original model prediction, and
2. the OOD (certainty) score.

We would need an output transformation to separate 1 from 2, and logging to record the input/output and OOD scores (e.g., in OpenShift logging and/or Prometheus). These are generic functionalities that should be useful beyond OOD.
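For illustration only, a minimal sketch of what the output transformation could look like, assuming (hypothetically) that the certainty score is appended as the last element of each output row; the tensor layout and values are placeholders, not the actual model contract:

```python
import numpy as np

def split_ood_output(output_tensor: np.ndarray):
    """Separate (1) the model prediction from (2) the OOD/certainty score."""
    prediction = output_tensor[..., :-1]   # original model output
    ood_score = output_tensor[..., -1]     # appended certainty score
    return prediction, ood_score

# Example: a batch of 2 rows, each with 3 class scores + 1 certainty score.
batch = np.array([[0.1, 0.7, 0.2, 0.93],
                  [0.5, 0.3, 0.2, 0.41]])
preds, scores = split_ood_output(batch)
print(preds.shape, scores)   # (2, 3) [0.93 0.41]
```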
Hi @mudhakar @taneem-ibrahim, just to add more details on OOD (model certainty) enablement and deployment.
Per a discussion with @njhill and @ckadner, the best path forward is to have an output transformer (similar to the post-processing transformer in KServe) native to model-mesh, without requiring the KServe controller.
@nirmdesai @mudhakar After further discussion with @njhill and @ckadner, it sounds like our fastest way to get integrated would be to add a custom post-processor as part of OOD for now, until we have kserve-raw or serverless available in ODH.
A proposal for the post-processing transform
Thanks @daw3rd. @njhill, @taneem-ibrahim, @ckadner: The above "KServe Proxy" is the custom post-processor container you proposed last week. Could you please review and confirm this is what you had in mind? cc: @mudhakar
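To make the proposal concrete, a minimal sketch of what such a "KServe Proxy" post-processor container could look like. This is not the actual implementation; the route, InferenceService URL, model name, and tensor layout are hypothetical, and it assumes a KServe v2-style REST response with the certainty score appended as the last value of the output tensor:

```python
import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical InferenceService endpoint the proxy forwards to.
INFERENCE_URL = os.getenv(
    "INFERENCE_URL",
    "http://modelmesh-serving:8008/v2/models/example-ood-model/infer",
)

@app.route("/v1/ood/infer", methods=["POST"])
def infer():
    # Forward the caller's payload to the InferenceService unchanged.
    resp = requests.post(INFERENCE_URL, json=request.get_json(), timeout=30)
    resp.raise_for_status()
    outputs = resp.json()["outputs"][0]["data"]
    # Post-process: split the prediction from the appended OOD/certainty score.
    return jsonify({"prediction": outputs[:-1], "ood_score": outputs[-1]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```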
Hi @nirmdesai, is the KServe proxy (REST server) here replicating functionality similar to this?
@taneem-ibrahim: Just to be precise, we are not going to use the KServe transformer framework (shown in the link you shared) to implement the KServe Proxy. However, the implementation of our KServe Proxy will look similar to a typical pre-/post-processor function shown in the example above. The deployment flow will also differ from the link you shared, where the transformer is deployed as part of InferenceService creation. In our case, you would first create an InferenceService as you normally would, and then deploy the proxy container on top of it. You would then use the Proxy APIs for inferencing instead of the InferenceService APIs. cc: @mudhakar , @daw3rd , @spacew
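A hypothetical client-side view of that flow, assuming the proxy sketched earlier in the thread; the URLs and payload shape are purely illustrative:

```python
import requests

# KServe v2-style request payload (shapes and names are placeholders).
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 3], "datatype": "FP32",
         "data": [0.2, 0.4, 0.6]}
    ]
}

# Without the proxy, clients would call the InferenceService directly, e.g.
#   POST http://modelmesh-serving:8008/v2/models/example-ood-model/infer
# With the proxy deployed, clients call the proxy's endpoint instead:
resp = requests.post("http://ood-proxy:8080/v1/ood/infer", json=payload, timeout=30)
print(resp.json())   # e.g. {"prediction": [...], "ood_score": 0.93}
```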
Hello @taneem-ibrahim @nirmdesai @mudhakar @spacew @daw3rd cc: @njhill @ckadner
Regarding a proxy service for transforming model output for a certainty-enabled model, below is a diagram demonstrating the interaction for a modelmesh proxy server deployed on OpenShift, in the same cluster where RHODS is hosted. Note that the deployment also includes a Prometheus service for logging the model-certainty metrics over time as generated by the modelmesh proxy service; both are packaged via helm install. If a Prometheus instance already exists, this can be removed.
Please share feedback or comments on the deployment and sequence steps, as well as the endpoint for reaching the modelmesh proxy.
Copying discussions I've had on Slack:
I think TrustyAI can offer a lot of the capabilities that the modelmesh-proxy is aiming to provide, with the advantage of not needing to add another component into the mix.
TrustyAI within ODH/RHODS is a service that intercepts modelmesh input and output payloads and then sends metrics computed on that input/output data (e.g., fairness metrics) to Prometheus. If we defined a metric that simply grabbed the certainty scores from the model output payload and emitted them to Prometheus, it'd be a really simple way of doing what you're trying to do.
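To sketch the idea only (this is not TrustyAI's actual API), the core of such a metric would just pull the certainty score out of the intercepted output payload and expose it as a Prometheus gauge; the metric name, label, and payload layout below are hypothetical:

```python
from prometheus_client import Gauge, start_http_server

# Hypothetical metric name and label.
ood_certainty = Gauge(
    "ood_model_certainty",
    "Certainty score emitted by an OOD-enabled model",
    ["model"],
)

def record_certainty(model_name: str, output_payload: dict) -> None:
    # Assumes the certainty score is the last value of the first output tensor.
    data = output_payload["outputs"][0]["data"]
    ood_certainty.labels(model=model_name).set(data[-1])

if __name__ == "__main__":
    start_http_server(9090)  # expose /metrics for Prometheus to scrape
    record_certainty("example-ood-model",
                     {"outputs": [{"data": [0.1, 0.7, 0.2, 0.93]}]})
```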
As a PoC, I've done exactly that and got an OOD model deployed in modelmesh and sending the OOD metrics to Prometheus within OpenDataHub:
If an OOD-enabled model is deployed, modelmesh metrics should capture, as part of the inferencing metrics, the two additional metrics that these models generate.