kubeflow / community

Information about the Kubeflow community including proposals and governance information.

TrustyAI project and Kubeflow #733

Open ruivieira opened 3 weeks ago

ruivieira commented 3 weeks ago

(This issue aims to capture the discussion following the community presentation and the kubeflow-discuss mailing list post.)

On behalf of the TrustyAI team, I would like to thank you all for the opportunity to present the TrustyAI project and discuss the fit with Kubeflow at the community meeting.

TrustyAI summary

TrustyAI is an open-source community dedicated to providing a diverse toolkit for responsible AI development and deployment. TrustyAI was founded in 2019 as part of Kogito, an open-source business automation community, as a response to growing demand from users in highly regulated industries such as financial services and healthcare.

The TrustyAI community maintains a number of projects within the responsible AI field, mostly revolving around model explainability, model monitoring, and responsible model serving.

TrustyAI provides tools to apply explainability, inspect bias/fairness, monitor data drift, and mitigate harmful content for a number of different user profiles. For Java developers, we provide the TrustyAI Java library containing TrustyAI’s core algorithms. For data scientists and developers using Python, we expose our Java library via the TrustyAI Python library, which combines the speed of Java with the familiarity of Python. Here, TrustyAI’s algorithms are integrated with common data science libraries like NumPy and pandas. Future work is planned to add native Python algorithms to the library and to broaden TrustyAI’s compatibility by integrating with libraries like PyTorch and TensorFlow. One such nascent project is trustyai-detoxify, a module within the TrustyAI Python library that provides guardrails, toxic language detection, and rephrasing capabilities for use with LLMs.
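As a concrete illustration of the kind of group-fairness check this enables, the sketch below computes statistical parity difference directly in pandas. This shows the metric itself, not TrustyAI's API; the column names and toy data are invented for the example:

```python
import pandas as pd

def statistical_parity_difference(df, protected, outcome, privileged, favorable):
    """SPD = P(favorable | unprivileged) - P(favorable | privileged); 0 means parity."""
    priv = df[df[protected] == privileged]
    unpriv = df[df[protected] != privileged]
    return (unpriv[outcome] == favorable).mean() - (priv[outcome] == favorable).mean()

# Invented toy predictions: 'gender' as the protected attribute, 'approved' as model output
preds = pd.DataFrame({
    "gender":   ["M", "M", "M", "M", "F", "F", "F", "F"],
    "approved": [1,   1,   1,   0,   1,   0,   0,   0],
})
spd = statistical_parity_difference(preds, "gender", "approved",
                                    privileged="M", favorable=1)
print(spd)  # → -0.5
```

A strongly negative value like this indicates the unprivileged group receives the favorable outcome far less often, which is exactly the kind of signal TrustyAI's bias metrics surface.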

For enterprise and MLOps use-cases, TrustyAI provides the TrustyAI Kubernetes Service and Operator which serves TrustyAI bias, explainability, and drift algorithms within Kubernetes. Both the service and operator are integrated into Open Data Hub (ODH) to facilitate coordination between model servers and TrustyAI, bringing easy access to our responsible AI toolkit to users of both platforms. Currently, the TrustyAI Kubernetes service supports tabular models served in KServe or ModelMesh.

Potential integrations with Kubeflow

Presentation

Any feedback would be greatly appreciated.

All the best, TrustyAI team.

rareddy commented 3 weeks ago

your feedback is appreciated @jbottum @james-jwu @zijianjoy @thesuperzapper @terrytangyuan @johnugeorge @kimwnasptd @andreyvelich @akgraner @StefanoFioravanzo @rimolive

StefanoFioravanzo commented 2 weeks ago

@ruivieira This is a very interesting proposal! It seems like you are proposing to integrate the TrustyAI ecosystem across various popular Kubeflow components. I wonder how we could highlight the added value to the user experience and describe what the success criteria of this initiative are. A few questions:

  1. You mentioned your enterprise MLOps use case. To what extent do the proposed integrations rely on purely OSS-based libraries and components? Would there be dependencies on paid services?
  2. Who can contribute these integrations? Do you or your team propose to contribute and implement them?
  3. How do you think we can describe a clear end-to-end user journey that showcases TrustyAI integrations across the whole spectrum of Kubeflow components? Maybe using a real-world scenario of model development + production inferencing and monitoring + feedback loop.

Again, thanks for the proposal. Looking forward to more!

ruivieira commented 2 weeks ago

@StefanoFioravanzo thank you! Regarding your questions:

You mentioned your enterprise MLOps use case. To what extent do the proposed integrations rely on purely OSS-based libraries and components? Would there be dependencies on paid services?

All of TrustyAI's code (core algorithms, services, and integrations) is fully open-source and released under Apache 2.0; this is a core requirement for contributors. There will be no dependencies on paid services.

Who can contribute these integrations? Do you or your team propose to contribute and implement them?

The TrustyAI team is already implementing some of the integrations and will continue, but any contribution from the wider community would be more than welcome.

How do you think we can describe a clear end-to-end user journey that showcases TrustyAI integrations across the whole spectrum of Kubeflow components? Maybe using a real-world scenario of model development + production inferencing and monitoring + feedback loop.

Jupyter notebooks / workbenches

TrustyAI is available as a Python library, with pre-built workbench container images. As such, TrustyAI can be used in the exploration/training phase, applying tools like bias/fairness metrics and explainers to diagnose and debug potential problems before a model is deployed. In a real-world case, data scientists would use TrustyAI to measure potential bias, validate assumptions about feature saliencies, or score toxicity for a test corpus. Several examples of these techniques are available in our Jupyter notebook examples repo: trustyai-explainability-python-examples.
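To illustrate the kind of saliency check mentioned above, here is a minimal permutation-importance sketch in plain NumPy. This is a generic explainability technique used for illustration, not TrustyAI's explainer API; the toy model and data are invented:

```python
import numpy as np

def permutation_importance(model_fn, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature is shuffled: a larger drop = more salient."""
    rng = np.random.default_rng(seed)
    base = np.mean(model_fn(X) == y)  # accuracy on the unperturbed data
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the relationship between feature j and y
            drops.append(base - np.mean(model_fn(Xp) == y))
        importances.append(float(np.mean(drops)))
    return importances

# Toy model that only looks at feature 0, so feature 1 should score near zero
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]] * 25, dtype=float)
y = X[:, 0].copy()
model = lambda data: data[:, 0]
imp = permutation_importance(model, X, y)
```

A data scientist could run a check like this in a workbench to confirm a model's predictions are driven by the features they expect, before deployment.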

Pipelines

Due to its simple architecture (integrations built on top of a core of algorithms), TrustyAI can be containerised into single-purpose pipeline steps that could form part of a model-building or deployment pipeline. As a real-world example, a global explainability step could score feature importances and check whether regulatory requirements regarding protected attributes are being met.
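A hypothetical sketch of such a single-purpose compliance step follows; the attribute names, threshold, and input format are all illustrative assumptions, not a TrustyAI interface:

```python
# Illustrative names: PROTECTED and THRESHOLD are assumptions for this sketch
PROTECTED = {"gender", "age", "ethnicity"}
THRESHOLD = 0.05  # maximum allowed share of total importance for a protected attribute

def check_compliance(importances):
    """Return the protected attributes whose normalized importance breaches THRESHOLD."""
    total = sum(importances.values()) or 1.0
    return [f for f, v in importances.items()
            if f in PROTECTED and v / total > THRESHOLD]

# Importances as they might arrive from an upstream explainability step
importances = {"income": 0.6, "credit_history": 0.3, "gender": 0.1}
violations = check_compliance(importances)
print("violations:", violations)  # → violations: ['gender']
```

In a pipeline, a non-empty violation list would fail the step and block promotion of the model, turning the regulatory check into an automated gate.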

Deployment/monitoring

When a model is deployed, the TrustyAI service provides real-time metrics such as data drift and bias/fairness. A model's bias or potential data drift is published to Prometheus, from which alerting can be set up if a value falls outside acceptable thresholds. As an example, this could in turn trigger automated retraining and validation using the same methods, this time from within a pipeline. Several examples of these real-time monitoring methods are available in: odh-trustyai-demos.
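As a sketch of the kind of drift statistic such a service might publish, the example below computes the population stability index, a common drift measure used here purely for illustration rather than as TrustyAI's specific method; the data and bucket floor are invented:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample and live data; > 0.2 is a common drift alarm level."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        # Small floor avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]
    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

training = [i / 100 for i in range(100)]            # uniform on [0, 1)
live_ok = [i / 100 for i in range(100)]             # same distribution
live_shifted = [0.5 + i / 200 for i in range(100)]  # mass moved to the upper half

psi_same = population_stability_index(training, live_ok)
psi_drift = population_stability_index(training, live_shifted)
```

A gauge like `psi_drift` scraped by Prometheus is the kind of value an alerting rule could compare against a threshold to trigger the retraining loop described above.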