squaredev-io / whitebox

[Not Actively Maintained] Whitebox is an open source E2E ML monitoring platform with edge capabilities that plays nicely with kubernetes
https://squaredev.io/whitebox/
MIT License
184 stars 5 forks

Add data monitoring feature #150

Open momegas opened 1 year ago

momegas commented 1 year ago

Is your feature request related to a problem? Please describe. The problem this issue is trying to solve is that some users need to check and validate their data as part of their MLOps lifecycle. Since Whitebox already does this for the training and inference datasets, we should be able to extend this functionality into a complete data monitoring solution.

Describe the solution you'd like A possible solution is to create a data monitoring project, just like we do for model monitoring. The user should be able to specify where the data is located (S3, SQL, and other integrations in the future), and Whitebox will run the data monitoring pipelines just as it does with model monitoring.

A possible flow is the following:

  1. Create a data monitoring project (through the SDK, UI, or API).
  2. Choose the data to be monitored by specifying the data source and credentials.
  3. Run the data monitoring pipelines and display the findings on the dashboard (as with model monitoring).
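
To make the flow above more concrete, here is a rough sketch in Python. Nothing in it is part of the existing Whitebox SDK: `DataSource`, `DataMonitoringProject`, and `run_data_monitoring` are hypothetical names, the S3 path is a placeholder, and the only "pipeline" shown is a simple missing-value count over rows that are already loaded in memory. Fetching data from S3 or SQL and pushing findings to the dashboard are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataSource:
    """Hypothetical description of where the monitored data lives."""
    kind: str                      # e.g. "s3" or "sql" (assumed values)
    location: str                  # bucket path, connection string, etc.
    credentials: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataMonitoringProject:
    """Hypothetical counterpart of a model monitoring project."""
    name: str
    source: DataSource


def run_data_monitoring(
    project: DataMonitoringProject, rows: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Run one illustrative check: count missing values per column.

    `rows` stands in for data fetched from `project.source`; a column
    counts as missing in a row when the key is absent or its value is None.
    """
    columns = sorted({key for row in rows for key in row})
    missing = {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }
    return {
        "project": project.name,
        "row_count": len(rows),
        "missing_values": missing,
    }


# Step 1 and 2: create the project and point it at a data source.
project = DataMonitoringProject(
    name="customers",
    source=DataSource(kind="s3", location="s3://example-bucket/customers.csv"),
)

# Step 3: run the check on some sample rows and inspect the findings.
sample_rows = [
    {"age": 31, "city": "Athens"},
    {"age": None, "city": "Patras"},
    {"age": 45},
]
findings = run_data_monitoring(project, sample_rows)
print(findings["missing_values"])  # {'age': 1, 'city': 1}
```

Whether the real checks would reuse the existing training/inference dataset pipelines or need new ones is an open design question; the sketch only shows how a project, a data source, and a findings report could fit together.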

NickNtamp commented 1 year ago

@momegas what do you mean by data monitoring pipelines? Are any of Whitebox's existing pipelines data monitoring ones? Do we have to create new data monitoring pipelines? If so, a discussion is needed.