Is your feature request related to a problem? Please describe.
The problem this issue is trying to solve is that some users need to check and validate their data as part of their MLOps lifecycle. Since Whitebox already does this for the training and inference datasets, we should be able to extend this functionality into a complete data monitoring solution.
Describe the solution you'd like
A possible solution is to create a data monitoring project, just like we do for model monitoring. The user should be able to specify where the data is located (S3, SQL, and other integrations in the future), and Whitebox will run the data monitoring pipelines just as it does with model monitoring.
A possible flow is the following:
1. Create a data monitoring project (through the SDK, UI, or API).
2. Choose the data to be monitored by specifying the data source and credentials.
3. Run the data monitoring pipelines and display the findings on the dashboard (as with model monitoring).
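The flow above could be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not Whitebox's actual API: `SourceType`, `DataSource`, `DataMonitoringProject`, `missing_value_report` and `run_pipelines` are all hypothetical names, and the rows are assumed to have already been fetched from the configured source (no real S3 or SQL client is used). The missing-value check stands in for whatever checks the data monitoring pipelines would actually run.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class SourceType(Enum):
    # Hypothetical source kinds; the issue mentions S3 and SQL.
    S3 = "s3"
    SQL = "sql"


@dataclass
class DataSource:
    # Hypothetical description of where the monitored data lives (step 2).
    type: SourceType
    uri: str
    credentials: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataMonitoringProject:
    # Hypothetical counterpart of a model monitoring project (step 1).
    name: str
    source: DataSource


def missing_value_report(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of missing (None) values per column: one example check
    a data monitoring pipeline could run."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        col: (sum(1 for row in rows if row.get(col) is None) / total) if total else 0.0
        for col in columns
    }


def run_pipelines(project: DataMonitoringProject, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical step 3: run the checks and collect findings that a
    dashboard could display. Assumes `rows` were already loaded from
    `project.source` by some loader outside this sketch."""
    return {
        "project": project.name,
        "row_count": len(rows),
        "missing_fraction": missing_value_report(rows),
    }
```

A possible usage, with a made-up bucket path: `run_pipelines(DataMonitoringProject("orders-data", DataSource(SourceType.S3, "s3://example-bucket/orders.csv")), [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}])` reports 2 rows, with half of the `amount` values missing and none of the `id` values missing.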
@momegas what do you mean by data monitoring pipelines? Are any of Whitebox's existing pipelines data monitoring pipelines? Do we have to create new data monitoring pipelines? If so, a discussion is needed.