opendatahub-io / odh-dashboard


[Feature Request]: Make Metrics Available on Runs/Experiments View #1035

Closed: strangiato closed this issue 5 months ago

strangiato commented 1 year ago

Feature description

When testing different models, hyperparameters, or any other changes in a data pipeline, it is often necessary to compare the results across multiple runs. The upstream Kubeflow Pipelines UI makes the metrics available on the Experiments view so you can easily see the results of all of the runs:

[screenshot: Kubeflow Pipelines Experiments view showing metrics columns for each run]

Describe alternatives you've considered

No response

Anything else?

No response

andrewballantyne commented 1 year ago

cc @VaishnaviHire

rimolive commented 1 year ago

@strangiato Can you provide a screenshot with better resolution? This one is too small and it loses resolution when zooming in.

strangiato commented 1 year ago

[higher-resolution screenshot: Experiments view with run metrics]

Hopefully this is better!

strangiato commented 1 year ago

Also, in case it is helpful, here is a very simple pipeline for generating metrics: https://github.com/rh-intelligent-application-practice/kubeflow-examples/blob/main/pipelines/5_metrics_pipeline.py
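
For anyone not familiar with the mechanism, it looks roughly like this (a minimal sketch based on the KFP v1 SDK metrics docs; the linked example may differ in the details):

```python
# Sketch of a KFP v1 metrics-producing step; based on the documented
# mlpipeline-metrics format, not copied from the linked file.
from kfp.components import OutputPath, create_component_from_func


def produce_metrics(mlpipeline_metrics_path: OutputPath('Metrics')):
    import json

    accuracy = 0.9
    metrics = {
        'metrics': [{
            'name': 'accuracy-score',  # name shown in the runs/experiments table
            'numberValue': accuracy,   # must be numeric
            'format': 'PERCENTAGE',    # or 'RAW'
        }]
    }
    with open(mlpipeline_metrics_path, 'w') as f:
        json.dump(metrics, f)


produce_metrics_op = create_component_from_func(produce_metrics)
```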

accorvin commented 1 year ago

@strangiato this is definitely something we plan to include, but we're still very much in the design phase. Are there any specific metrics you think are most important to include?

strangiato commented 1 year ago

@accorvin, I would expect each pipeline to have its own requirements, which would be defined in the pipeline itself. Metrics will be very dependent on what kind of model you are building, the use case/requirements, what the model is being optimized for, the optimization algorithm the data scientist chooses, etc.

Common metrics I would expect to see might include:

- Accuracy
- Precision
- Recall
- F1 score

Those are all metrics relevant to classification problems, so when you get into different data science problems such as NLP, whole new sets of metrics become relevant. You could also have a totally custom metric defined for a specific use case.

The feature request here is to simply display the metrics that are defined in the pipeline (by the pipeline developer/data scientist) in a way that you can compare the results of multiple runs without having to open each run one by one.

accorvin commented 1 year ago

Thanks @strangiato! I've been thinking through what a user-friendly yet sufficiently generic mechanism for implementing this could be. The best idea I have is to implement a standard for storing a JSON (or similar) file with a well-defined structure in a well-defined location for each pipeline run. It would contain something like key/value pairs (metric name and value) that the dashboard could then display. Thoughts?
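
Purely to illustrate the idea, something along these lines (the file name and keys here are hypothetical, not a concrete proposal):

```python
# Hypothetical per-run metrics file the dashboard could read;
# "metrics.json" and the keys are made up for illustration only.
import json

run_metrics = {
    "accuracy": 0.91,
    "f1_score": 0.87,
}

with open("metrics.json", "w") as f:
    json.dump(run_metrics, f, indent=2)
```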

strangiato commented 1 year ago

That is exactly what Kubeflow does today:

https://v1-6-branch.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/

Although, while digging for that documentation, I learned that runMetrics is apparently going to be deprecated in the v2 API:

https://github.com/kubeflow/website/pull/3462

I'm still unsure what they are planning to replace it with, though, or whether replacement tooling already exists.

I would expect these metrics to be tightly integrated with the model registry, with the model registry providing the capability for logging the values.

Personally, I like the way MLflow handles logging with a simple function:

mlflow.log_metric('my-metric', 9000)

https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metric
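
For comparison, the full pattern looks roughly like this (a sketch; tracking-server setup omitted):

```python
import mlflow

# Metrics are logged against the active run; MLflow handles storage and lookup.
with mlflow.start_run():
    mlflow.log_metric('my-metric', 9000)
    mlflow.log_metrics({'accuracy': 0.91, 'f1': 0.87})  # batch variant
```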

Having to construct a JSON object and figure out how to get that object stored in the correct location so Kubeflow could "auto-magically" display it as a metric was not the best experience.

The same is true for the visualization capabilities in Kubeflow. There is a special structure for the JSON object that you have to know and save to the correct location to generate the visualizations. The user experience for figuring out how to build those correctly is pretty terrible and not documented very well. I think there are some changes happening here as well with the v2 API, and that they are providing some functions to make generating visualizations a bit easier in the future.
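
For reference, a sketch of what the v2-style logging appears to look like (assuming the kfp 2.x SDK; I haven't dug into how the dashboard would surface these artifacts):

```python
from kfp import dsl
from kfp.dsl import Metrics, Output


@dsl.component
def eval_model(metrics: Output[Metrics]):
    # The Metrics output artifact exposes an MLflow-like logging call.
    metrics.log_metric('accuracy', 0.91)
    metrics.log_metric('f1', 0.87)
```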

accorvin commented 1 year ago

mlflow.log_metric('my-metric', 9000) <-- This seems like a really elegant way to provide this functionality.

@rimolive, @HumairAK, @gmfrasca, @gregsheremeta do any of you know if the KFP community has any plans for providing this sort of functionality?

rimolive commented 1 year ago

KFP v2 is generalizing components through containerization, meaning that any component can be created as long as there is a container image for it. Looking at some of the component examples they provide in their repo, they have a confusion matrix component, which basically ships Python code to produce that metric. The code writes a JSON file to a volume so it can be rendered.

That being said, I believe they will work on other metric components to add as containers. On our end, we need to see that JSON structure to understand how to render it in our UI.
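
For the confusion matrix case specifically, a sketch of the v2 component side (assuming the kfp 2.x SDK; the JSON artifact it writes is what we would need to render):

```python
from kfp import dsl
from kfp.dsl import ClassificationMetrics, Output


@dsl.component
def eval_classifier(cm_artifact: Output[ClassificationMetrics]):
    # Writes the confusion-matrix JSON artifact that a UI can render.
    cm_artifact.log_confusion_matrix(
        ['cat', 'dog'],        # category labels
        [[90, 10], [5, 95]],   # row-major confusion matrix
    )
```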

dgutride commented 5 months ago

Moving to closed - there is an epic tracking experiments requirements and implementation here: https://issues.redhat.com/browse/RHOAIENG-2944