iterative / dvclive

📈 Log and track ML metrics, parameters, models with Git and/or DVC
https://dvc.org/doc/dvclive
Apache License 2.0

Add user friendly warning/error messages and helpers for log_plot() #750

Open mnrozhkov opened 9 months ago

mnrozhkov commented 9 months ago

When people start using Live.log_plot(), they may struggle to get the expected visualization for two reasons:

  1. log_plot() is very opinionated about the data format required for each template
  2. there are no user-friendly data checks or warning messages

Here are some ideas to help with DVCLive onboarding:

1. "Relax" requirements for data formats supported

For example, the bar_horizontal template expects something like this:

datapoints = [
    {"name": "petal_width", "importance": 0.4},
    {"name": "petal_length", "importance": 0.33},
    {"name": "sepal_width", "importance": 0.24},
    {"name": "sepal_length", "importance": 0.03}
]

It would be cool to support other formats, such as:

1) Pandas DataFrame

image

2) A dict, with keys automatically extracted as y and values as x.

{'petal_width': 0.4,
 'petal_length': 0.33,
 'sepal_width': 0.24,
 'sepal_length': 0.03}
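The dict case above could be handled by a small normalization step before the data reaches the template. The following is only a sketch: to_datapoints is a hypothetical helper, not part of DVCLive, and the DataFrame branch assumes a pandas-style to_dict(orient="records") method.

```python
from collections.abc import Mapping


def to_datapoints(data, x="x", y="y"):
    """Hypothetical helper: normalize input into a list of dict rows."""
    # Plain mapping: keys become the y field, values become the x field.
    if isinstance(data, Mapping):
        return [{y: key, x: value} for key, value in data.items()]
    # DataFrame-like object: assumes a pandas-style to_dict(orient="records").
    if hasattr(data, "to_dict"):
        return data.to_dict(orient="records")
    # Otherwise assume it is already an iterable of dict rows.
    return [dict(row) for row in data]


importances = {
    "petal_width": 0.4,
    "petal_length": 0.33,
    "sepal_width": 0.24,
    "sepal_length": 0.03,
}
datapoints = to_datapoints(importances, x="importance", y="name")
```

Here datapoints ends up in the same list-of-dicts shape that the bar_horizontal example uses, so the rest of the logging path would not need to change.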

2. Provide minimal sanity checks for the data/configs provided

For example, if I run this code snippet:

from dvclive import Live

datapoints = [
    {"name": "petal_width", "importance": 0.4},
    {"name": "petal_length", "importance": 0.33},
    {"name": "sepal_width", "importance": 0.24},
    {"name": "sepal_length", "importance": 0.03}
]

with Live() as live:
    live.log_plot(
        "iris_feature_importance",
        datapoints,
        x="name",
        y="importance",
        template="bar_horizontal",
        title="Iris Dataset: Feature Importance",
        y_label="Feature Name",
        x_label="Feature Importance"
    )

I won't get any error, but nothing shows up in VS Code afterwards:

image

Reason? The x and y arguments are swapped; the correct assignment is y="name", x="importance". But it's very easy to overlook this mistake and spend a lot of time trying to figure it out.

How can we help?

Data provided for x has str type but a numerical data type is expected
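A check that emits a warning like the one above might look roughly like this. It is a sketch, not DVCLive's implementation: check_axis_types and NUMERIC_X_TEMPLATES are hypothetical names, and the rule that bar_horizontal needs numeric x values is an assumption taken from the example in this issue.

```python
import warnings
from numbers import Number

# Assumption based on the example in this issue: bar_horizontal needs numeric x.
NUMERIC_X_TEMPLATES = {"bar_horizontal"}


def check_axis_types(datapoints, x, y, template):
    """Hypothetical check: warn and return False if x values are not numeric."""
    if template not in NUMERIC_X_TEMPLATES:
        return True
    # Collect every x value that is missing or not a number.
    bad = [row.get(x) for row in datapoints if not isinstance(row.get(x), Number)]
    if bad:
        warnings.warn(
            f"Data provided for x ('{x}') has {type(bad[0]).__name__} type, "
            f"but a numerical data type is expected for template '{template}'. "
            f"Did you mean x='{y}', y='{x}'?",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

Calling check_axis_types(datapoints, "name", "importance", "bar_horizontal") on the snippet's data would trigger the warning, while the corrected x="importance", y="name" would pass silently.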

dberenbaum commented 9 months ago

Another thought on a lightweight way to help here: better docs in https://dvc.org/doc/dvclive/live/log_plot. Having an example of the input format for each template could go a long way. There are already examples of different templates in https://dvc.org/doc/command-reference/plots/show that we could use as a starting point.

dberenbaum commented 9 months ago

Background on the current implementation: https://github.com/iterative/dvclive/pull/543#pullrequestreview-1402602708

dberenbaum commented 9 months ago

Marking as p2 since I don't think log_plot() is frequently used, but it would still be really nice to have these improvements.