iterative / vscode-dvc

Machine learning experiment tracking and data versioning with DVC extension for VS Code
https://marketplace.visualstudio.com/items?itemName=Iterative.dvc
Apache License 2.0

Plots: Vertical Zoom or Log-Scaling #5503

Closed: RaW-Git closed this issue 4 months ago

RaW-Git commented 4 months ago

Hey,

in my current project, I have a very large loss spike at the beginning of training, which levels off after 1-2 epochs. I know it's possible to zoom in via the mouse wheel, but this always zooms into both X and Y at the same time.

Here's a picture of the loss:

[Screenshot (2024-05-04): loss plot with a large spike in the first epochs]

It's very hard to navigate that loss curve with the current zooming behaviour. I only want to zoom in on the Y axis, to ignore the spike at the beginning; that would be very helpful. Alternatively, a log-scaled Y axis, or even the option to set axis limits, would probably also do the job.

Thanks for the VS Code Extension btw. I really like this integrated setup.

dberenbaum commented 4 months ago

Hi @RaW-Git!

You can use a custom Vega-Lite template to get a log-scale plot. This one should work:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": "<DVC_METRIC_DATA>"
  },
  "title": {
    "text": "<DVC_METRIC_TITLE>",
    "anchor": "middle"
  },
  "width": "<DVC_METRIC_PLOT_WIDTH>",
  "height": "<DVC_METRIC_PLOT_HEIGHT>",
  "params": [
    {
      "name": "smooth",
      "value": 0.001,
      "bind": {
        "input": "range",
        "min": 0.001,
        "max": 1,
        "step": 0.001
      }
    }
  ],
  "encoding": {
    "x": {
      "field": "<DVC_METRIC_X>",
      "type": "quantitative",
      "title": "<DVC_METRIC_X_LABEL>"
    },
    "color": "<DVC_METRIC_COLOR>",
    "strokeDash": "<DVC_METRIC_STROKE_DASH>"
  },
  "layer": [
    {
      "layer": [
        {
          "params": [
            "<DVC_METRIC_ZOOM_AND_PAN>"
          ],
          "mark": "line"
        },
        {
          "transform": [
            {
              "filter": {
                "param": "hover",
                "empty": false
              }
            }
          ],
          "mark": "point"
        }
      ],
      "encoding": {
        "y": {
          "field": "<DVC_METRIC_Y>",
          "type": "quantitative",
          "title": "<DVC_METRIC_Y_LABEL>",
          "scale": {"type": "log"}
        },
        "color": {
          "field": "rev",
          "type": "nominal"
        }
      },
      "transform": [
        {
          "loess": "<DVC_METRIC_Y>",
          "on": "<DVC_METRIC_X>",
          "groupby": "<DVC_METRIC_GROUP_BY>",
          "bandwidth": {
            "signal": "smooth"
          }
        }
      ]
    },
    {
      "mark": {
        "type": "line",
        "opacity": 0.2
      },
      "encoding": {
        "x": {
          "field": "<DVC_METRIC_X>",
          "type": "quantitative",
          "title": "<DVC_METRIC_X_LABEL>"
        },
        "y": {
          "field": "<DVC_METRIC_Y>",
          "type": "quantitative",
          "title": "<DVC_METRIC_Y_LABEL>",
          "scale": {"type": "log"}
        },
        "color": {
          "field": "rev",
          "type": "nominal"
        }
      }
    },
    {
      "mark": {
        "type": "circle",
        "size": 10
      },
      "encoding": {
        "x": {
          "aggregate": "max",
          "field": "<DVC_METRIC_X>",
          "type": "quantitative",
          "title": "<DVC_METRIC_X_LABEL>"
        },
        "y": {
          "aggregate": {
            "argmax": "<DVC_METRIC_X>"
          },
          "field": "<DVC_METRIC_Y>",
          "type": "quantitative",
          "title": "<DVC_METRIC_Y_LABEL>",
          "scale": {"type": "log"}
        },
        "color": {
          "field": "rev",
          "type": "nominal"
        }
      }
    },
    {
      "transform": [
        {
          "calculate": "<DVC_METRIC_PIVOT_FIELD>",
          "as": "pivot_field"
        },
        {
          "pivot": "pivot_field",
          "op": "mean",
          "value": "<DVC_METRIC_Y>",
          "groupby": [
            "<DVC_METRIC_X>"
          ]
        }
      ],
      "mark": {
        "type": "rule",
        "tooltip": {
          "content": "data"
        },
        "stroke": "grey"
      },
      "encoding": {
        "opacity": {
          "condition": {
            "value": 0.3,
            "param": "hover",
            "empty": false
          },
          "value": 0
        }
      },
      "params": [
        {
          "name": "hover",
          "select": {
            "type": "point",
            "fields": [
              "<DVC_METRIC_X>"
            ],
            "nearest": true,
            "on": "mouseover",
            "clear": "mouseout"
          }
        }
      ]
    }
  ]
}

Save that to a file and, in your dvc.yaml, add something like template: path/to/log_template.json under the corresponding plot definition.
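For reference, a minimal sketch of what that could look like, assuming the top-level plots section of dvc.yaml. The metrics file dvclive/plots/metrics/train/loss.tsv and its step/loss columns are hypothetical placeholders for your own data, and path/to/log_template.json stands for wherever you saved the template:

```yaml
plots:
  # Hypothetical metrics file; the path is relative to this dvc.yaml
  - dvclive/plots/metrics/train/loss.tsv:
      x: step                               # hypothetical x column
      y: loss                               # hypothetical y column
      template: path/to/log_template.json   # the custom log-scale template saved above
```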

If you want to look deeper, https://github.com/iterative/dvc-render/pull/136 has some background on log scaling.

RaW-Git commented 4 months ago

Hey, thanks. That works, but only if I specify path/to/log_template.json as an absolute path; any relative path is not found. For example, putting log_template.json in the same directory as dvc.yaml and referencing it as template: log_template.json results in the error "log_template.json not found".

dberenbaum commented 4 months ago

Hm, that shouldn't be the case. Anything unusual about your git repo? Relative to the root of your git repo, what is the path to:

  1. Your dvc repo?
  2. dvc.yaml?
  3. log_template.json?

RaW-Git commented 4 months ago

The .dvc folder is at the root of the repository. It's a monorepo for multiple machine learning projects (since they share some of the data). Relative to the repo root, the dvc.yaml that references the Vega-Lite template is at projects/xyz/dvc.yaml, and the template is in the same folder, at projects/xyz/log_template.json.

dberenbaum commented 4 months ago

Sorry, template paths can be a bit confusing and aren't well documented. The best option is to put the template at .dvc/plots/log_template.json, since .dvc/plots is always searched for templates. You can then refer to it by name, as template: log_template (without the .json extension), in any dvc.yaml file.
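Applied to the monorepo layout described in the thread (.dvc at the repo root, dvc.yaml under projects/xyz), the plot definition might look like the sketch below. The metrics file and the step/loss column names are hypothetical placeholders, not taken from the original project:

```yaml
# Repo layout (from the thread):
#   .dvc/plots/log_template.json   <- template moved here
#   projects/xyz/dvc.yaml          <- this file
plots:
  - dvclive/plots/metrics/train/loss.tsv:   # hypothetical metrics file, relative to projects/xyz
      x: step                               # hypothetical column names
      y: loss
      template: log_template                # referenced by name; found in .dvc/plots
```

Because the template lives in the shared .dvc/plots directory, every project's dvc.yaml in the monorepo can reuse it by the same name.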

RaW-Git commented 4 months ago

That worked, thanks.