iterative / dvc-render

Library for rendering DVC plots
http://docs.iterative.ai/dvc-render/
Apache License 2.0

`plots`: Add support for plotly as backend for render plots #7

Open daavoo opened 2 years ago

daavoo commented 2 years ago

plotly is a set of Open Source Graphing Libraries for building "Interactive charts and maps for Python, R, Julia, ggplot2, .NET, and MATLAB®".

The "high level" concept is very similar to vega-lite (the current DVC plots backend): both are JavaScript libraries based on d3.js that use JSON to describe the plot "schema" and provide "bindings" to generate plots in different languages (altair would be the vega-lite Python equivalent). See a more detailed comparison.

It would be nice to extend DVC plots to support plotly as an alternative backend. The following is a non-exhaustive list of what I consider the advantages (in the DVC context) of adding support for plotly:

As a non-exhaustive example, see the differences between the Python bindings stats: plotly / altair

Try plotly line chart / vega-lite line chart

This is especially relevant for some complex plots like iterative/dvc#4455, where plotly provides many relevant interactions by default (e.g. reordering columns, selecting subsets) that seem quite complicated to add (if even possible) in vega-lite:

plotly parallel coordinates / vega-lite parallel coordinates

After reviewing the internal dvc.render module and discussing it with @pared, it looks like it won't require too many changes in DVC to add support for plotly.

Edit by @dberenbaum to start a tasklist here of possible future plotly enhancements:

- [ ] Better smoothing (https://github.com/iterative/dvc-render/issues/135)
- [ ] Log-linear plots (https://github.com/iterative/dvc-render/pull/136)
- [ ] Zooming and panning (https://github.com/iterative/vscode-dvc/issues/4530)
- [ ] Responsive sizing (https://github.com/iterative/vscode-dvc/issues/3757)
- [ ] Better / TB-like tooltips (https://github.com/iterative/vscode-dvc/issues/4532)

daavoo commented 2 years ago

I haven't really thought about the cross-product implications.

dberenbaum commented 2 years ago

@shcheklein @tapadipti @Suor @rogermparent @mattseddon Interested to hear your thoughts on this.

rogermparent commented 2 years ago

Looks like a very cool library, and certainly much easier to use than Vega! The defaults don't quite match our design, but it seems there are enough ways to hook into hover and click events to make up for that, and the ability to set colors on lines is much friendlier than the equivalent scale API in Vega.

I'd say if the goal is to encourage users to develop tools based on these plots, a more high-level alternative backend like plotly here could do the job. Worth noting it could add some complexity in features like --show-vega, but that can likely be handled without too much issue. Maybe that could be handled by changing --show-vega to --show-json with a new --backend or --plots-backend flag that's vega by default and can be set to plotly? The names would make sense if we added images to the --show-vega output from iterative/dvc#6752.

daavoo commented 2 years ago

> I'd say if the goal is to encourage users to develop tools based on these plots, a more high-level alternative backend like plotly here could do the job. Worth noting it could add some complexity in features like --show-vega, but that can likely be handled without too much issue. Maybe that could be handled by changing --show-vega to --show-json with a new --backend or --plots-backend flag that's vega by default and can be set to plotly? The names would make sense if we added images to the --show-vega output from iterative/dvc#6752.

Side note: I agree that --show-json would be a better name, even without plotly.

This is a very good point to discuss. My original idea was for the backend to be a property of each individual plot, or even inferred from the template, so that users could mix plots with different backends in a single dvc.yaml:

stages:
  train:
    cmd: python train.py
    plots:
      - prc.json:
          cache: false
          x: recall
          y: precision
          template: linear_plotly.json # Inferred
      - roc.json:
          cache: false
          x: fpr
          y: tpr
          backend: plotly # Explicit

The main motivation (besides giving users flexibility) was that plotly (or another backend) would probably introduce some type of plot that is not easily supported in vega and we don't want to commit ourselves to maintain feature parity across plot backends.
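The per-plot resolution described above could be sketched roughly as follows. This is only an illustration: `resolve_backend`, the `backend` key, and the "template name mentions the backend" convention are assumptions taken from the dvc.yaml example in this comment, not an existing DVC or dvc-render API.

```python
from typing import Any, Dict

# Assumption: vega stays the default backend when nothing else is specified.
DEFAULT_BACKEND = "vega"
KNOWN_BACKENDS = ("vega", "plotly")


def resolve_backend(plot_props: Dict[str, Any]) -> str:
    """Pick the backend for a single plot entry of dvc.yaml (hypothetical helper)."""
    explicit = plot_props.get("backend")
    if explicit is not None:
        # Explicit `backend: plotly` wins over anything inferred.
        if explicit not in KNOWN_BACKENDS:
            raise ValueError(f"unknown plots backend: {explicit!r}")
        return explicit

    # Hypothetical convention: a template such as "linear_plotly.json"
    # names its backend, so the backend can be inferred from it.
    template = str(plot_props.get("template", ""))
    for backend in KNOWN_BACKENDS:
        if backend in template:
            return backend
    return DEFAULT_BACKEND
```

With the two entries above, `prc.json` would resolve to plotly through its template name and `roc.json` through the explicit key, while a plot with neither would stay on vega.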

However, when considering --show-json I assume that mixing schemas in that output could be a little problematic for integrations 🤔 (Is that assumption correct, @rogermparent ?)

rogermparent commented 2 years ago

> I assume that mixing schemas in that output could be a little problematic for integrations 🤔 (Is that assumption correct?)

I think that would be the case, but it could be handled if there was some way to distinguish the plots with different schemas from each other.

tapadipti commented 2 years ago

Supporting plotly was one of the ideas that came up during Studio brainstorming sessions a while back, and it is one of the items we have on the roadmap for next year. I'm not sure how much work it would be to support this in Studio (maybe @Suor would have some idea), but eventually it needs to be supported (at least as per the current roadmap / plan).

Suor commented 2 years ago

As far as I can see, a plotly template is not JSON but JavaScript code, which is problematic from a security perspective.

daavoo commented 2 years ago

> As far as I can see, a plotly template is not JSON but JavaScript code, which is problematic from a security perspective.

Not sure if I understand. At least on the DVC side, plotly would behave "exactly" as vega.

For an example linear plot, we would have a JSON template with placeholders:

vega

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": { "values": "" },
  "title": "",
  "width": 300,
  "height": 300,
  "layer": [
    {
      "encoding": {
        "x": { "field": "", "type": "quantitative", "title": "" },
        "y": { "field": "", "type": "quantitative", "title": "", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal" }
      },
      "layer": [
        { "mark": "line" },
        {
          "selection": {
            "label": {
              "type": "single",
              "nearest": true,
              "on": "mouseover",
              "encodings": ["x"],
              "empty": "none",
              "clear": "mouseout"
            }
          },
          "mark": "point",
          "encoding": {
            "opacity": {
              "condition": { "selection": "label", "value": 1 },
              "value": 0
            }
          }
        }
      ]
    },
    {
      "transform": [{ "filter": { "selection": "label" } }],
      "layer": [
        {
          "mark": { "type": "rule", "color": "gray" },
          "encoding": { "x": { "field": "", "type": "quantitative" } }
        },
        {
          "encoding": {
            "text": { "type": "quantitative", "field": "" },
            "x": { "field": "", "type": "quantitative" },
            "y": { "field": "", "type": "quantitative" }
          },
          "layer": [
            {
              "mark": { "type": "text", "align": "left", "dx": 5, "dy": -5 },
              "encoding": { "color": { "type": "nominal", "field": "rev" } }
            }
          ]
        }
      ]
    }
  ]
}
```

plotly

```json
{
  "legendgroup": "",
  "line": { "dash": "solid" },
  "marker": { "symbol": "circle" },
  "mode": "lines",
  "name": "",
  "orientation": "v",
  "showlegend": true,
  "type": "scatter",
  "xaxis": "",
  "yaxis": "",
  "x": "",
  "y": ""
}
```

That would get filled with datapoints collected by DVC plots and embedded in an HTML div:

vega ```html
```
plotly ```html
```

The {partial} placeholder in the HTML div is filled, in both cases, with a plain JSON object, which is also what is currently returned when using the --show-vega option.

So, --show-plotly (or --show-json) would return a similar plain JSON object.
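As a rough sketch of that flow for plotly: the template below is a trimmed copy of the plotly trace above, while `fill_template`, `render_div`, and the `HTML_DIV` snippet are hypothetical names invented for this illustration (only the `{partial}` placeholder comes from the comment). It is not the dvc-render implementation.

```python
import json
from copy import deepcopy

# Trimmed copy of the plotly trace template shown above; the empty strings
# stand for the values that get filled with collected datapoints.
PLOTLY_LINEAR_TEMPLATE = {
    "line": {"dash": "solid"},
    "marker": {"symbol": "circle"},
    "mode": "lines",
    "name": "",
    "showlegend": True,
    "type": "scatter",
    "x": "",
    "y": "",
}

# Hypothetical page snippet: {partial} receives the plain JSON object.
HTML_DIV = (
    '<div id="{div_id}"></div>\n'
    '<script>Plotly.newPlot("{div_id}", {partial});</script>'
)


def fill_template(datapoints, x, y, name):
    """Return a plain, JSON-serializable figure built from the template."""
    trace = deepcopy(PLOTLY_LINEAR_TEMPLATE)
    trace["name"] = name
    trace["x"] = [point[x] for point in datapoints]
    trace["y"] = [point[y] for point in datapoints]
    layout = {"xaxis": {"title": {"text": x}}, "yaxis": {"title": {"text": y}}}
    return {"data": [trace], "layout": layout}


def render_div(partial, div_id="plot"):
    """Embed the JSON object in the HTML snippet.

    Note: real code would also need to escape sequences such as "</script>"
    inside the data before embedding it in a page.
    """
    return HTML_DIV.format(div_id=div_id, partial=json.dumps(partial))
```

The object returned by `fill_template` is what a `--show-plotly` or `--show-json` style option could print, since it contains only data and no JavaScript.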

dberenbaum commented 1 year ago

Another reason plotly would be useful: it has built-in support for more ML/DS/analytical visualizations, like smoothing (see https://github.com/iterative/vscode-dvc/issues/3837).

daavoo commented 1 year ago

I could start moving this forward in the CLI first, and then try to get something working in Studio by myself (probably with some help) afterwards.

dberenbaum commented 1 year ago

If we can start trying to build towards it being a drop-in replacement for vega-lite, I think it would be nice.

mattseddon commented 10 months ago

I've started looking at this from the VS Code perspective.

I can see in #88 (not sure if that PR is active or not) that the current idea is for DVC to hold the required data in the same format for both Vega & Plotly. Might it make sense to change that approach given that dvc-render/vscode-dvc/studio would all have to contain the same(-ish) logic to convert the datapoints from that format into what is required by Plotly? If the idea is to keep the data decoupled from the render is there a better/more general format to hold it in?
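To make the duplicated conversion logic concrete, here is a minimal sketch of turning flat datapoints into Plotly traces. It assumes the datapoints are a list of records carrying a `rev` field (as the vega templates in this thread suggest); `datapoints_to_traces` is a hypothetical helper, not code from #88.

```python
from collections import defaultdict


def datapoints_to_traces(datapoints, x, y):
    """Group flat records by revision and emit one scatter trace per revision."""
    grouped = defaultdict(lambda: {"x": [], "y": []})
    for point in datapoints:
        series = grouped[point["rev"]]
        series["x"].append(point[x])
        series["y"].append(point[y])

    # Dicts keep insertion order, so traces follow first appearance of each rev.
    return [
        {"type": "scatter", "mode": "lines", "name": rev, "x": s["x"], "y": s["y"]}
        for rev, s in grouped.items()
    ]
```

Every consumer that only receives the flat records (dvc-render, vscode-dvc, Studio) would need some equivalent of this step, which is the duplication the comment is pointing at.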

For the extension it would be good to get the contents of data / layout provided separately under the --split option through plots diff. My suggestion for the json output from plots diff for Plotly plots would be:

{
  "data": {
    "dvc.yaml::name": [
      {
        "type": "plotly",
        "revisions": ["workspace"],
        "layout": {LAYOUT},
        "data": {DATA}
      }
    ]
  }
}

That way we'll be able to update the marker.color (or equivalent) for each experiment where appropriate (e.g. linear/scatter plots). We can also continue to hold the equivalent of templates/data separately internally.
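A consumer of that output could then recolor traces along these lines. This is a sketch under assumptions: it supposes that `{DATA}` is a list of Plotly traces and that each trace's `name` is its revision, neither of which is settled in this thread; `apply_revision_colors` is a hypothetical helper.

```python
from copy import deepcopy


def apply_revision_colors(plots_output, colors):
    """Return a copy of the proposed `plots diff` output with marker.color set.

    Assumes each plotly entry has a "data" list of traces named after
    their revision (an assumption, not part of the proposal itself).
    """
    result = deepcopy(plots_output)
    for plots in result["data"].values():
        for plot in plots:
            if plot.get("type") != "plotly":
                continue  # leave vega (or any other) entries untouched
            for trace in plot["data"]:
                color = colors.get(trace.get("name"))
                if color is not None:
                    trace.setdefault("marker", {})["color"] = color
    return result
```

Keeping `layout` and `data` separate means this step never has to parse or rewrite the layout.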

LMK what you think. If we can agree on the approach I have the capacity to make contributions here and in DVC to get this moving.

mattseddon commented 10 months ago

> Might it make sense to change that approach given that dvc-render/vscode-dvc/studio would all have to contain the same(-ish) logic to convert the datapoints from that format into what is required by Plotly?

I can see this would be a more involved change because Studio reaches directly into DVC and calls repo.plots.collect() here:

https://github.com/iterative/studio/blob/355dbe2e0ffdbdd63089bf29ccc381bcc140cf3d/backend/repos/parsing/dvcmeat.py#L182

daavoo commented 10 months ago

> Might it make sense to change that approach given that dvc-render/vscode-dvc/studio would all have to contain the same(-ish) logic to convert the datapoints from that format into what is required by Plotly?
>
> I can see this would be a more involved change because Studio reaches directly into DVC and calls repo.plots.collect() here:
>
> https://github.com/iterative/studio/blob/355dbe2e0ffdbdd63089bf29ccc381bcc140cf3d/backend/repos/parsing/dvcmeat.py#L182

The draft P.R.'s motivation was to keep the "status quo" of the Vega implementation, introducing Plotly in a way that is transparent for DVC. Doubts about the right approach for the UIs (VSCode, Studio), and about the capacity of the teams to work on it, were the reason for not continuing the work.

> For the extension it would be good to get the contents of data / layout provided separately under the --split option through plots diff. My suggestion for the json output from plots diff for Plotly plots would be:

For the --split option, we can do whatever postprocessing we want for the extension. We are already "breaking" the internal format used today for Vega.

> LMK what you think. If we can agree on the approach I have the capacity to make contributions here and in DVC to get this moving.

I will do a minor update to the dvc-render P.R. , as it is currently missing the layout part.

Then we can discuss how to handle the --json and --split in DVC. It would be great if we could think of a way to unify how VSCode and Studio (FE) do the postprocessing of the plots, so we can update Studio (BE and FE).

We also need to decide how/when to enable Plotly. Options off the top of my head:

A) Have a feature flag in DVC like dvc config plots.plotly True

B) Do a silent drop-in replacement for default plots. Just start rendering using plotly in case there is no custom template.

C) Allow users to explicitly enable plotly for some plots (i.e. introduce and handle special template names like plotly_linear)

mattseddon commented 10 months ago

I am going to catch up with @daavoo today about this (thanks for sending an invite David).

The plan for me right now is to build a thin vertical slice along the lines of option B above. Ideally in the next two sprints, I'd like to be able to replace the smooth/linear/scatter templates with Plotly implementations (feels ambitious).

Findings so far:

I have been playing around with Plotly, and the biggest difference with respect to Vega seems to be that the data and template are much less separate and get mangled together in order to create the desired output. As the "smooth" template seems to be the hardest of the three to generate, I've been working on that.

I've managed to adapt the examples below to generate a demo of what is possible in terms of "smoothing" (not worrying about style yet):

https://plotly.com/javascript/sliders/#add-a-play-button-to-control-a-slider

https://plotly.com/javascript/gapminder-example/#animating-with-a-slider

https://github.com/iterative/dvc-render/assets/37993418/0ac1cd29-ccad-4535-853b-71db735ea8c5

Code for the demo

```js
function smoothTriangle(data, degree) {
  const triangle = [
    ...Array(degree + 1).keys(),
    ...[...Array(degree).keys()].reverse()
  ] // up then down
  const smoothed = []

  for (let i = degree; i < data.length - degree * 2; i++) {
    const point = data
      .slice(i, i + triangle.length)
      .map((x, j) => x * triangle[j])
    smoothed.push(
      point.reduce((a, b) => a + b) / triangle.reduce((a, b) => a + b)
    )
  }

  // Handle boundaries
  const halfDegree = Math.floor(degree / 2)
  const leftBoundary = Array(halfDegree + 1).fill(smoothed[0])
  const rightBoundary = Array(data.length - smoothed.length).fill(
    smoothed[smoothed.length - 1]
  )
  return [...leftBoundary, ...smoothed, ...rightBoundary]
}

const y = [
  0.2707333333333333, 0.40696666666666664, 0.4991833333333333, 0.6582666666666667, 0.5437333333333333,
  0.6674, 0.6644, 0.6833166666666667, 0.7272, 0.68985,
  0.7435333333333334, 0.6868166666666666, 0.76165, 0.7097833333333333, 0.7694,
  0.7323666666666667, 0.7824166666666666, 0.7494666666666666, 0.7894, 0.7608166666666667,
  0.7819833333333334, 0.7650833333333333, 0.7718833333333334, 0.7713, 0.7773166666666667,
  0.77915, 0.7855166666666666, 0.7837166666666666, 0.7916333333333333, 0.7893666666666667,
  0.7960833333333334, 0.7940166666666667, 0.7951333333333334, 0.7986333333333333, 0.7998666666666666,
  0.80405, 0.8076333333333333, 0.8097, 0.8160333333333334, 0.81405,
  0.82245, 0.8198166666666666, 0.8292666666666667, 0.8251166666666667, 0.8348666666666666,
  0.8303333333333334, 0.8397333333333333, 0.8357833333333333, 0.8434, 0.8401166666666666,
  0.8468333333333333, 0.8441833333333333, 0.8502, 0.8476833333333333, 0.8530833333333333,
  0.8513, 0.8561666666666666, 0.8553166666666666, 0.85905, 0.8595666666666667,
  0.8616666666666667, 0.8631, 0.8645833333333334, 0.8659666666666667, 0.8678833333333333,
  0.86965, 0.87255, 0.8734, 0.8752666666666666, 0.8785333333333334,
  0.8778166666666667, 0.8829833333333333, 0.8794166666666666, 0.8860833333333333, 0.8793666666666666,
  0.8881166666666667, 0.8799666666666667, 0.8906, 0.8814666666666666, 0.8921166666666667,
  0.8832333333333333, 0.8939333333333334, 0.8849333333333333, 0.8918666666666667, 0.8869,
  0.8937833333333334, 0.8885333333333333, 0.8953166666666666, 0.8903833333333333, 0.8961166666666667,
  0.8915333333333333, 0.89725, 0.8925, 0.8984333333333333, 0.8935333333333333,
  0.8996666666666666, 0.8948333333333334, 0.9006833333333333, 0.8959, 0.9020166666666666
]

// 0..99, one x value per y value (both traces use the same x values)
const x = [...Array(y.length).keys()]

const trace2 = {
  y,
  x,
  mode: 'lines',
  name: 'workspace',
  line: { color: 'rgb(255, 217, 102)' },
  type: 'scatter'
}

const trace2A = {
  y,
  x,
  mode: 'lines',
  opacity: 0.1,
  type: 'scatter',
  showlegend: false
}

const data = [trace2, trace2A]

Plotly.newPlot('myDiv', {
  data,
  layout: {
    sliders: [{
      pad: { t: 30 },
      x: 0.05,
      len: 0.95,
      currentvalue: {
        xanchor: 'right',
        prefix: 'degree: ',
        font: { color: '#888', size: 20 }
      },
      transition: { duration: 500 },
      // By default, animate commands are bound to the most recently animated frame:
      steps: [
        { label: '0', method: 'animate', args: [['0'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] },
        { label: '1', method: 'animate', args: [['1'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] },
        { label: '2', method: 'animate', args: [['2'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] },
        { label: '3', method: 'animate', args: [['3'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] },
        { label: '4', method: 'animate', args: [['4'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] },
        { label: '5', method: 'animate', args: [['5'], { mode: 'immediate', transition: { duration: 300 }, frame: { duration: 300, redraw: false } }] }
      ]
    }],
    updatemenus: [{
      type: 'buttons',
      showactive: false,
      x: 0.05,
      y: 0,
      xanchor: 'right',
      yanchor: 'top',
      pad: { t: 60, r: 20 },
      buttons: [{
        label: 'Play',
        method: 'animate',
        args: [null, {
          fromcurrent: true,
          transition: { duration: 300 },
          frame: { duration: 500, redraw: false }
        }]
      }]
    }]
  },
  // The slider itself does not contain any notion of timing, so animating a slider
  // must be accomplished through a sequence of frames. Here we'll change the color
  // and the data of a single trace:
  frames: [
    { name: '0', data: [{ y }] },
    { name: '1', data: [{ y: smoothTriangle(y, 1) }] },
    { name: '2', data: [{ y: smoothTriangle(y, 2) }] },
    { name: '3', data: [{ y: smoothTriangle(y, 3) }] },
    { name: '4', data: [{ y: smoothTriangle(y, 4) }] },
    { name: '5', data: [{ y: smoothTriangle(y, 5) }] }
  ]
})
```

This does use the triangular moving average function mentioned previously (shown here) but that function is something that we have to implement on our own. We can also forgo the play button but it seems that in order to show different smoothed options we have to calculate all of the new y values ourselves and load each set of values into distinct frames. The second example above shows how this can be done programmatically but it is going to get complicated when we add multiple data sources + revisions (seems like ordering is the only thing that ties them together).
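For illustration, generating the frames and slider steps programmatically could look like the Python sketch below, which builds plain dicts in the same shape as the demo. Note that `triangular_smooth` here is a simple centered triangular moving average with the window clipped at the edges; it is a stand-in written for this sketch, not a port of the demo's `smoothTriangle`, and `build_smoothing_frames` is a hypothetical helper.

```python
def triangular_smooth(values, degree):
    """Centered triangular moving average; the window is clipped at the edges."""
    if degree <= 0:
        return list(values)
    smoothed = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for offset in range(-degree, degree + 1):
            j = i + offset
            if 0 <= j < len(values):
                # Weights rise to degree + 1 at the center and fall off linearly.
                weight = degree + 1 - abs(offset)
                total += weight * values[j]
                weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


def build_smoothing_frames(y, max_degree):
    """Return (frames, steps): one frame and one slider step per degree."""
    frames = []
    steps = []
    for degree in range(max_degree + 1):
        name = str(degree)
        # Each frame only replaces the y values of the first trace.
        frames.append({"name": name, "data": [{"y": triangular_smooth(y, degree)}]})
        steps.append({
            "label": name,
            "method": "animate",
            "args": [[name], {
                "mode": "immediate",
                "transition": {"duration": 300},
                "frame": {"duration": 300, "redraw": False},
            }],
        })
    return frames, steps
```

With several data sources and revisions, each frame's `data` list would need one entry per trace in the same order as the figure's traces, which is where the ordering concern mentioned above comes in.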

Edit: Demo using ema as smoothing function -

https://github.com/iterative/dvc-render/assets/37993418/e77041e6-2ac3-4894-a212-c629092b5436
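The exact formula used in the ema demo is not shown in this thread; a common form of exponential moving average, given here only as an assumption about what such a smoothing function might look like, is:

```python
def ema_smooth(values, alpha):
    """Exponential moving average: s[0] = v[0], s[t] = alpha*v[t] + (1-alpha)*s[t-1].

    `alpha` in (0, 1] is a hypothetical smoothing parameter; smaller values
    smooth more, and alpha == 1 returns the input unchanged.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # seed the average with the first point
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

As with the triangular version, each slider position would still need its own precomputed set of y values loaded into a frame.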

mattseddon commented 10 months ago

Today I've been looking at Vega. I have opened the above PR to add zoom/pan to plots in VS Code and have been able to come up with these tooltips for linear plots.

PTAL and LMK what you think/if this changes anything.

dberenbaum commented 10 months ago

@mattseddon Is your point that we should reconsider plotly?

mattseddon commented 10 months ago

> @mattseddon Is your point that we should reconsider plotly?

I am really not sure. I think both Vega and Plotly have their own benefits and constraints.

Let's chat about whether or not we still want to take this on when we meet this week.

In the meantime, I am going to attempt to update the default templates to add zoom + pan and new tooltips. E.g. for smooth/linear, we will end up with:

https://github.com/iterative/dvc-render/assets/37993418/aebdff1f-e589-40dc-9122-75cb7bde3f30

As you can see from the above screen recording the template is not perfect as the tooltip contains {rev::filename::field} as an identifier. Right now this is the only way we can get around the fact that all templates are generalised.

I am also going to look further into the Studio/DVC code. Whatever we decide we need to start on removing parts of the legacy process.

New proposed smooth template

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": { "values": "" },
  "title": "",
  "width": "container",
  "height": "container",
  "params": [
    {
      "name": "smooth",
      "value": 0.001,
      "bind": { "input": "range", "min": 0.001, "max": 1, "": 0.001 }
    }
  ],
  "layer": [
    {
      "encoding": {
        "y": { "field": "", "type": "quantitative", "title": "", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal" }
      },
      "layer": [
        { "mark": "line" },
        {
          "transform": [{ "filter": { "param": "hover", "empty": false } }],
          "mark": "point"
        }
      ],
      "transform": [
        {
          "loess": "",
          "on": "",
          "groupby": ["rev", "filename", "field", "filename::field"],
          "bandwidth": { "signal": "smooth" }
        }
      ]
    },
    {
      "params": [{ "bind": "scales", "name": "grid", "select": "interval" }],
      "mark": { "type": "line", "opacity": 0.2 },
      "encoding": {
        "x": { "field": "", "type": "quantitative", "title": "" },
        "y": { "field": "", "type": "quantitative", "title": "", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal" }
      }
    },
    {
      "mark": { "type": "circle", "size": 10, "tooltip": { "content": "encoding" } },
      "encoding": {
        "x": { "aggregate": "max", "field": "", "type": "quantitative", "title": "" },
        "y": {
          "aggregate": { "argmax": "step" },
          "field": "",
          "type": "quantitative",
          "title": "",
          "scale": { "zero": false }
        },
        "color": { "field": "rev", "type": "nominal" }
      }
    },
    {
      "transform": [
        {
          "calculate": "datum.rev + '::' + datum.filename + '::' + datum.field",
          "as": "tooltip-group"
        },
        { "pivot": "tooltip-group", "value": "acc", "groupby": [""] }
      ],
      "mark": { "type": "rule", "tooltip": { "content": "data" } },
      "encoding": {
        "opacity": {
          "condition": { "value": 0.3, "param": "hover", "empty": false },
          "value": 0
        }
      },
      "params": [
        {
          "name": "hover",
          "select": {
            "type": "point",
            "fields": [""],
            "nearest": true,
            "on": "mouseover",
            "clear": "mouseout"
          }
        }
      ]
    }
  ],
  "encoding": { "x": { "field": "", "type": "quantitative", "title": "" } }
}
```

Demo VS Code

https://github.com/iterative/dvc-render/assets/37993418/bc43c395-da81-4073-bb78-fb6a0e340f8f

In order to implement this I think we need to consolidate the post-processing of data in the three products (due to the use of "calculate": "datum.rev + '::' + datum.filename + '::' + datum.field", "as": "tooltip-group" for the new tooltips).
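To show what that `calculate` + `pivot` pair expects from the data, here is a rough Python equivalent. It assumes every datapoint carries `rev`, `filename`, and `field` keys (which is what the expression reads); the helper names are hypothetical, and unlike Vega-Lite's pivot (which aggregates duplicates) this sketch simply lets the last value win.

```python
def add_tooltip_group(datapoints):
    """Mirror of: datum.rev + '::' + datum.filename + '::' + datum.field."""
    return [
        {**point, "tooltip-group": f"{point['rev']}::{point['filename']}::{point['field']}"}
        for point in datapoints
    ]


def pivot_for_tooltip(datapoints, x_field, value_field):
    """One row per x value, with one column per tooltip group."""
    rows = {}
    for point in add_tooltip_group(datapoints):
        row = rows.setdefault(point[x_field], {x_field: point[x_field]})
        row[point["tooltip-group"]] = point[value_field]
    return list(rows.values())
```

If any of the three products produced datapoints without `filename` or `field`, the tooltip key would be malformed, which is why the post-processing would need to be consolidated first.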

mattseddon commented 10 months ago

For anyone following this issue:

This has been temporarily deprioritised whilst https://github.com/iterative/dvc/issues/9940 is worked on.

dberenbaum commented 10 months ago

The main reasons to migrate to plotly would be:

  1. It's a more popular library, especially with the ML community.
  2. It opens options for users to eventually develop custom plots using the familiar plotly python api and visualize them with dvc.

A distant 3rd reason is UI improvements over vega lite, but I think we can already see that there will likely be as many drawbacks as advantages to the plotly UI. I think the first 2 points are strong enough that it's worth moving, but I don't think we have time to work towards the 2nd point now, and we have already put a ton of time into plots, so I would consider plotly a "nice to have" rather than an urgent priority.

mattseddon commented 8 months ago

@shcheklein I think @dberenbaum summed it up well in the last comment. Plotly would not be a silver bullet and I don't think we can justify the effort for the benefits that we would get right now.