[Proposal] Consolidate chart-rendering to Vega-Lite

joshuarrrr commented 2 years ago

Is your feature request related to a problem? Please describe.

Today, there are at least five different charting libraries that are used to render different visualization types in OpenSearch Dashboards (OSD). This makes it difficult for developers to provide a consistent user experience across visualization types, because any vertical feature that targets all visualizations must be implemented separately in each library.

Essentially, we want to:

Provide OSD users with a clean, unified charting experience that looks polished and consistent.
Provide OSD developers with unified interfaces to develop, enhance, and interact with charts

Describe the solution you'd like

All new OSD features and improvements will target Vega-Lite as the rendering layer and Vega-Lite as the declarative grammar to describe chart attributes and interactions
OSD will provide high-level APIs and React components that make it easy for plugin authors to use, add, and extend charting features without interacting with Vega-Lite directly
Design system styling and interaction patterns will be built in by default, so that developers don't need to re-implement them
OSD will formalize charting contracts and capabilities for plugins that require alternative rendering libraries or approaches
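
To make the "high-level API with built-in styling" idea concrete, here is a minimal sketch, not an existing OSD API. The names `lineChart`, `LineChartOptions`, and `DEFAULT_CONFIG`, as well as the colors and font size, are assumptions made up for illustration. The only Vega-Lite details relied on are the standard `mark`, `encoding`, and `config` properties of a spec.

```typescript
// Hypothetical sketch: none of these names exist in OpenSearch Dashboards.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Spec = { [key: string]: Json };

// Assumed house defaults standing in for design-system styling.
const DEFAULT_CONFIG: Spec = {
  axis: { labelFontSize: 12, grid: false },
  range: { category: ["#006BB4", "#017D73", "#F5A700"] },
};

interface LineChartOptions {
  xField: string;
  yField: string;
  seriesField?: string; // optional field used to split the data into series
  config?: Spec; // optional per-chart overrides of the defaults
}

// Plugin authors call this instead of writing Vega-Lite JSON by hand.
function lineChart(rows: Spec[], options: LineChartOptions): Spec {
  const encoding: Spec = {
    x: { field: options.xField, type: "temporal" },
    y: { field: options.yField, type: "quantitative" },
  };
  if (options.seriesField !== undefined) {
    encoding.color = { field: options.seriesField, type: "nominal" };
  }
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: { values: rows },
    mark: "line",
    encoding,
    // Shallow merge: an override replaces a whole top-level config entry.
    config: { ...DEFAULT_CONFIG, ...(options.config ?? {}) },
  };
}
```

A caller would write something like `lineChart(rows, { xField: "timestamp", yField: "count" })` and get the shared styling without knowing the grammar. A real implementation would probably want a deep merge for overrides.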

Describe alternatives you've considered

Instead of choosing a single preferred visualization library, just focus on wrapper APIs, components, and contracts
Introduce Plotly.js as a new dependency and choose it as the charting library for all visualization rendering
Introduce Apache ECharts as a new dependency and choose it as the charting library for all visualization rendering

Additional context

Why Vega-Lite?

Vega-Lite is built on years of academic research to create a concise, declarative JSON specification for interactive visualizations that enables users to rapidly create interactive visualizations. Even complex combinations of charts, layers, and interactions can be constructed by merging or updating a single simple object structure. It also makes it easy to write API wrappers, because they only need to generate JSON output (or partial output) following the Vega-Lite specification.
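
As a rough illustration of "merging or updating a single simple object structure", the sketch below treats specs as plain data. The helper names (`withoutData`, `layered`) and the sample data are assumptions for this example only; the `mark`, `encoding`, `data`, and `layer` properties are standard Vega-Lite.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Spec = { [key: string]: Json };

// Made-up sample data for the sketch.
const requestsData: Json = {
  values: [{ status: "200" }, { status: "404" }, { status: "200" }],
};

const bars: Spec = {
  data: requestsData,
  mark: "bar",
  encoding: {
    x: { field: "status", type: "nominal" },
    y: { aggregate: "count", type: "quantitative" },
  },
};

// Changing the chart type is a one-property update on a copy.
const points: Spec = { ...bars, mark: "point" };

// Returns a copy of a unit spec without its own data source.
function withoutData(spec: Spec): Spec {
  const copy: Spec = { ...spec };
  delete copy.data;
  return copy;
}

// Layering is wrapping unit specs in a `layer` array that shares one data source.
function layered(data: Json, ...units: Spec[]): Spec {
  return { data, layer: units.map((unit) => withoutData(unit)) };
}

const combined = layered(requestsData, bars, points);
```

Because every step produces ordinary JSON, `JSON.stringify(combined)` is all a wrapper needs to hand to the renderer.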

OSD already has a dependency on Vega-Lite, and it’s one we can’t easily remove even if we chose a different charting library (like Plotly.js). The fundamental value of the Vega visualization feature is that it allows end users to build and create totally new chart types by writing Vega/Vega-Lite specifications directly, so there’s no sensible way to migrate that rendering to another library with a different syntax.

The existing Vega visualization feature also provides a foundation for a low-code visualization creation flywheel. Users can use the built-in Vega-Lite editor to prototype new chart types not currently supported in OSD, and share these with the community, and it will be easy to “graduate” these chart types to Vis Builder or other easy-to-use charting tools. In the other direction, we can provide the Vega editor as an “advanced” editor to tweak or change the behavior of officially defined visualizations. This approach provides greater flexibility and power than existing systems which only allow custom JSON configurations without an interactive environment.

Vega-Lite visualizations have already been connected into all the important plumbing of OSD - they can be rendered in dashboards, have expressions implementations, and even some styling to make them similar to the built-in visualization types, so much of the groundwork has already been laid.

Some developers may initially find the Vega-Lite grammar to be a bit different from other JavaScript charting libraries or APIs, but that's part of its expressive power. The best way to get a sense of it is to look at examples.

To see Vega-Lite's approach to defining interactivity, see this example of a threshold-based annotation/overlay with binding. (Note the use of params, which is Vega-Lite’s system for setting and using variables within the declarative data structure.) See the gallery for other examples.
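
For readers without the linked example at hand, here is a minimal sketch of the same idea. It is not the gallery example itself. The function name `thresholdChart`, the field names `day` and `value`, the slider range, and the highlight color are assumptions. It relies on Vega-Lite 5 `params` with an input `bind`, and on referencing the parameter from a filter expression and from a `datum` expression reference; it has not been run against a specific Vega-Lite release.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Spec = { [key: string]: Json };

// Builds a bar chart with a slider-bound threshold, highlighted bars, and a rule.
function thresholdChart(rows: Spec[], initialThreshold: number): Spec {
  const x: Spec = { field: "day", type: "ordinal" };
  const y: Spec = { field: "value", type: "quantitative" };
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: { values: rows },
    // The param is a named variable; `bind` attaches it to a slider widget.
    params: [
      {
        name: "threshold",
        value: initialThreshold,
        bind: { input: "range", min: 0, max: 100, step: 1 },
      },
    ],
    layer: [
      // Base bars.
      { mark: "bar", encoding: { x, y } },
      // Bars at or above the threshold, redrawn in a highlight color.
      {
        mark: "bar",
        transform: [{ filter: "datum.value >= threshold" }],
        encoding: { x, y, color: { value: "#e45755" } },
      },
      // Horizontal rule that follows the current threshold value.
      { mark: "rule", encoding: { y: { datum: { expr: "threshold" } } } },
    ],
  };
}
```

Moving the slider changes only the `threshold` variable; the filter and the rule both react to it without any imperative event handling.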

To directly compare it to more traditional JavaScript charting libraries like Plotly.js, here’s a multi-series line chart with a unified tooltip and guide implemented in both:

How we decided

We did a quick survey of the APIs, adoption, and usage of 20+ JavaScript charting libraries to narrow down our decision to four contenders: Vega-Lite, Vega, Plotly.js, and Apache ECharts. Each met most of our criteria, which are summarized in the table below (you may need to scroll right to see the entire table).

-	Vega-Lite	Vega	Plotly.js	Apache Echarts
URL	https://vega.github.io/vega-lite/	https://vega.github.io/vega/	https://plotly.com/javascript/	https://echarts.apache.org/
Description	Vega-Lite is a high-level visualization grammar. It provides a concise JSON syntax for supporting rapid generation of visualizations to support analysis. Vega-Lite supports interactive multi-view graphics. Specifications can be compiled to Vega.	Vega is a visualization grammar, a declarative language for creating, saving, and sharing interactive visualization designs. With Vega, you can describe the visual appearance and interactive behavior of a visualization in a JSON format, and generate web-based views using Canvas or SVG.	Built on top of d3.js and stack.gl, Plotly.js is a high-level, declarative charting library. plotly.js ships with over 40 chart types, including 3D charts, statistical graphs, and SVG maps.	Apache ECharts is an open-sourced JavaScript visualization tool, which can run fluently on PC and mobile devices. It is compatible with most modern Web Browsers, e.g., IE9/10/11, Chrome, Firefox, Safari and so on. ECharts depends on ZRender, a graphic rendering engine, to create intuitive, interactive, and highly-customizable charts.
Repo	https://github.com/vega/vega-lite	https://github.com/vega/vega	https://github.com/plotly/plotly.js	https://github.com/apache/echarts
Actively maintained	yes	yes	yes	yes
Latest version (date)	5.4.0 (Jul 2022)	5.22.1 (Mar 2022)	2.14.0 (Aug 2022)	5.3.3 (Jun 2022)
GH stars	3.9K	10K	15K	52K
downloads/week	300K	300K	200K	500K
Version in OSD	4.16.8	5.17.3	1.57.1, 2.2.0	N/A
License	BSD 3-Clause	BSD 3-Clause	MIT	Apache 2.0
Adheres to SemVer
Documentation	Docs	Docs	Reference	Handbook
Roadmap	Roadmap	Roadmap	None	None
Security Posture
Min add'l bundle weight	0 (already in bundle; 91 kB)	0 (already in bundle; 172.4 kB)	1.1 MB minified + gzipped	321.8 kB minified + gzipped
Declarative API	yes	yes	yes	yes
facets/subplotting	yes	yes	yes	yes
layering/compositing
bindable interactivity	yes	yes	yes	yes
Rendering	SVG, canvas, HTML5	SVG, canvas, HTML5	SVG, WebGL	SVG, canvas, WebGL, VML
SSR	yes	yes	yes	yes
responsive	yes	yes	yes	kind of
a11y features	aria labels	aria labels	no	automatic labels
i18n	yes	yes	yes	yes
react library	no	no	yes	no

ahopp commented 2 years ago

@joshuarrrr Personally, I'm completely aligned on this recommendation. From my perspective, users don't particularly care about the underlying technology as long as they have a consistent, feature-rich charting experience throughout. Given the integration and flexibility, Vega-Lite as the rendering layer and Vega-Lite as the declarative grammar to describe chart attributes and interactions seems like the best path to get there (versus the additional migration and rewriting, with no direct customer benefit, that something like Plotly would require).

It also seems pretty clear that Vega-Lite is the only option that adds no additional dependencies, no incremental complexity, and no new maintenance bloat, since it leverages the existing OSD integration and implementation. In addition, I think their design principles (e.g., "Provide sensible defaults, but allow customization", "Favor composition over templates", and "Support gradual specification") and their development principles (e.g., "Strive to remain backwards compatible", "Generate generic Vega specifications", "Enable transition to Vega", "Fail gracefully") are very sound for an upstream dependency.

Thanks for the research and recommendation!

brijos commented 2 years ago

Accessibility support is a hard requirement. I see that there is a ticket open on Plotly's GitHub to invest in it, but it has been open since 2016. Are companies mitigating this somehow?

joshuarrrr commented 2 years ago

@brijos Accessibility of data visualizations is definitely important, and an area where there's room for improvement in most charting libraries and tools. The Vega-Lite discussion issue is a useful window into some of the technical challenges, as well as ideas for further improvements. Regardless of the charting library we choose, this is an area where we can lead and contribute a11y improvements upstream.

kavilla commented 2 years ago

[Triage]

@joshuarrrr could you provide some insight on a migration path for existing charts?

seanneumann commented 2 years ago

@anirudha - I know you have a lot of thoughts here and are a big proponent of plot.ly. Can you chime in?

anirudha commented 2 years ago

Priorities

We need to start working on an abstraction interface and make sure the following requirements are met:

need to list all plugin requirements for visualization & overlays
need to formalize existing visualization requirements that are used today so plugins can easily migrate
need a process to request new features for visualization and overlays
need to review the data layer (that follows associative vectors or a data frame structure)

Decisions / Questions

What frameworks do the anywhere projects use to implement layering on Dashboards?
1. Anywhere projects should not use a charting library directly but should use an “Abstract Charting Interface”.
What does the Abstract Charting Interface use as its implementation library?
1. There will always be a need for multiple charting libraries, but for base charting types we need to ensure we have standard features supported. Today this is supported by Vega-Lite, Plotly, and ECharts.
There is no substantial technical difference in the charting implementations; the minor differences can be fixed in each option as we become contributors and start working on visualizations.
How do we decide between libraries?
1. Features we need:
  1. Custom charting support (“VEGA” visualization) [important]
  2. Multiple data sources
  3. Long-term community support (issues, PRs)
  4. Governance model, ability to become maintainers
  5. Performance
How much weight do the legacy implementations carry in our charting library decision?
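
One possible shape for the “Abstract Charting Interface” discussed above is sketched below. It only illustrates the idea that plugins describe a chart once and a backend translates it. Every name here (`ChartRequest`, `ChartBackend`, `buildChart`) is hypothetical and nothing like it exists in OSD today. The Vega-Lite and ECharts outputs use only the basic `mark`/`encoding` and `xAxis`/`yAxis`/`series` structures of those libraries.

```typescript
type Row = Record<string, string | number>;

// Library-neutral description of what a plugin wants to draw.
interface ChartRequest {
  kind: "line" | "bar";
  rows: Row[];
  xField: string;
  yField: string;
}

// Contract each rendering library adapter would implement (hypothetical).
interface ChartBackend {
  readonly id: string;
  supports(kind: ChartRequest["kind"]): boolean;
  toLibraryConfig(request: ChartRequest): Record<string, unknown>;
}

const vegaLiteBackend: ChartBackend = {
  id: "vega-lite",
  supports: () => true,
  toLibraryConfig: ({ kind, rows, xField, yField }) => ({
    data: { values: rows },
    mark: kind,
    encoding: {
      x: { field: xField, type: "ordinal" },
      y: { field: yField, type: "quantitative" },
    },
  }),
};

const echartsBackend: ChartBackend = {
  id: "echarts",
  supports: (kind) => kind === "bar" || kind === "line",
  toLibraryConfig: ({ kind, rows, xField, yField }) => ({
    xAxis: { type: "category", data: rows.map((row) => row[xField]) },
    yAxis: { type: "value" },
    series: [{ type: kind, data: rows.map((row) => row[yField]) }],
  }),
};

// Picks the preferred backend when it supports the request, else any that does.
function buildChart(
  backends: ChartBackend[],
  preferredId: string,
  request: ChartRequest
): { backend: string; config: Record<string, unknown> } {
  const ordered = [
    ...backends.filter((b) => b.id === preferredId),
    ...backends.filter((b) => b.id !== preferredId),
  ];
  const backend = ordered.find((b) => b.supports(request.kind));
  if (backend === undefined) {
    throw new Error(`No charting backend supports "${request.kind}" charts`);
  }
  return { backend: backend.id, config: backend.toLibraryConfig(request) };
}
```

With this shape, switching the base library is a change to `preferredId` rather than to every plugin, which is the property the questions above are probing.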

In summary,

Based on the above, VEGA is an important visualization type and has existing support/usage in Dashboards.
ECharts has higher community adoption, but does not easily enable custom and user-created visualizations the way the VEGA plugin does. ECharts is an Apache Software Foundation project that has a more promising long-term impact on the project and governance.

anirudha commented 2 years ago

The Observability plugin needs the following requirements reviewed as we migrate to the abstraction interface.

need to list all plugin requirements for visualization & overlays
need to formalize existing visualization requirements that are used today so plugins can easily migrate
need a process to request new features for visualization and overlays
need to review the data layer (that follows associative vectors or a data frame structure)

seanneumann commented 2 years ago

Thanks Ani!

ahopp commented 2 years ago

need to list all plugin requirements for visualization & overlays

Do you mean we have to gather requirements from the plugins before making a proposal on any future abstraction layer? If so, I agree. I'm happy to help coordinate with the plugins with the help of @joshuarrrr and @ashwin-pc. Some of the work is already being done with the alerting / anomaly detection work @brijos is driving.

need to formalize existing visualization requirements that are used today so plugins can easily migrate

Can you help me understand a bit more here? Are you saying you want to see a list of visualizations prioritized for support in Vega-Lite for consolidated chart-rendering (for example, which will be included and which are on the roadmap)? If so, I agree. @joshuarrrr I'm happy to contribute a first pass at this list in the eventual proposal.

need a process to request new features for visualization and overlays

I think we already have this in the form of GitHub issues. Both upstream in the charting library and downstream in OpenSearch Dashboards. We also would still have the same path for contribution of new features, overlays, etc. Do you agree? If not, what are the gaps you are seeing?

need to review the data layer ( that follow associative vectors or a data frame structure )

I think the ask here is to have a proposal on the data layer that supports the requirements of the abstraction layer, is that correct? I'm not sure these two need to come at the same time, but I'd be curious to hear your recommendation on order of operation.
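
To ground the data-layer question, here is a small sketch of one way a column-oriented "data frame" could feed a Vega-Lite spec, which expects inline data as an array of row objects under `data.values`. The `DataFrame` shape and the `frameToRows` helper are assumptions for illustration, not an existing or proposed OSD type.

```typescript
type Cell = string | number | null;

// Hypothetical column-oriented frame: one array of cells per named column.
interface DataFrame {
  columns: Record<string, Cell[]>;
}

type Row = Record<string, Cell>;

// Converts columns into the row objects used by Vega-Lite inline data.
function frameToRows(frame: DataFrame): Row[] {
  const entries = Object.entries(frame.columns);
  const rowCount = entries.reduce(
    (max, [, column]) => Math.max(max, column.length),
    0
  );
  // Reject ragged frames instead of silently padding them.
  for (const [name, column] of entries) {
    if (column.length !== rowCount) {
      throw new Error(
        `Column "${name}" has ${column.length} values, expected ${rowCount}`
      );
    }
  }
  const rows: Row[] = [];
  for (let i = 0; i < rowCount; i++) {
    const row: Row = {};
    for (const [name, column] of entries) {
      row[name] = column[i] ?? null;
    }
    rows.push(row);
  }
  return rows;
}
```

The result can be dropped into a spec as `{ data: { values: frameToRows(frame) } }`, so the data layer and the charting layer can be designed and sequenced separately.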

ahopp commented 2 years ago

There will always be a need for multiple charting libraries, but for base charting types we need to ensure we have standard features supported. Today this is supported by Vega-Lite, Plotly, and ECharts.

I think one of the goals of the abstraction layer would be to ensure that we have a path to enable all supported features in downstream visualizations as long as the abstraction layer is being used. Specifically, abstraction is used for a multitude of reasons (centralization, simplicity, improved testing, separation of policy and detail, etc.), but one of the goals of exposing an interface to handle visualization/chart rendering in a unified way is to include all features supported by OpenSearch Dashboards in all visualizations/charts.

I realize it might be aspirational at this point, but I think it's the right goal to take. And we should aim high if we're going to propose a true abstraction layer.
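
As a sketch of what "all supported features in all charts" could look like when every chart is a Vega-Lite spec, the function below adds a threshold line to any spec it is given. The name `withThresholdRule` and the dash style are assumptions. It simplifies by handling only unit specs (with `mark` and `encoding`) and specs that already have a `layer` array.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Spec = { [key: string]: Json };

// Adds a horizontal rule at `value` to a unit or layered spec (hypothetical helper).
function withThresholdRule(spec: Spec, value: number): Spec {
  const rule: Spec = {
    mark: { type: "rule", strokeDash: [4, 4] },
    encoding: { y: { datum: value } },
  };
  const existing = spec.layer;
  if (Array.isArray(existing)) {
    // Already layered: append the rule as one more layer.
    return { ...spec, layer: [...existing, rule] };
  }
  // Unit spec: move mark and encoding into a layer, keep everything else on top.
  const outer: Spec = {};
  const unit: Spec = {};
  for (const [key, val] of Object.entries(spec)) {
    if (key === "mark" || key === "encoding") {
      unit[key] = val;
    } else {
      outer[key] = val;
    }
  }
  return { ...outer, layer: [unit, rule] };
}
```

Because the feature is a transformation of the spec rather than of a particular chart type, it is written once and applies to bar, line, and area charts alike.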

How do we decide between libraries

I agree that multiple data sources, long-term community support/issues, governance model, ability to become maintainers, and performance are important, but I don't think this decision puts any of them at risk. Specifically, this decision should not impact our ability to support multiple data sources, does not prevent us from becoming maintainers on any of the repos highlighted, and does not prevent us from optimizing for performance.

I know this was just a statement, but are there any specific decision points that you think need additional justification or exploration? I realize some of these requirements will need to be considered in the abstraction proposal, but are there any specific to the charting decision that you don't think are supported above?

How much weight do the legacy implementations carry in our charting library decision

As far as I am concerned, legacy implementations do not affect the aspirations of OpenSearch Dashboards across features, user experience, long-term vision, etc. But we do consider implementation details when making implementation decisions ceteris paribus or when evaluating downstream impacts. Specifically, if we can achieve our user experience goal in multiple ways (e.g., if any of the evaluated libraries could be made to work), we need to consider the path from the current state to a future state, which requires understanding and evaluating legacy implementations. Does that help?

There is no substantial technical difference in the charting implementations; the minor differences can be fixed in each option as we become contributors and start working on visualizations.

I don't think this is true. There may not be substantial technical differences in the charting implementations in general, but there are meaningful technical differences in how charting exists in OpenSearch Dashboards in particular. For example, OpenSearch Dashboards already has a dependency on Vega-Lite, the Vega visualization feature is already implemented and already well used, Vega-Lite visualizations can already be rendered in dashboards, and we already have an expressions implementation for Vega-Lite.

ECharts is an Apache Software Foundation project that has a more promising long-term impact on the project and governance.

I appreciate the perspective here, but what is more promising in the long term is very subjective. I think we should focus on building a solid experience based on what we need for our users now and in the future, and make sure our abstraction layer is designed to facilitate interoperability long-term. The decision of which library to use can be revisited over time as long as we keep our long-term goal of an abstraction layer that generalizes (i.e., is removed from specific implementation requirements). Do you agree? Or do you see a library not being an Apache Software Foundation project as a critical reason to reconsider it?

kavilla commented 2 years ago

[Planning]

@joshuarrrr to make an epic and break it down into issues:

Chart abstraction
Catalog current state

@joshuarrrr please communicate insight cc: @ashwin-pc

joshuarrrr commented 2 years ago

@joshuarrrr could you provide some insight on a migration path for existing charts.

The migration path for existing charts is described and tracked in #2819
