[Proposal] Data Explorer/Discover 2.0

ashwin-pc commented 1 year ago

Overview

The Data Explorer project represents a strategic effort to address the security concerns associated with the legacy Discover component while delivering an enhanced and versatile data exploration experience for users. Data Explorer serves a dual purpose. Firstly, it aims to upgrade the existing Discover plugin to use React. Secondly, it seeks to become a comprehensive platform that caters to all data exploration needs.

All the existing features of Discover will seamlessly transition into Data Explorer. Additionally, this upgraded platform will introduce new capabilities, such as exploratory data analysis through aggregations and visualizations, as well as support for various query languages. The integration of these features within Data Explorer aims to provide a unified and efficient environment for data exploration tasks.

Background

The Discover component has been a longstanding and crucial part of OpenSearch Dashboards (OSD). However, it was built using AngularJS (Angular 1.x), which poses security concerns and requires its removal by the end of the year (Ref). Data exploration in OpenSearch has also been confusing, with many different applications such as Discover, VisBuilder, and Event Analytics all solving similar problems. Data Explorer aims to solve both problems at once by offering a single platform where all data exploration views can coexist, while also serving first as the target for the Discover deangularization effort.

Requirements

Remove AngularJS from Discover
Feature parity with Discover
Should have the ability to register and toggle between different data exploration views depending on the data source
The architecture should be extensible enough for existing exploration apps to migrate without too many changes.

Architecture

Data Explorer architecture

For this feature the following changes will be made to the plugins:

data_explorer: This is a new plugin introduced to act as the view container for the different sub applications for data exploration.
discover: The Discover 2.0 plugin. It combines existing non-Angular code from discover_legacy, vis_builder, and other useful parts of OSD with new components where no existing component is available.
vis_builder: Migrates its views from a separate app to a view inside data_explorer
discover_legacy: The existing discover plugin will move to discover_legacy and all associated routes will be renamed as discover_legacy. While sharing, the URL will still reference discover.

Division of responsibility

Data explorer division

Data explorer simply acts as a shell for other exploration views, so here is how the responsibilities are divided between Data Explorer and its views.

Data Explorer: Data explorer is responsible for 4 primary features.

Data source: Data Explorer is the source of truth about the datasource that is being explored.
View Registry: Each app can register itself as a view, and Data Explorer is responsible for displaying the view when the user selects it.
State management: Data Explorer apps will also have a shared state, and Data Explorer will provide hooks that allow the underlying app to register its own state reducers.
Shared utilities: While each view will have components specific to their view, there are often components and utilities that can be shared between them. e.g. The available fields for a given data source and its summary popup. These components can be shared such that other views can also reuse them.

Views: Each of the views, on the other hand, is responsible for:

Metadata storage: The logic for storing and retrieving metadata is colocated inside the view, since it's usually view specific.
Nav options & Search/Query bar: This may change in the future, but for the initial version the view is also responsible for the navigation options and the search bar with its time filter. For Discover and VisBuilder, this component already comes from the data plugin, so not much changes here.
View specific logic: Nothing special here. All view specific rendering and application logic resides in the view and does not bleed into Data Explorer.
Embeddable: Data Explorer embeddables are also not shareable, so each view is responsible for registering its own embeddables.

The goal here is that, by minimizing the surface area for Data Explorer specific changes, each application can migrate its existing view onto Data Explorer relatively easily.
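A minimal sketch of how the shared state and view-registered reducers described above could fit together. This is an illustration under assumptions, not the planned implementation: `registerSlice`, `rootReducer`, `RootState`, and the action names are all hypothetical, and a real version would most likely sit on top of a store library rather than plain functions.

```typescript
// Hypothetical action shape; the proposal only refers to an `Action` type.
interface ExplorerAction {
  type: string;
  payload?: unknown;
}

type SliceReducer<S> = (state: S, action: ExplorerAction) => S;

// Hypothetical root state: Data Explorer owns `dataSource`,
// and every registered view owns one entry under `views`.
interface RootState {
  dataSource: string | null;
  views: Record<string, unknown>;
}

interface ViewSlice<S> {
  defaults: S;
  reducer: SliceReducer<S>;
}

const slices: Record<string, ViewSlice<any>> = {};

// The "hook" a view would call to plug its own reducer into the shared state.
function registerSlice<S>(viewId: string, slice: ViewSlice<S>): void {
  slices[viewId] = slice;
}

function initialState(): RootState {
  const views: Record<string, unknown> = {};
  for (const id of Object.keys(slices)) {
    views[id] = slices[id].defaults;
  }
  return { dataSource: null, views };
}

function rootReducer(state: RootState, action: ExplorerAction): RootState {
  // Data Explorer handles the data source itself...
  const dataSource =
    action.type === 'dataExplorer/setDataSource' ? (action.payload as string) : state.dataSource;
  // ...and hands every action to each view's reducer for its own slice.
  const views: Record<string, unknown> = {};
  for (const id of Object.keys(slices)) {
    const current = id in state.views ? state.views[id] : slices[id].defaults;
    views[id] = slices[id].reducer(current, action);
  }
  return { dataSource, views };
}

// Usage: a view registers its slice, then both kinds of action flow through one reducer.
interface DiscoverState {
  columns: string[];
}

registerSlice<DiscoverState>('discover', {
  defaults: { columns: ['_source'] },
  reducer: (state, action) =>
    action.type === 'discover/addColumn'
      ? { columns: state.columns.concat(action.payload as string) }
      : state,
});

let state = initialState();
state = rootReducer(state, { type: 'dataExplorer/setDataSource', payload: 'logs-*' });
state = rootReducer(state, { type: 'discover/addColumn', payload: 'status' });
```

Keeping each view's state in its own slice is what lets a view change its reducer without touching Data Explorer or the other views.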

Wireframe

Data explorer wireframe

View Registry

The Data Explorer exposes a view registry that allows apps to register themselves as views within Data Explorer. Each view object has the following properties:

Data model:

interface View<T = any> {
  id: string;
  title: string;
  icon: IconType;
  ui: {
    panel: React.ComponentType;
    workspace: React.ComponentType;
    defaults: T;
    reducer: (state: T, action: Action) => T;
  }
  defaultPath: string;
  extention: {
    type: string;
    toList: (savedObject: SavedObject) => ViewListItem;
  }
  shouldShow?: (state: DataExplorerState) => boolean;
}

id: The id of the view. This is used to identify the view in the view switcher and in the URL
title: The title of the view. This is used to display the view in the view switcher
icon: The icon of the view. This is used to display the view in the view switcher
ui: This is an object that contains the UI components for the view.
- panel: This is the component that is rendered in the panel when the view is selected
- workspace: This is the component that is rendered in the workspace when the view is selected
- defaults: This is the default state for the view. This is used to initialize the state when the view is selected
- reducer: This is the reducer for the view. This is used to update the state of the view when actions are dispatched
defaultPath: This is the default path for the view. This is used to redirect the user to the view when they select it in the view switcher
extention: This is an object that contains the extension for the view. This is used to register the embeddable for the view
- type: This is the type of the embeddable
- toList: This is a function that converts the saved object into a view list item. This is used to display each saved object when the user wants to load a supported saved object
shouldShow: This is an optional function that is used to determine if the view should be shown in the view switcher. This is useful for views that are not compatible with the current data source.
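A minimal sketch of a registry built around the view object above, showing registration and how `shouldShow` could filter the view switcher. It is a simplification under assumptions: `ViewSketch` drops the React components, reducer, and saved-object extension; `ViewRegistry`, its methods, and the `dataSourceType` field are hypothetical names; and the rule that VisBuilder only appears for index patterns is made up for the example.

```typescript
// Hypothetical slice of Data Explorer state; only what this example needs.
interface DataExplorerState {
  dataSourceType: string; // e.g. "index-pattern" (illustrative value)
}

// Reduced version of the proposal's View: UI components, reducer,
// and the saved-object extension are left out for brevity.
interface ViewSketch {
  id: string;
  title: string;
  defaultPath: string;
  shouldShow?: (state: DataExplorerState) => boolean;
}

// Hypothetical registry that keeps views in registration order.
class ViewRegistry {
  private views: ViewSketch[] = [];

  register(view: ViewSketch): void {
    if (this.get(view.id) !== undefined) {
      throw new Error(`View "${view.id}" is already registered`);
    }
    this.views.push(view);
  }

  get(id: string): ViewSketch | undefined {
    return this.views.filter((view) => view.id === id)[0];
  }

  // Views to offer in the view switcher for the current state.
  // A view without shouldShow is always offered.
  visible(state: DataExplorerState): ViewSketch[] {
    return this.views.filter((view) => (view.shouldShow ? view.shouldShow(state) : true));
  }
}

const registry = new ViewRegistry();
registry.register({ id: 'discover', title: 'Discover', defaultPath: '#/' });
registry.register({
  id: 'vis_builder',
  title: 'VisBuilder',
  defaultPath: '#/',
  // Illustrative rule only: hide this view for non index-pattern sources.
  shouldShow: (state) => state.dataSourceType === 'index-pattern',
});

// With a hypothetical non index-pattern source, only Discover is offered.
const visibleIds = registry.visible({ dataSourceType: 'prometheus' }).map((view) => view.id);
```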

Migrating existing applications

Data Explorer will likely be a place where existing applications can migrate their data exploration views. This means that Data Explorer should also take into account how an application can do this without disrupting the user's workflow. Such a migration is already needed for Discover and VisBuilder, so I assume that we will need to do this again for other views.

Migrating the actual application is pretty straightforward. Since Data Explorer is only responsible for the data source, state management, the view registry, and a few shared utilities, the application can simply register itself as a view and then start using the Data Explorer state management and data source. The only thing that the application needs to do is to migrate its existing state management to the Data Explorer state management and reference the data source passed down by Data Explorer instead of its own. The application also needs to modify its routes to use the Data Explorer routes, and adapt its UI to the panel and workspace components.

Routing

Routing is important for Data Explorer, especially since we do not want to break backwards compatibility with existing applications. For this reason, Data Explorer will have its own routes prefixed with /data_explorer, and each view's plugin is responsible for redirecting its existing routes to the new ones. For example, the existing discover route will redirect to /data_explorer/discover and the existing vis builder route will redirect to /data_explorer/vis_builder, each handled by its respective plugin. This way, the existing routes continue to work while the new routes serve the new views.
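A minimal sketch of the redirect rule described above, written as a pure function so it is easy to test. The legacy app ids (`discover`, `vis-builder`) and the function name are assumptions for illustration; in practice each plugin would perform its own redirect.

```typescript
// Hypothetical mapping from legacy app ids to Data Explorer view ids.
const LEGACY_APP_TO_VIEW: Record<string, string> = {
  discover: 'discover',
  'vis-builder': 'vis_builder',
};

// Returns the Data Explorer path for a legacy path,
// or null when the path does not belong to a migrated app.
function toDataExplorerPath(legacyPath: string): string | null {
  // Split "/<app>" from whatever follows it (sub-path, hash, or query string).
  const match = /^\/([^/#?]+)(.*)$/.exec(legacyPath);
  if (match === null) {
    return null;
  }
  const appId = match[1];
  if (!Object.prototype.hasOwnProperty.call(LEGACY_APP_TO_VIEW, appId)) {
    return null;
  }
  // Keep the remainder untouched so saved links resolve to the same place.
  return `/data_explorer/${LEGACY_APP_TO_VIEW[appId]}${match[2]}`;
}

const redirected = toDataExplorerPath('/discover#/view/abc');
```

Carrying over everything after the app id is the part that preserves backwards compatibility for bookmarked and shared URLs.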

Discover 2.0 - Deangularize

With the above architecture in mind, here is how the existing discover plugin will be migrated to the new architecture.

Research issue: https://github.com/opensearch-project/OpenSearch-Dashboards/issues/4130

Mock wireframe

Toggle

Toggling between the old and new views is a bit more complicated than a standard migration, since in this case we need to support both the old and new plugins at the same time. To do this, we implement the routing strategy mentioned before with one change: Data Explorer's router also checks whether the legacy plugin is turned on. If it is, it routes all discover traffic to discover_legacy; otherwise it sends it to the discover view registered with it. Once the migration is complete, we simply remove this check.
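A minimal sketch of the extra router check, assuming the toggle is available as a boolean. `useLegacyDiscover`, `resolveDiscoverTarget`, and `routeDiscover` are hypothetical names; the proposal does not say which setting backs the toggle.

```typescript
type DiscoverTarget = 'discover_legacy' | 'data_explorer/discover';

// `useLegacyDiscover` stands in for whatever setting backs the toggle (assumption).
function resolveDiscoverTarget(useLegacyDiscover: boolean): DiscoverTarget {
  return useLegacyDiscover ? 'discover_legacy' : 'data_explorer/discover';
}

// Routes a discover sub-path (hash or query included) to the selected target.
function routeDiscover(subPath: string, useLegacyDiscover: boolean): string {
  return `/${resolveDiscoverTarget(useLegacyDiscover)}${subPath}`;
}

const legacyRoute = routeDiscover('#/doc/1', true);
const newRoute = routeDiscover('#/doc/1', false);
```

Because the decision lives in one function, removing the check after the migration means deleting the legacy branch and nothing else.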

Migrating Features

Discover Router: Discover today has a router for the 3 separate pages that it supports: the default view, the surrounding documents view, and the single document view. These will be replaced by a single page, with surrounding documents and the single doc viewer displayed in a flyout directly. An alternative would be to keep those two views as separate pages, rendered directly by the discover plugin rather than through Data Explorer. Both options are valid and possible, so feedback here would be appreciated.
Document table: This is the expanding table that discover uses today for both its embeddable and default view to list all the documents. It will migrate to use Data Grid from OUI with the expand functionality moving to a flyout similar to event analytics. All the features supported by it will move over, the two most important ones being the DocViewer api and the DocViewerLinks api.
Doc table visualization: This is the Bar Chart above the doc table to show the number of records in the current search space. This will migrate to using the Expression Renderer, similar to Visualize and VisBuilder.
Top Nav and Search Bar: This will still use the data plugin's top nav and features, but the UI will adapt to match the Data Explorer mocks
Sidebar: The sidebar component will be a new component built using features from VisBuilder.
- The selected fields section will be a net new feature that will also have a drag and drop UX pattern
- The Available fields section will be built on top of a similar section from VisBuilder
- Field summary will also migrate from VisBuilder, with the added ability to create a visualization from the view directly.
Embeddable: Will migrate to use the new Data Grid based component
Saved Searches: Will remain unchanged but point to the new Data Explorer view when edited

Utilities that are either not tied to the UI or are already migrated like the JSON doc viewer will be migrated as is to the new experience.

Open questions

How do we gather feedback during the transition phase for Discover -> Discover 2.0?
How do we handle the other routes handled by discover, i.e. the surrounding documents view and the single document view? Do we continue to display them on a separate page, or do we display them in a flyout on the same page?

ahopp commented 1 year ago

Hey @ashwin-pc, thanks for the proposal!

Couple of things

It seems there are two very big pieces of work here (i.e., remediate AngularJS and upgrade Discover), but it seems we relegate the deangularize to another issue. Does the deangularize from AngularJS have any user impact to the proposed experience? If so, perhaps we can consider those separate issues and prompt for feedback accordingly.
I think the "view registry" that allows apps to register themselves as views within Data Explorer is a good idea, but is this extensible to apps that want to modify data_explorer or change Nav options & Search/Query bar? I'm curious on the guardrails (if any) we are hoping to implement.
Is the toggle domain/cluster wide? I assume not but we may want to call it out. Additionally, is there some clear user value in having a prominent option, even within an experimental state, if this will be version incompatible at some point? Given the impacts to routing and complexities of compatibility and user adoption, adding some friction might be okay for the long-term user experience. Even a link to a setting might be preferred over a toggle.
Are there specific questions you want the community to weigh in on? Without more details on current versus future it's hard to ascertain the impact. For instance, do we have a list of feature(s) that we are adding that we should be weighing in on as a community? e.g., review the proposals for changes to the Sidebar, Document table, and Discover Router CX?

ahopp commented 1 year ago

Somewhat related; I know there are some security concerns with AngularJS but do we have specific user requirements we're working backwards from? Or some guiding beliefs that are informing this full redesign? We might want to include those to make it easier to provide feedback. For example, I think the unification of Event Explorer and Discover is one of those reasons, but I'm not sure this proposal addresses the overlaps and gaps between the two and how this will solve that problem. Right now, the background only focuses on the security concerns and doesn't explain why we'd make all the other changes as well.

kavilla commented 1 year ago

Will the core concept of the data explorer build around index patterns still?

abbyhu2000 commented 1 year ago

Should we include functional tests in the requirements, such as adding new functional tests, modifying the old Discover functional tests for the Discover feature in Data Explorer, and removing the irrelevant functional tests?

joshuarrrr commented 1 year ago

Some quick and relatively minor feedback:

For the background section, let's link to the de-angularization meta issue.
What does "support for multiple views" mean as a requirement? What's the motivation for that?
I think we could add a little more context about how view composition works. The ui property of the view data model hints in this direction, but we should clarify how much flexibility views have over their overall layout. Marking up the wireframe with the panel sections etc. would be helpful.
Why is the property for embeddables called extension? The additional terminology is a bit confusing.
State management will need more tech details later

The only thing that the application needs to do is to migrate its existing state management to the Data Explorer state management

This could use some more details/examples.
Doc table visualization - I think this will need its own small technical design/proposal, as I have some thoughts about how we can do this to improve the experience and prevent future churn.

ashwin-pc commented 1 year ago

Feedback from the initial review:

Spell out what the goals of Data Explorer are
Link to the open issues in the background for deangular:
- Meta issue
- ...
Expand on support for multiple views in the requirements sections
Show an image for the panel and workspace sections and how the Discover 2.0 maps to that architecture
How do we get feedback for Discover migration
- Community suggestions appreciated
- Since this is a one way decision, freeform feedback is necessary to understand what is missing

ashwin-pc commented 1 year ago

It seems there are two very big pieces of work here (i.e., remediate AngularJS and upgrade Discover), but it seems we relegate the deangularize to another issue. Does the deangularize from AngularJS have any user impact to the proposed experience? If so, perhaps we can consider those separate issues and prompt for feedback accordingly.

Not really, upgrading Discover takes care of the deangularization tasks as well, since we won't carry over any Angular components. Part of Discover is already migrated away from AngularJS, so we can use those parts as is and only need to focus on the components that aren't.

I think the "view registry" that allows apps to register themselves as views within Data Explorer is a good idea, but is this extensible to apps that want to modify data_explorer or change Nav options & Search/Query bar? I'm curious on the guardrails (if any) we are hoping to implement.

There are intentionally few guardrails here for now, since that makes migrating existing applications onto Data Explorer easier. We can revisit the guardrails in the future once we have more data about how this app is being used. There is enough isolation between views that changes in one should not affect the other, but the views we have today are sufficiently different that abstracting more features into Data Explorer would make the integration more difficult. Also, since Data Explorer has so few abstractions, I don't think that any view will need to modify it to make its features work, and that was the intention.

Is the toggle domain/cluster wide? I assume not but we may want to call it out. Additionally, is there some clear user value in having a prominent option, even within an experimental state, if this will be version incompatible at some point? Given the impacts to routing and complexities of compatibility and user adoption, adding some friction might be okay for the long-term user experience. Even a link to a setting might be preferred over a toggle.

It's at the tenant level, and will be present in advanced settings. We plan on removing the toggle once the deangularization work is complete, and I will update the proposal to call that out since it's pretty light on that right now.

Are there specific questions you want the community to weigh in on? Without more details on current versus future it's hard to ascertain the impact. For instance, do we have a list of feature(s) that we are adding that we should be weighing in on as a community? e.g., review the proposals for changes to the Sidebar, Document table, and Discover Router CX?

Not really, there is only one question, and that is about how we want to handle the other views that Discover supports (surrounding documents and single doc views). Right now I'm just looking for general feedback on the approach and on anything that I missed in my proposal.

Somewhat related; I know there are some security concerns with AngularJS but do we have specific user requirements we're working backwards from? Or some guiding beliefs that are informing this full redesign? We might want to include those to make it easier to provide feedback. For example, I think the unification of Event Explorer and Discover is one of those reasons, but I'm not sure this proposal addresses the overlaps and gaps between the two and how this will solve that problem. Right now, the background only focuses on the security concerns and doesn't explain why we'd make all the other changes as well.

That's a good point. Yes, combining the Event Explorer view with Discover is one such reason, but I didn't mention that explicitly here since Event Explorer is not a part of the base OSD repo, and developers who are using the minimal distribution may never encounter that view. That's why I focused on VisBuilder, which is another such view that we want to be able to toggle between to explore the data. I also kept the description of a view intentionally ambiguous, since I want to make it possible for any view (e.g. Event Explorer) to easily integrate with Data Explorer regardless of its underlying architecture. That being said, I will expand the overview section to highlight the current gaps with our different views.

ashwin-pc commented 1 year ago

@abbyhu2000

Should we include functional tests in the requirements, such as adding new functional tests, modifying the old Discover functional tests for the Discover feature in Data Explorer, and removing the irrelevant functional tests?

Tests should be a part of any feature that is built, so I didn't want to call that out as an explicit requirement here.

ashwin-pc commented 1 year ago

@joshuarrrr

Why is the property for embeddables called extension? The additional terminology is a bit confusing.

It's not a new term and is in fact borrowed from Visualize. This is the property Visualize uses to list all the compatible visualization saved objects in the visualize listing view.

State management will need more tech details later

The only thing that the application needs to do is to migrate its existing state management to the Data Explorer state management

This could use some more details/examples.
Doc table visualization - I think this will need its own small technical design/proposal, as I have some thoughts about how we can do this to improve the experience and prevent future churn.

Yeah, detailed designs for some of the components will follow; this proposal just focuses on the high level design of the project.

ahopp commented 1 year ago

i will expand the overview section to highlight the current gaps with our different views

Another path would be to share some of the user-centric reasoning behind this large undertaking. Modernizing is important, but I'm trying to understand why this work specifically.

joshuarrrr commented 1 year ago

@ashwin-pc For some of the routing challenges of supporting old and new simultaneously, have you looked at https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/src/plugins/url_forwarding/README.md#L0-L1 ?

YANG-DB commented 1 year ago

@ashwin-pc this looks very promising ... a few points to mention:

There is a metadata notion for our visual elements that needs to be present for selection / filtering as part of the data-source selector or any other metadata-enabled component
Federating multiple data-sources that may have different metadata notions (Prometheus, CloudWatch, OpenSearch ...): how do we align them to be displayed within one notion (e.g. see all the data sources that have an Observability label or a Security label)?
Does the registry expose an API for metadata? Can we filter by it?

ashwin-pc commented 1 year ago

Hey @YANG-DB, can you help me understand what you mean by notion here?

YANG-DB commented 1 year ago

A notion is a semantic meaning that a visualization is related to. For example, we can annotate a dashboard that shows visualizations with the following notions: [http,cpu,Observability,payment-services]

Similarly, the data-source can have the following notions: [Observability,Prometheus, staging-env]. So this notion concept hides the metadata semantic relationship that the visual element may represent and project.

The labels technique is common practice for reflecting metadata notions that are both dynamic for user definition and open for system-based default values...

pjfitzgibbons commented 1 year ago

@ashwin-pc Thanks for this writeup. A lot of work... just to document this UX repositioning and describe the impact.

Upon first read, it sounds like de-Angularization is being used as a primary motivation for the enhancement. I'm concerned that this is "burying the lead" on why we're moving in this new direction for UX on discovery. de-Angularization is important, yet it is technically a stand-alone effort and can be accomplished with no UX/UI impact. Seems like it's convenient timing, maybe opportunistic timing; we should be focusing the UX position on how and why it is better for the user, without concern over how it will be implemented, and why it is more-timely now than later.

ashwin-pc commented 1 year ago

@pjfitzgibbons you are right in stating that deangularization can be achieved without any UI/UX impact, and that was in fact the recommended approach at first. However, when we discussed whether we should spend a whole lot of time first migrating our Angular components to React, just to rewrite the whole thing again as a Data Explorer tool, it seemed like throwaway effort, which is why the two projects were combined. This then introduced a wrinkle that the deangularization had to happen by EOY since it was a security risk that we had committed to remove by then. This meant that, to meet the deadline without refactoring very different applications (Log Explorer and Discover) into one, a Data Explorer skeleton that allowed two independent applications to run at the same time had to be developed. That is the design you see here.

pjfitzgibbons commented 1 year ago

[The org] then introduced a wrinkle that the deangularization had to happen by EOY since it was a security risk

This ^^ is the bit I was getting at. I feel declaring de-Ng as a "requirement" to the design effort of DataExplorer is misguided. I feel also, and more importantly, that it is valid and important to indicate the org-level commitment above, and how that agreement (de-Ng) has side-effect impacts on the ultimate design.

TL;DR: It's ok to say "And also this accomplishes an Org goal to remove the security risk of the Angular library". It's not ok to say "This, because Angular".

kgcreative commented 1 year ago

Will the core concept of the data explorer build around index patterns still?

Orthogonal to this, I would like to remove the dependency on index patterns. I think there's some work here, but I'd like to rethink the index pattern as a concept. An index pattern really is a dynamic data schema, or an aggregation schema. It also gives us some advanced features like run-time fields, scripted fields, etc. At the end of the day, we should invest in the ability to compose a query using other query languages and to query indexes directly (and potentially be able to simply read the index mapping API to provide run-time schemas, which could be saved as a saved object for ease of exploration down the road). This will also give us more flexibility to correlate alerts, detectors, and other features from across other plugins.
