Closed ananzh closed 3 months ago
@ananzh very nice! Another important capability I would add is allowing the community to contribute generic vis tools as part of the out-of-the-box vis tools catalog.
I strongly recommend reviewing the vega-altair engine, which performs this same transformation from a high-level language (Python) into the Vega spec (JSON).
Another suggestion is to integrate the existing open-source vega-editor in place of our current Vega JSON editor, to simplify Vega editing for advanced vis builders.
zooming in and out
This exists in the tool today.
Toggle in VisBuilder to allow user to display vislib vis or vega vis in VisBuilder, to save as vislib vis or vega vis and to embed either vislib vis or vega vis in Dashboard .
We should not have a toggle in the UI, since for most users Vega is an implementation detail. Only advanced users would care about it. If we want to maintain the experience for users, we should either try to match the experience or keep an advanced settings toggle that lets the user go back to the older experience.
A new vega type vis directly in VisBuilder
Why do we need this as opposed to just redirecting the user to the Vega editor? If we do it this way, we should allow the user to switch back and carry context from Vega back to the other chart types. Right now, if I switch between line and bar and go back to line, the line chart carries over the changes that it can from the bar chart. Can we do that with this Vega type?
VegaSpecBuilder Class
In this class you are also constructing the query, but it is very specific to DSL. How would this work with PPL and SQL? They each support a limited subset of aggregations and do not support all the agg types.
Supporting Multiple Query Languages (DQL, PPL, SQL)
If we aren't integrating VisBuilder into Discover, we might not need this. I would like to hear from the others about this, but my reasoning is that the user never has to enter the query that is used to fetch the data from the backend. If that's the case, the language we use under the hood does not matter. The only exception is datasources that don't support visualizations in other languages. In that scenario I'd like this to be a little more modular, so that when other languages are added, it's not on the VisType to manually update itself to support all the new languages.
One approach here could be to allow the VisType to specify which languages it supports, so that they all have to support DQL by default but can optionally specify which other languages they support. What would be even nicer is if the VisType did not have to know anything about the language used under the hood and only worried about the dataframe that came back and mapped it to the Vis, leaving the query language part to the framework. But this might be trickier.
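A minimal sketch of the first approach, where a VisType declares its optional languages and DQL is always implied. The `supportedLanguages` property and both helper functions are hypothetical names used only for illustration; nothing like them is confirmed to exist in OpenSearch Dashboards.

```javascript
// Hypothetical sketch: these names are assumptions, not an existing API.
const DEFAULT_LANGUAGE = 'DQL';

// Returns the languages a VisType supports; DQL is always included.
function getSupportedLanguages(visType) {
  const extra = visType.supportedLanguages || [];
  return Array.from(new Set([DEFAULT_LANGUAGE, ...extra]));
}

// True when the VisType can render results fetched with `language`.
function supportsLanguage(visType, language) {
  return getSupportedLanguages(visType).includes(language);
}

// Example VisType definitions (illustrative only).
const lineVisType = { name: 'line', supportedLanguages: ['PPL'] };
const pieVisType = { name: 'pie' };
```

With this shape, the framework rather than the VisType could decide which query language to use, falling back to DQL when a VisType declares nothing.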
A hard code mapping for demo purpose
export const createVegaSpec = (styleState, dimensions, valueAxes, aggConfigs, indexPattern, searchContext) => {
  const { addLegend, addTooltip, type } = styleState;
  const { x, y } = dimensions;
  const index = indexPattern.title;
  const timeField = searchContext.timeRange ? searchContext.timeRange.field : "@timestamp"; // Use the time range field or default to "@timestamp"
  const dateHistogram = aggConfigs.aggs.find(agg => agg.schema === 'segment');
  const metric = aggConfigs.aggs.find(agg => agg.schema === 'metric');
  const metricType = metric.type.name;
  const dataUrl = {
    context: true,
    timefield: timeField,
    index: index,
    body: {
      aggs: {
        1: {
          date_histogram: {
            field: dateHistogram.params.field.displayName,
            fixed_interval: "3h", // hard coded for now
            time_zone: "America/Los_Angeles", // can be dynamic if required
            min_doc_count: dateHistogram.params.min_doc_count,
            extended_bounds: dateHistogram.params.extended_bounds,
          },
          aggs: {
            2: {
              [metricType]: {
                field: metric.params.field.displayName
              }
            }
          }
        }
      },
      size: 0
    }
  };
  const vegaSpec = {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: {
      url: dataUrl,
      format: {
        property: "aggregations.1.buckets"
      }
    },
    transform: [
      {
        calculate: "datum.key",
        as: "timestamp"
      },
      {
        calculate: `datum[2].value`,
        as: metric.params.field.displayName
      }
    ],
    layer: [
      {
        mark: {
          type: "line" // or dynamic type if needed
        }
      },
      {
        mark: {
          type: "circle",
          tooltip: addTooltip
        }
      }
    ],
    encoding: {
      x: {
        field: "timestamp",
        type: "temporal",
        axis: {
          title: timeField
        }
      },
      y: {
        field: metric.params.field.displayName,
        type: "quantitative",
        axis: {
          title: metric.params.field.displayName
        }
      },
      color: {
        datum: metric.params.field.displayName,
        type: "nominal"
      }
    }
  };
  if (addLegend) {
    vegaSpec.encoding.color.legend = {
      title: metric.params.field.displayName
    };
  }
  return vegaSpec;
};
Can you speak to the differences between the options? I'm not really sure from reading.
From method 1: cons
which might not be flexible for dynamic changes.
Are there specific cases you're worried about?
we should create a series of test cases that cover various combinations of metrics and buckets
Just to be clear, we should have test cases for all known combinations, right? And can we prevent unknown combos from being used in the product in some way?
Also, do we clearly understand the expected input/output of these cases?
VegaSpecBuilder
Should we be storing unserializable state in redux?
Also, building the spec is calculated state, is this the right thing to store?
Create a vega slice
Why do we need a slice? Slices are for state that needs to be stored globally and accessed across the app. The Vega spec is only needed by the Visualization, right? Can't we just create the spec there?
Send modular API to update VegaBuilder Class
Do we need to update both the slice and the aggConfig, or can we update just the aggConfig? My assumption was that the spec could be constructed whenever we want using the style state and the agg config.
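The assumption above can be sketched as a pure function that derives the spec from the style state and agg config on demand, with a small memoizer so it is only recomputed when an input changes. The function names and the simplified `aggConfigs` shape (a `field` directly on each agg) are assumptions for illustration, not the actual VisBuilder types.

```javascript
// Hypothetical pure builder: the same inputs always produce the same spec,
// so nothing needs to be stored in Redux.
function buildVegaSpec(styleState, aggConfigs) {
  const metric = aggConfigs.aggs.find((agg) => agg.schema === 'metric');
  return {
    $schema: 'https://vega.github.io/schema/vega-lite/v5.json',
    mark: { type: styleState.type, tooltip: Boolean(styleState.addTooltip) },
    encoding: { y: { field: metric.field, type: 'quantitative' } },
  };
}

// Caches the last result and reuses it while every argument is the same reference.
function memoizeLast(fn) {
  let lastArgs = null;
  let lastResult;
  return (...args) => {
    const same =
      lastArgs !== null &&
      args.length === lastArgs.length &&
      args.every((arg, i) => arg === lastArgs[i]);
    if (!same) {
      lastArgs = args;
      lastResult = fn(...args);
    }
    return lastResult;
  };
}

const getVegaSpec = memoizeLast(buildVegaSpec);
```

In a React component the same effect could be had by wrapping the builder call in `useMemo` keyed on the two inputs.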
Separate buckets Both methods need to separate bucket aggregations into distinct categories: group, split, and segment. This separation is necessary because each type of aggregation serves a different purpose in the visualization:
Can you give a little more detail about this? I'm not sure I fully understood why we need this.
VegaSpecBuilder
How does this work for different VisTypes? Don't the encodings and specs change between VisTypes? For example, a pie chart and a bar chart will encode the chart differently, right?
const vegaSpecBuilder = useTypedSelector(state => state.vega.specBuilder);
State should not be used to retrieve a function. Why can't vegaSpecBuilder be a simple function?
The Difference
In this section I didn't understand the difference between the two methods. What is Method 2? I didn't understand the pros and cons of each approach well enough to know which one is better. An example might help.
Overall, the approach here could benefit from a block diagram explaining how the flow works as information is passed across the various components.
| if we arent integrating VisBuilder into Discover
How will SQL/PPL users build visualizations?
How will the Discover IA for visualizations be handled with multiple-language support?
How will we achieve the cohesion tenet without SQL/PPL support for visualizations?
Background
The primary problem we are addressing is the need for more advanced and customizable data visualization capabilities in OpenSearch Dashboards. While VisBuilder reached General Availability (GA) in version 2.15, it is currently limited to a few chart types and lacks the comprehensive set of controls necessary for complex visualizations. Enhancing VisBuilder to incorporate more complex controls will provide users with powerful tools for data analysis and reporting, thereby improving the overall user experience and functionality of OpenSearch Dashboards. Additionally, from a technical perspective, we aim to streamline the visualization process by consolidating the multiple existing libraries (such as timeline, vislib, and vega) into a single, cohesive library. This unification will simplify the development and maintenance of visualizations, ensuring consistency and ease of use for developers and users alike.
Requirements and Considerations
Requirements
Technical Requirements:
Non-Technical Requirements:
Considerations and Optimizations
Optimizations:
Non-Prioritized Aspects:
Out of Scope
Current Workflow
VisLib in VisBuilder Workflow
Vega Vis Workflow
Proposed Design
Key Deliveries for 2.16
Note: This is not a complete version. It is for demo purposes only.
https://github.com/opensearch-project/OpenSearch-Dashboards/assets/79961084/c93519b8-4eb7-437b-b19a-c6f710faeffd
1. Vega Integration in VisBuilder
2. Advanced setting to allow user to use vega to create visualizations in VisBuilder
This includes modifications in VisBuilder so that each chart type can use either the visualization expression or the vega expression. The main purpose is to avoid breaking the existing user experience. New controls will only be added to the vega vis.
3. Easy migration from VisLib visualizations created by VisBuilder to vega vis. Allow embedding both visualizations in Dashboard.
Allow saving either a vislib vis or a vega vis: the only difference in the URL is the useVegaRendering value in the style state, which decides whether to use the visualization expression or the vega expression. When useVegaRendering is true, render vega in VisBuilder with the toggle turned on. The same applies when embedding in Dashboard: when saved with useVegaRendering set to true, embed the vega vis in Dashboard.
4. More controls for the line chart. (Optional) Use the line chart as an example to integrate all the controls from the line visualization into the VisBuilder line vega chart. Optional: add 1-2 new controls.
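The role of useVegaRendering described in item 3 can be illustrated with a small sketch. Only the flag name comes from this proposal; the builder objects and the shape of their return values are placeholders, since the real expression builders are not shown here.

```javascript
// Hypothetical builders standing in for the real expression builders.
const expressionBuilders = {
  visualization: (styleState) => ({ renderer: 'vislib', type: styleState.type }),
  vega: (styleState) => ({ renderer: 'vega', type: styleState.type }),
};

// Picks the expression based on the useVegaRendering flag in the style state.
// A missing flag falls back to the existing vislib path, so saved objects
// created before the flag existed keep rendering as they did.
function selectExpression(styleState, builders = expressionBuilders) {
  const useVega = styleState.useVegaRendering === true;
  return useVega ? builders.vega(styleState) : builders.visualization(styleState);
}
```

Because the saved style state carries the flag, the same selection would apply both inside VisBuilder and when the saved object is embedded in a dashboard.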
Implementation Details regarding the VegaSpecBuilder Class
Method 1: Passing the Whole Aggregation (Aggs) as Input
Key Differences from Static Vega Spec Input:
1. Extend the Visualization Slice for context integration
We'll extend the existing visualization slice to include Vega-specific state and actions:
2. Data Retrieval with proper aggregations: Utilize opensearchaggs to retrieve aggs directly
Update the opensearchaggs function to return the constructed aggregations:
3. Data Transformation
Data transformation in Vega is done by transform. It is similar to tabifyAggResponse, which aims to flatten nested structures for visualization. The main difference in approach is that tabifyAggResponse creates a complete tabular representation of the data, while the Vega transform provides a series of steps to transform the data on the fly during visualization rendering. This makes the Vega approach more memory-efficient and potentially faster for large datasets, as it doesn't need to materialize the entire flattened dataset in memory. Here is a further comparison:
tabifyAggResponse produces a tabular format with rows and columns, while the Vega transform creates a series of steps to transform the data within Vega.
tabifyAggResponse uses numeric IDs (e.g., 2-1) for column names, while the Vega transform uses more descriptive names based on the aggregation structure.
tabifyAggResponse creates separate rows for each bucket combination, while the Vega transform uses flatten operations to handle nested buckets.
Here we will add two utility functions:
parseAggStructure: This function recursively parses the aggregation structure to create a simplified representation.
generateTransform: This function generates the Vega transform steps based on the parsed aggregation.
Use these functions in the Vega utility functions in the next sub-section:
Example Result: Given the following aggregation:
The datum structure would be:
The generated transform would be:
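As a rough illustration of the idea only, and not the actual implementation, the two utilities might look like the sketch below. The function names come from this proposal, but their signatures, the simplified node shape, the `bucket_<id>` alias, and the list of bucket types are all assumptions. It also assumes field names without dots and that the Vega data source already points at the root aggregation's buckets.

```javascript
// Assumed subset of bucket aggregation types; everything else is treated as a metric.
const BUCKET_TYPES = new Set(['date_histogram', 'histogram', 'terms', 'range']);

// Recursively turns an aggregation DSL body into a simplified tree.
function parseAggStructure(aggs) {
  return Object.entries(aggs || {}).map(([id, body]) => {
    const type = Object.keys(body).find((key) => key !== 'aggs');
    return {
      id,
      type,
      field: (body[type] || {}).field,
      isBucket: BUCKET_TYPES.has(type),
      children: parseAggStructure(body.aggs),
    };
  });
}

// Emits Vega-Lite calculate/flatten steps for one bucket node.
// `parent` is the expression that resolves to the current bucket object.
function generateTransform(node, parent = 'datum', steps = []) {
  steps.push({ calculate: `${parent}.key`, as: node.field });
  for (const child of node.children) {
    const ref = `${parent}['${child.id}']`;
    if (child.isBucket) {
      const alias = `bucket_${child.id}`;
      // Lift the nested bucket array to a top-level field, then emit one row per bucket.
      steps.push({ calculate: `${ref}.buckets`, as: alias });
      steps.push({ flatten: [alias] });
      generateTransform(child, `datum.${alias}`, steps);
    } else {
      steps.push({ calculate: `${ref}.value`, as: `${child.type}_${child.field}` });
    }
  }
  return steps;
}

// Example input: date histogram > terms > avg metric (illustrative values).
const exampleAggs = {
  1: {
    date_histogram: { field: '@timestamp', fixed_interval: '3h' },
    aggs: { 2: { terms: { field: 'host' }, aggs: { 3: { avg: { field: 'bytes' } } } } },
  },
};
const [rootNode] = parseAggStructure(exampleAggs);
const exampleTransform = generateTransform(rootNode);
```

For this input the sketch yields five steps: a calculate for the timestamp key, a calculate and flatten pair for the nested terms buckets, a calculate for the terms key, and a calculate for the avg value.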
4. Create Vega Utility Functions
Create utility functions in a separate file:
5. Update toExpression Method
Modify the toExpression method to use the new utility functions:
Method 2: Construct Aggs
Method 2 follows a similar structure to Method 1, but instead of passing the whole aggregation, it constructs the aggregation from individual components (metrics, segment, group, split). The main difference lies in the setVegaAggs reducer and the buildVegaSpec utility function:
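A minimal sketch of what constructing the aggregation from its components could involve. The `constructAggs` name, the component shapes, and the nesting order (split, then segment, then group, with metrics innermost) are assumptions for illustration rather than the behavior of the setVegaAggs reducer.

```javascript
// Builds a nested aggregation DSL body from separately supplied components.
// Assumed nesting order: split > segment > group > metrics.
function constructAggs({ metrics = [], segment = [], group = [], split = [] }) {
  const buckets = [...split, ...segment, ...group];

  // Metrics sit side by side at the innermost level.
  const metricAggs = {};
  metrics.forEach((metric) => {
    metricAggs[metric.id] = { [metric.type]: { field: metric.field } };
  });

  // Wrap from the innermost bucket outward.
  return buckets.reduceRight(
    (inner, bucket) => ({
      [bucket.id]: { [bucket.type]: { ...bucket.params }, aggs: inner },
    }),
    metricAggs
  );
}

// Example: one date histogram segment with one avg metric.
const constructedAggs = constructAggs({
  metrics: [{ id: '2', type: 'avg', field: 'bytes' }],
  segment: [
    { id: '1', type: 'date_histogram', params: { field: '@timestamp', fixed_interval: '3h' } },
  ],
});
```

The extra bookkeeping compared with Method 1 is visible here: the caller has to decide the nesting order and assign ids, which Method 1 gets for free from the existing aggregation.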
Method 3: Passing Formatted Data to Vega Spec
This method involves passing pre-formatted data directly to the Vega spec. This method requires modifications to the buildVegaSpec function:
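To make the contrast with the other methods concrete, here is a sketch in which rows are already flattened before they reach Vega and are inlined through the Vega-Lite `data.values` property. The `buildVegaSpecFromRows` name and its options are hypothetical and do not reflect the actual buildVegaSpec signature.

```javascript
// Hypothetical variant: rows are flattened beforehand, so the spec needs
// no url and no transform steps.
function buildVegaSpecFromRows(rows, { xField, yField, markType = 'line' }) {
  return {
    $schema: 'https://vega.github.io/schema/vega-lite/v5.json',
    data: { values: rows }, // the entire dataset is embedded in the spec
    mark: { type: markType },
    encoding: {
      x: { field: xField, type: 'temporal' },
      y: { field: yField, type: 'quantitative' },
    },
  };
}

const rowsSpec = buildVegaSpecFromRows(
  [
    { timestamp: 1717200000000, bytes: 120 },
    { timestamp: 1717210800000, bytes: 95 },
  ],
  { xField: 'timestamp', yField: 'bytes' }
);
```

Because every row is carried inside the spec, the spec grows with the dataset, which is the scalability concern this method raises.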
3. Pros and Cons
Method 1: Passing Whole Aggregation
Method 2: Construct Aggs
Method 3: Passing Formatted Data
Conclusion
After considering all three methods, we decided to proceed with Method 1: Passing Whole Aggregation. This approach offers the best balance between maintaining consistency with existing OpenSearch Dashboards structures and providing efficient handling of complex aggregations. It avoids the potential scalability and performance issues of Method 3 while being less complex to implement and maintain than Method 2. Method 1 aligns well with the current OpenSearch Dashboards architecture and will likely provide the smoothest integration path for Vega visualizations within the existing framework. It also leaves room for future optimizations and extensions if needed.
How to Test / How to Make the Transfer Robust
To ensure the robustness and accuracy of the VegaSpecBuilder implementation, we should create a series of test cases that cover various combinations of metrics and buckets. These test cases will help verify that the VegaSpecBuilder can correctly handle different visualization configurations.
Test Cases
1 Metric 1 Bucket:
2 Metrics 1 Bucket:
1 Metric 3 Buckets:
1 Metric 4 Buckets:
2 Metrics 4 Buckets:
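The combinations listed above lend themselves to a table-driven test. The sketch below generates a fixture for each metric/bucket count and runs a supplied builder over it; the fixture shape, the helper names, and the `buildSpec` callback are assumptions, and real tests would also compare against known expected specs.

```javascript
// The metric/bucket combinations listed in the test cases.
const TEST_CASES = [
  { metrics: 1, buckets: 1 },
  { metrics: 2, buckets: 1 },
  { metrics: 1, buckets: 3 },
  { metrics: 1, buckets: 4 },
  { metrics: 2, buckets: 4 },
];

// Generates placeholder agg configs for a given combination.
function makeFixture({ metrics, buckets }) {
  return {
    metrics: Array.from({ length: metrics }, (_, i) => ({
      id: `m${i + 1}`,
      type: 'avg',
      field: `metric_field_${i + 1}`,
    })),
    buckets: Array.from({ length: buckets }, (_, i) => ({
      id: `b${i + 1}`,
      type: 'terms',
      field: `bucket_field_${i + 1}`,
    })),
  };
}

// Runs the builder against every combination and records pass or fail.
function runCases(buildSpec) {
  return TEST_CASES.map((testCase) => {
    try {
      const spec = buildSpec(makeFixture(testCase));
      return { ...testCase, ok: Boolean(spec) };
    } catch (error) {
      return { ...testCase, ok: false, error: error.message };
    }
  });
}
```

A builder that throws on an unsupported combination shows up as a failed case instead of aborting the run, which makes gaps in coverage easy to spot.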
Future Extension Discussion
Supporting Multiple Query Languages (DQL, PPL, SQL)
Extend the VegaSpecBuilder to handle different query languages:
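One possible shape for this extension is a registry that maps a language name to a query builder, so that adding a language does not require touching each VisType. The sketch is an assumption about how this might be structured, not the planned implementation; the registry API and the request shape are hypothetical, and only a DQL builder mirroring the index/body layout used with Vega is shown.

```javascript
// Hypothetical registry mapping a query language to a query builder.
function createQueryBuilderRegistry() {
  const builders = new Map();
  return {
    register(language, builder) {
      builders.set(language.toUpperCase(), builder);
    },
    build(language, request) {
      const builder = builders.get(language.toUpperCase());
      if (!builder) {
        throw new Error(`No query builder registered for ${language}`);
      }
      return builder(request);
    },
    languages() {
      return Array.from(builders.keys());
    },
  };
}

const queryBuilders = createQueryBuilderRegistry();

// DQL path: hand the aggregation body to the data request as-is.
queryBuilders.register('DQL', ({ index, aggs }) => ({
  index,
  body: { aggs, size: 0 },
}));
```

PPL or SQL builders would be registered the same way once their supported aggregation subset is known; until then, requesting them fails fast with a clear error.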
Handling Multiple Queries and Data Sources
Handle multiple queries and data sources by extending the buildVegaSpec method:
2.16 Timeline and Task Breakdowns
FAQ