Nested field support in visualizations

waynehamadi commented 3 years ago

Is your feature request related to a problem? Please describe.

As you know, nested support is not available in OpenSearch Dashboards Visualizations. The community has been asking for it, because not everyone works with unstructured data.

A lot of users have a relational database as their source of truth and connect it to OpenSearch. Their data is relational by nature, and unless you can easily denormalize your index, nested objects and joins are the recommended ways to model these relationships (joins can be better when a relationship has to be updated easily, or when a parent has a very large number of children).

So there is a strong need for nested field support in OpenSearch Dashboards in order to visualize this relational data.

Describe the solution you'd like

Being able to build nested queries and nested aggregations in the visualization dashboard.


We might want to learn from others who have tried to build this feature in the past, like ppadovani, filirom1 or SpaceManiac. ppadovani proposed a design and then implemented it as a plugin, along with a query language called KNQL.

This is a huge feature, so we could do it step by step instead of looking for the perfect solution:

1- Make ppadovani's plugin compatible with OpenSearch Dashboards. Then identify what makes it hard to integrate into the visualizations and try to fix it.
2- Get feedback. To get even more feedback and make the plugin easy to use for all users, we could add a small (well hidden) checkbox in the settings that enables it in the visualizations, without requiring users to install the plugin. Users would need to be aware that enabling it could break their dashboards.

I know it's not a very clear trajectory, but I can't predict where community feedback will lead us. There appear to be 3 options for tackling this problem. The interface can convert user interactions (clicks, drag and drop, typing) into:
1- an extended version of KQL. Currently, KQL is only used to filter data and plays no role in aggregating it.
2- a separate language similar to KNQL. This language would be sent to the Node web server, which converts it into JSON sent to Elasticsearch.
3- a JSON query directly, without using any language as a proxy.

Option 1 seems like the ideal solution, because we wouldn't create a separate language just for nested support. But the more pragmatic approach might be to have 2 different languages first and THEN find a way to unify them into KQL.

Describe alternatives you've considered

I don't think there is an alternative to supporting nested fields natively in OpenSearch Dashboards, but in the meantime, Vega visualizations let you build pretty much anything, if you are willing to spend the time and effort.

Additional context: this is the history of this feature request in Kibana: Request in 2014; PR 1 by filirom1; PR 2 by ppadovani; PR 3 by ppadovani; Plugin (https://ppadovani.github.io/knql_plugin/overview/) by ppadovani. Kibana considers this request high-hanging fruit (I agree) and low impact (I disagree).

I will leave you with this screenshot, which summarizes what you can get with the plugin right now: [Screenshot: nested visualizations]

ahopp commented 3 years ago

@MerwaneHAMADI thanks for opening this issue! I think the path you laid out (i.e. support plugin compatibility first as a proof of concept and iterate from there) seems like the correct approach. I'll tag as "help wanted".

@ananzh I believe you mentioned that nested field support "Phase 1" was released in Kibana 7.6.0 but isn't enabled. Is that correct? Given that we forked from 7.10.2, the assumption would be that it's in the OpenSearch code, but we need to figure out what we'd need to do to enable it. Thoughts?

waynehamadi commented 3 years ago

@ahopp You are correct. It looks like KQL supports nested queries already. If you add a simple nested object like this:

PUT document/_doc/3
{
  "created_at": "2015-01-24",
  "object_content": [
    {
      "string_name": "document1",
      "string_type": "type1"
    },
    {
      "string_name": "document2",
      "string_type": "type2"
    }
  ]
}

We can see that KQL queries it properly: 0 results, as expected [Screenshot 2021-07-23 at 6:17:56 PM]

1 result, as expected [Screenshot 2021-07-23 at 6:17:37 PM]
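The two results above follow from how nested objects are indexed. Below is a minimal in-memory sketch contrasting nested matching with flattened object matching. It assumes object_content is mapped as nested (the mapping isn't shown above) and that the 0-result query pairs values from different array elements, such as document1 with type2. matchesNested and matchesFlattened are hypothetical helpers, not OpenSearch code.

```typescript
// Illustrative model only: how one document's array of objects is matched
// under a `nested` mapping versus a flattened `object` mapping (or include_in_root).
type Obj = { [field: string]: string };

// nested: each array element is indexed as its own hidden sub-document,
// so every condition must hold within the SAME element.
function matchesNested(items: Obj[], conds: Obj): boolean {
  return items.some((item) =>
    Object.keys(conds).every((field) => item[field] === conds[field]),
  );
}

// object / include_in_root: values are flattened into multi-valued fields,
// so each condition may be satisfied by a DIFFERENT element.
function matchesFlattened(items: Obj[], conds: Obj): boolean {
  return Object.keys(conds).every((field) =>
    items.some((item) => item[field] === conds[field]),
  );
}

// The object_content array from the document above.
const objectContent: Obj[] = [
  { string_name: "document1", string_type: "type1" },
  { string_name: "document2", string_type: "type2" },
];

const crossElement: Obj = { string_name: "document1", string_type: "type2" };
const sameElement: Obj = { string_name: "document1", string_type: "type1" };

console.log(matchesNested(objectContent, crossElement)); // false: 0 results
console.log(matchesFlattened(objectContent, crossElement)); // true: a false positive
console.log(matchesNested(objectContent, sameElement)); // true: 1 result
```

The flattened variant is the behavior that makes include_in_root fields misleading whenever a question is about one specific child object.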

So that part is done. I realize I should spell out what CAN and CAN'T be done right now.

I will come back to you with a clear example. I plan to build the same example KNQL uses (maybe adding some items inside the room table, so we get 2 levels of nesting). This way we can all use it as a reference.

[Screenshot 2021-07-23 at 6:23:32 PM]

waynehamadi commented 3 years ago

Also, some quick feedback for anyone starting to contribute:
1- I thought of running OpenSearch in Docker and git cloning only Dashboards, but the docker-compose.yml here has SSL enabled, so unless you want to deal with authorization issues, you're better off git cloning OpenSearch as well. If you have the correct JDK, it should be straightforward. @ahopp did you consider providing a docker-compose.yml without SSL? I tried to play with the configuration, but found no way to disable SSL.
2- Installing OpenSearch Dashboards should be straightforward if you just git clone it (don't forget to fork first if you want to contribute). Just follow the instructions on GitHub.
Questions:
1- Has anyone managed to connect IntelliJ's Node debugger to OpenSearch Dashboards? It doesn't seem to work for some reason, and it's pretty handy for exploring the code.
2- If you have any documentation or advice on how to start integrating the KNQL plugin into Dashboards, I would really appreciate it. This is what the repo looks like: https://github.com/ppadovani/KibanaNestedSupportPlugin
3- When I run OpenSearch with Gradle and then quit Gradle, my data is destroyed. This doesn't happen with Docker. Am I missing something?

ppadovani commented 3 years ago

Hey folks, this was my plugin, and I'm happy to see some interest in pushing this forward.

Here is what this plugin was capable of when I stopped having time to develop it:

Full support for most visualizations and aggregations
KNQL provided a basic query language for building nested queries. More complex query types, such as spans, were not supported.
Filtering and similar support in the Discover app

The plugin achieved this by modifying the index pattern stored by Kibana to get the nested information into the internal config. It then overrode certain pieces of code in order to properly build aggregations when a referenced field was nested.

One of the reasons I stopped developing this code was the shift in UI technology Kibana was making. I'm not a UI dev, and I lacked the experience to know what needed to be done to achieve the results. I also seem to recall that the Kibana team began to limit what code could be overridden, and how, which impeded my ability to inject the nested information that was needed.

I'd be more than happy to answer questions as best I can around what I did and how it worked.

Oh, and I had long discussions with the Elastic team about the KQL changes they made to support nested fields. I didn't agree with their approach, because it requires using braces in conjunction with knowing the underlying nested structure. My view is that scoping an expression with simple () and letting the underlying code generate nested expressions based on the schema is a better approach, which is what KNQL does.

Edit: I now also recall that they decided to put the query parsing in elasticsearch itself rather than kibana, which made actually supporting aggregations impossible since kibana still didn't know what fields were nested.
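A minimal sketch of the schema-driven approach described above, assuming the nested paths are already known from the index mapping. The user writes full field names inside a plain scope, and the code decides which clauses need a nested wrapper. andScope and the schema map are hypothetical; this is not KNQL's actual parser, and it only handles a single nesting level.

```typescript
// Illustrative sketch of schema-driven nesting (KNQL-style), single nesting level only.
interface Clause {
  field: string; // full field name, e.g. "object_content.string_name"
  value: string;
}
interface MatchClause {
  match: { [field: string]: string };
}
interface NestedClause {
  nested: { path: string; query: { bool: { filter: MatchClause[] } } };
}
interface BoolQuery {
  bool: { filter: Array<MatchClause | NestedClause> };
}

// nestedPathOf maps a field name to its nested path; root-level fields are absent.
// Clauses ANDed inside one "( ... )" scope that share a nested path are grouped into
// a single nested query, so they must match within the same nested object.
function andScope(clauses: Clause[], nestedPathOf: { [field: string]: string }): BoolQuery {
  const filters: Array<MatchClause | NestedClause> = [];
  const groups: { [path: string]: MatchClause[] } = {};
  const pathOrder: string[] = [];

  clauses.forEach(({ field, value }) => {
    const clause: MatchClause = { match: { [field]: value } };
    const path: string | undefined = nestedPathOf[field];
    if (path === undefined) {
      filters.push(clause); // root-level field: no nested wrapper needed
      return;
    }
    if (!groups[path]) {
      groups[path] = [];
      pathOrder.push(path);
    }
    groups[path].push(clause);
  });

  // One nested query per path, in the order the paths were first seen.
  pathOrder.forEach((path) => {
    filters.push({ nested: { path, query: { bool: { filter: groups[path] } } } });
  });
  return { bool: { filter: filters } };
}

const schema = {
  "object_content.string_name": "object_content",
  "object_content.string_type": "object_content",
};
const scoped = andScope(
  [
    { field: "created_at", value: "2015-01-24" },
    { field: "object_content.string_name", value: "document1" },
    { field: "object_content.string_type", value: "type2" },
  ],
  schema,
);
console.log(JSON.stringify(scoped));
```

The user never types the nested path; it comes entirely from the schema lookup, which is the usability argument being made here.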

waynehamadi commented 3 years ago

@ppadovani I'm glad to see you here! Thanks for your work on KNQL. I never got to use it, because our ES version is 7.7, but I read through the whole GitHub history and I think it's going in the right direction.

You wrote in your blog :

In general you will never notice that nested queries are being generated, as this is done for you in the query parser.

I like your syntax for that reason: users don't need to know whether a field is nested. In the KQL example above, that's not the case:

object_content:{string_name: document1 and string_type: type2}
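For comparison, here is a rough sketch of the query DSL that the KQL group above roughly corresponds to. It assumes KQL compiles the braces into a single nested query on object_content; the exact DSL the KQL compiler emits may differ (for example in the leaf query type). kqlNestedGroup is a hypothetical helper.

```typescript
// Rough sketch (not the actual KQL compiler output) of the query DSL implied by
//   object_content:{string_name: document1 and string_type: type2}
interface MatchClause {
  match: { [field: string]: string };
}

interface NestedQuery {
  nested: {
    path: string;
    query: { bool: { filter: MatchClause[] } };
    score_mode: "none";
  };
}

// All clauses inside the braces are ANDed and evaluated inside ONE nested object,
// which is why the user has to know (and type) the nested path.
function kqlNestedGroup(path: string, terms: { [field: string]: string }): NestedQuery {
  return {
    nested: {
      path,
      query: {
        bool: {
          filter: Object.keys(terms).map((field) => ({
            match: { [`${path}.${field}`]: terms[field] },
          })),
        },
      },
      score_mode: "none",
    },
  };
}

const query = kqlNestedGroup("object_content", {
  string_name: "document1",
  string_type: "type2",
});
console.log(JSON.stringify(query, null, 2));
```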

I have some questions:

1- What do you think is the most stable version of your plugin, so we can test it? Edit: a link to a docker-compose.yml for the ELK installation would also be great, to make installation easier.

2- How would you go about porting your plugin into OpenSearch Dashboards and making it work?

Edit: I now also recall that they decided to put the query parsing in elasticsearch itself rather than kibana, which made actually supporting aggregations impossible since kibana still didn't know what fields were nested.

3- Are you saying that OpenSearch receives KQL and parses it there? I thought OpenSearch Dashboards parsed KQL into JSON and then sent it to OpenSearch.

waynehamadi commented 3 years ago

I just installed Elasticsearch and Kibana 6.4.3 (edit: I meant 6.4.2) and ran bin/kibana-plugin install https://github.com/ppadovani/KibanaNestedSupportPlugin/releases/download/6.4.2-1.0.2/nested-fields-support-6.4.2-1.0.0.zip, and it looks like I can't enable the plugin. [Screenshot 2021-07-24 at 10:47:38 AM]

I am on Chrome, macOS. Someone else had the same issue: https://github.com/ppadovani/KibanaNestedSupportPlugin/issues/106

So I am going to try version 5.6.11.

ppadovani commented 3 years ago

1- What do you think is the most stable version of your plugin, so we can test it? Edit: a link to a docker-compose.yml for the ELK installation would also be great, to make installation easier.

The latest released version was the most stable. As for a docker compose file, I'll have to go see if I have one kicking around.

2- How would you go about porting your plugin into OpenSearch Dashboards and making it work?

I would start with the visualization and aggregation part before worrying about the query language, as this provides the foundation the query language can build on. This was from 3 years ago, so I'm dredging through my memories, but in general the following will need to be done in the base Dashboards code:

Change, if not already done, the index pattern loading code to also pull the nested information from the mapping and store it in a consumable fashion. You should be able to take most of this straight from my plugin code.
The code for building the aggregations was centralized and used by most of the built-in visualizations. Update this code to leverage the knowledge about nested fields from the first step. Again, refer to my plugin code for the changes I injected into Kibana, unless the base code has changed a lot.
The UI for the visualizations only really needs help when dealing with a reverse nested agg. I had a way of doing this that should still be OK, but you should think a bit more about its usability.
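For the first step, a minimal sketch of pulling nested information out of a mapping, assuming a simplified mapping shape (only type and properties; multi-fields are ignored). The houses mapping below is a hypothetical stand-in for the one used later in this thread, and collectNestedPaths is an illustrative name, not existing Dashboards code.

```typescript
// Simplified mapping shape: only `type` and `properties` are modelled.
interface FieldMapping {
  type?: string;
  properties?: { [name: string]: FieldMapping };
}

/** Returns leaf field name -> closest enclosing nested path ("" for root-level fields). */
function collectNestedPaths(
  properties: { [name: string]: FieldMapping },
  prefix = "",
  currentNested = "",
  out: { [field: string]: string } = {},
): { [field: string]: string } {
  Object.keys(properties).forEach((name) => {
    const mapping = properties[name];
    const fullName = prefix ? `${prefix}.${name}` : name;
    if (mapping.properties) {
      // Plain objects and nested objects both have sub-properties,
      // but only `type: "nested"` starts a new nested scope.
      const nested = mapping.type === "nested" ? fullName : currentNested;
      collectNestedPaths(mapping.properties, fullName, nested, out);
    } else {
      out[fullName] = currentNested;
    }
  });
  return out;
}

// Hypothetical mapping for the houses / cars / options example.
const housesMapping: { [name: string]: FieldMapping } = {
  fence: { type: "keyword" },
  cars: {
    type: "nested",
    properties: {
      make: { type: "keyword" },
      options: { type: "nested", properties: { name: { type: "keyword" } } },
    },
  },
};

console.log(JSON.stringify(collectNestedPaths(housesMapping)));
// {"fence":"","cars.make":"cars","cars.options.name":"cars.options"}
```

Storing the closest nested path per field is enough for the aggregation builder to decide when to open a nested scope and when a reverse nested step is needed.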

Once you have this done, I would look at the query language and decide whether it's worth doing, since KQL does exist. Putting KNQL in would add another language for end users to learn (granted, it's pretty simple syntax).

Edit: I now also recall that they decided to put the query parsing in elasticsearch itself rather than kibana, which made actually supporting aggregations impossible since kibana still didn't know what fields were nested.
3- Are you saying that OpenSearch receives KQL and parses it there? I thought OpenSearch Dashboards parsed KQL into JSON and then sent it to OpenSearch.

It's been 3 years, so I may not have that part right.

You know that you are forcing me to go back and look at the code and get involved again, right? 😉😂

waynehamadi commented 3 years ago

@ppadovani Haha! Well, I am not going to refuse help, particularly because you probably know this problem better than I do! But I also think you've already given a lot!

I realized it's just easier to download Elasticsearch and Kibana from the official website, without docker-compose.

As I said before, I got unlucky with the latest release, 6.4.2. You said the latest release was the most stable, but since it isn't working for me, I am going to try the version just before it: 6.4.1.

ppadovani commented 3 years ago

I can't look at this today due to family things. I should be able to pull my code and get it running again tomorrow morning.

waynehamadi commented 3 years ago

Thanks! By the way, I replaced this install command:

bin/kibana-plugin install https://github.com/ppadovani/KibanaNestedSupportPlugin/releases/download/6.4.2-1.0.2/nested-fields-support-6.4.2-1.0.0.zip

with this one:

kibana-6.4.2-darwin-x86_64/bin/kibana-plugin install https://github.com/ppadovani/KibanaNestedSupportPlugin/releases/download/6.4.2-1.0.2/nested-fields-support-6.4.2-1.0.2.zip

But still nothing. Looking at 6.4.1.

Edit: it's working now in 6.4.2. I had just missed this note: "Adding support to an indexPattern requires that the indexPattern be defined first using the normal Kibana management application."

So creating an index pattern makes the checkbox appear, which makes sense, because nested support is activated per index pattern.

waynehamadi commented 3 years ago

tl;dr:

OpenSearch Dashboards doesn't even recognize include_in_root fields in visualizations anymore. Somehow this got removed between 6.4.2 and 7.10.2.
As @ppadovani said, the first step is to change the index pattern loading code to pull the nested information.

These are the simple steps to understand the feature I think OpenSearch Dashboards should support. Let's say I have houses. Each house can have multiple cars, and each car has multiple options. I want OpenSearch Dashboards to give me the number of cars grouped by brand and, for each brand, show me the options they have.

1- Install Elasticsearch 6.4.2 and Kibana 6.4.2 => all you have to do is run bin/elasticsearch and bin/kibana.
2- Check that it's working normally => go to localhost:5601.
3- Install ppadovani's plugin => bin/kibana-plugin install https://github.com/ppadovani/KibanaNestedSupportPlugin/releases/download/6.4.2-1.0.2/nested-fields-support-6.4.2-1.0.2.zip
4- Follow the steps in this video.
5- Copy and paste the JSON requests from this gist.
6- Run them in Dev Tools.
7- Create an index pattern on houses.
8- Create a pie chart and slice by cars.make and then by cars.options.name. I introduced an option called Acura Special, which is only available on Acura cars. With include_in_root, the chart shows this option on all car models, which is not what I want.
9- Click on nested field support in index management.
10- Go back to the aggregation => TADA! Now it shows that the option is only on the Acura model.
11- Now install OpenSearch and OpenSearch Dashboards and run the seeds from the gist.
12- Same thing: create an index pattern and a pie chart. In this case, you CAN'T even see the include_in_root fields.

Everything is detailed in this video : https://www.youtube.com/watch?v=c0bOQgORmn8

A common objection I have heard about this feature:

"In this example, we can clearly see that the optimal index should be centered around cars, so this is not the correct way to index." My answer: I don't want to reindex just because my question is slightly different. And what if I change my mind and my question becomes: "Give me the number of houses grouped by type of fence. For each fence type, give me the number of cars grouped by brand. For each brand, show me the options they have." => this would require an index centered around houses.

Anyway, I hope everyone is convinced that we need this; it's not just a nice-to-have.

Plus, given that a lot of e-commerce sites use Elasticsearch, if you replace house with customer, car with order, and option with product, you have a perfectly valid business case. AND you can do full-text search on top of that.

ppadovani commented 3 years ago

Pulled main from this repo and started to look things over... everything has been refactored since three years ago, right?!? LOL

Anyway, the file index_pattern_field.ts is the starting point that will need to be worked on. Work backward from there... if I have time I might work on a branch.

waynehamadi commented 3 years ago

Good luck with all these changes, @ppadovani. Don't hesitate to push even a PR where the plugin is set up and just does a simple hello world, because I am not familiar with how plugins interact with OpenSearch Dashboards or Kibana, so having a boilerplate would be really valuable.

ppadovani commented 3 years ago

The PR is going to have to be against existing code... we need to have the nested path stored as part of the index pattern field. Once that is in place and functional, the remaining code around aggregations can be modified to take advantage of it.

Edit: Looks like the nested path info is loaded in the index pattern. So the only changes that need to happen are related to building aggregations. At first glance it doesn't look like that code has changed much since my plugin. I have a full plate, so I'll get to this when I have time. But if someone wants to take a stab at it, create a branch and start comparing:

My plugin: public/nested_support/vis/agg_configs.js

opensearch_dashboards: src/plugins/data/common/search/aggs/agg_configs.ts

Once those changes are in place, then comparing the agg_types and agg_response content in my plugin against the equivalent files and merging those changes should start to yield working visualizations.

The most complex pieces of my plugin were related to injecting the support for nested fields/paths into the index mapping, and the query language, so getting the aggs working shouldn't be too heavy a lift.

waynehamadi commented 3 years ago

Thanks for this information. That's great news: we have less work to do. I first added the nested fields to the visualizations in a PR against the forked project, because it wasn't possible to see them. This is the result I have so far. As you can see, the Acura Special option appears under all car brands, which shows that the aggregation is wrong right now. By the way, the only reason I get a result at all is that I set include_in_root to true for cars and cars.options.

Now I am working on agg_configs. It looks like most of the changes are inside the toDsl function. I am trying to run the Kibana 6.4.2 dev version so I can compare what it returns with OSD. I have a Bazel issue though. Edit: never mind, I was on the wrong branch (should be v6.4.2, not 6.4).

[Screenshot 2021-07-31 at 2:35:28 PM]

waynehamadi commented 3 years ago

@ppadovani It's not obvious how to integrate the code into the toDsl method. For now I am going to write it from scratch just to understand a little better how the code works, but if you have a working version, don't hesitate to share it.

waynehamadi commented 3 years ago

So in this branch I copied your agg_config and agg_configs files and converted them to TypeScript.

The value of dslTopLvl just before the return statement in toDsl (in agg_configs.ts) is:

{
    "nested_2": {
        "nested": {
            "path": "cars"
        },
        "aggs": {
            "2": {
                "terms": {
                    "field": "cars.make",
                    "order": {
                        "_count": "desc"
                    },
                    "size": 5
                },
                "aggs": {
                    "nested_3": {
                        "nested": {
                            "path": "cars.options"
                        },
                        "aggs": {
                            "3": {
                                "terms": {
                                    "field": "cars.options.name",
                                    "order": {
                                        "_count": "desc"
                                    },
                                    "size": 10
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
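A minimal sketch of how a toDsl-style builder could produce exactly the dslTopLvl above from a flat list of bucket aggs, assuming each agg already knows its field's nested path. TermsAgg, buildAggs and wrap are hypothetical names, not the functions in agg_configs.ts. The nested_&lt;id&gt; key follows the plugin's naming; the reverse_nested_&lt;id&gt; key is an assumption. Only terms buckets are modelled.

```typescript
// Sketch: wrap terms aggs in nested / reverse_nested aggs based on each field's nested path.
interface TermsAgg {
  id: string; // aggs are numbered in definition order: "2", "3", ...
  field: string;
  size: number;
  nestedPath: string; // "" for root-level fields
}

type Dsl = { [key: string]: unknown };

// True if `path` lies strictly below `ancestor` ("" is the root document).
function isBelow(path: string, ancestor: string): boolean {
  return path !== ancestor && (ancestor === "" || path.indexOf(ancestor + ".") === 0);
}

// Moves the aggregation context from `fromPath` to `agg.nestedPath`.
function wrap(agg: TermsAgg, inner: Dsl, fromPath: string): Dsl {
  const to = agg.nestedPath;
  if (to === fromPath) return inner;
  const nestedKey = `nested_${agg.id}`;
  if (isBelow(to, fromPath)) {
    // Going deeper: open a nested scope.
    return { [nestedKey]: { nested: { path: to }, aggs: inner } };
  }
  const reverseKey = `reverse_nested_${agg.id}`;
  if (isBelow(fromPath, to)) {
    // Going back up: reverse_nested without a path targets the root document.
    return { [reverseKey]: { reverse_nested: to === "" ? {} : { path: to }, aggs: inner } };
  }
  // Sibling nested paths: go back to the root, then down into the target path.
  return {
    [reverseKey]: {
      reverse_nested: {},
      aggs: { [nestedKey]: { nested: { path: to }, aggs: inner } },
    },
  };
}

function buildAggs(aggs: TermsAgg[], index = 0, currentPath = ""): Dsl | undefined {
  if (index >= aggs.length) return undefined;
  const agg = aggs[index];
  const body: Dsl = {
    terms: { field: agg.field, order: { _count: "desc" }, size: agg.size },
  };
  // Sub-aggs are built relative to this agg's nested path.
  const children = buildAggs(aggs, index + 1, agg.nestedPath);
  if (children) body.aggs = children;
  return wrap(agg, { [agg.id]: body }, currentPath);
}

const dsl = buildAggs([
  { id: "2", field: "cars.make", size: 5, nestedPath: "cars" },
  { id: "3", field: "cars.options.name", size: 10, nestedPath: "cars.options" },
]);
console.log(JSON.stringify(dsl, null, 4));
```

The reverse_nested branches correspond to the reverse nested case mentioned earlier in the thread, for example slicing by a root field such as fence after slicing by cars.make.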

So that looks OK, because this is exactly the aggregation I asked for. But somehow Elasticsearch fails to load response data. I think there is a missing piece. [Screenshot 2021-08-01 at 5:16:18 PM]

Edit: I fixed it. Now Elasticsearch actually sends me the correct response:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 15,
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "nested_2": {
            "2": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "Lexus",
                        "doc_count": 13,
                        "nested_3": {
                            "3": {
                                "doc_count_error_upper_bound": 0,
                                "sum_other_doc_count": 0,
                                "buckets": [
                                    {
                                        "key": "AllWeatherMats",
                                        "doc_count": 8
                                    },
                                    {
                                        "key": "Entertainment",
                                        "doc_count": 5
                                    },
                                    {
                                        "key": "V6",
                                        "doc_count": 5
                                    },
                                    {
                                        "key": "Leather",
                                        "doc_count": 4
                                    },
                                    {
                                        "key": "AlloyWheels",
                                        "doc_count": 2
                                    }
                                ]
                            },
                            "doc_count": 24
                        }
                    },
                    {
                        "key": "Acura",
                        "doc_count": 11,
                        "nested_3": {
                            "3": {
                                "doc_count_error_upper_bound": 0,
                                "sum_other_doc_count": 0,
                                "buckets": [
                                    {
                                        "key": "Leather",
                                        "doc_count": 5
                                    },
                                    {
                                        "key": "V6",
                                        "doc_count": 5
                                    },
                                    {
                                        "key": "Entertainment",
                                        "doc_count": 4
                                    },
                                    {
                                        "key": "AllWeatherMats",
                                        "doc_count": 3
                                    },
                                    {
                                        "key": "AlloyWheels",
                                        "doc_count": 3
                                    },
                                    {
                                        "key": "Acura Special",  // only Acura should have this option, so the nested aggregation works
                                        "doc_count": 1
                                    }
                                ]
                            },
                            "doc_count": 21
                        }
                    },
                    {
                        "key": "Honda",
                        "doc_count": 7,
                        "nested_3": {
                            "3": {
                                "doc_count_error_upper_bound": 0,
                                "sum_other_doc_count": 0,
                                "buckets": [
                                    {
                                        "key": "V6",
                                        "doc_count": 6
                                    },
                                    {
                                        "key": "Entertainment",
                                        "doc_count": 4
                                    },
                                    {
                                        "key": "Leather",
                                        "doc_count": 4
                                    },
                                    {
                                        "key": "AlloyWheels",
                                        "doc_count": 2
                                    },
                                    {
                                        "key": "AllWeatherMats",
                                        "doc_count": 1
                                    }
                                ]
                            },
                            "doc_count": 17
                        }
                    }
                ]
            },
            "doc_count": 31
        }
    }
}

But I still don't get any visualization, so I am looking at agg_response and agg_types as you suggested.

ppadovani commented 3 years ago

This looks good so far. Sorry I haven't been able to do much, life gets in the way.

What you want to do is look at the agg_response code in my repo and compare it against the existing implementation. This is the code that handles the response from Elastic and formats it correctly. Pay attention to the two subdirectories: tabify is what's used to create the table of data you can see in visualizations.
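A minimal sketch of what a tabify-style step has to do with the nested response above: walk the bucket tree level by level, looking each agg up either directly or inside its nested_&lt;id&gt; wrapper. getAgg and tabify are hypothetical helpers, not the functions in agg_response. The input is a trimmed excerpt of the response above, and the counts are nested-document counts (options per car), not house counts.

```typescript
// Sketch of a tabify-style walk over the bucket tree, unwrapping "nested_<id>" levels.
interface Bucket {
  key: string | number;
  doc_count: number;
  [subAgg: string]: unknown;
}
interface TermsResult {
  buckets: Bucket[];
}
type Container = { [key: string]: unknown };
type Row = Array<string | number>;

// Finds agg `id` either directly or inside its "nested_<id>" wrapper.
function getAgg(container: Container, id: string): TermsResult | undefined {
  if (container[id] !== undefined) return container[id] as TermsResult;
  const wrapper = container[`nested_${id}`] as Container | undefined;
  return wrapper === undefined ? undefined : (wrapper[id] as TermsResult | undefined);
}

// One row per leaf bucket: the bucket keys along the path, then the leaf doc_count.
function tabify(container: Container, aggIds: string[], keys: Array<string | number> = []): Row[] {
  if (aggIds.length === 0) return [];
  const result = getAgg(container, aggIds[0]);
  if (result === undefined) return [];
  const rest = aggIds.slice(1);
  const rows: Row[] = [];
  result.buckets.forEach((bucket) => {
    const path = keys.concat([bucket.key]);
    if (rest.length === 0) {
      rows.push(path.concat([bucket.doc_count]));
    } else {
      rows.push(...tabify(bucket, rest, path));
    }
  });
  return rows;
}

// Trimmed excerpt of the "aggregations" object from the response above.
const aggregations: Container = {
  nested_2: {
    "2": {
      buckets: [
        {
          key: "Acura",
          doc_count: 11,
          nested_3: {
            "3": {
              buckets: [
                { key: "Leather", doc_count: 5 },
                { key: "Acura Special", doc_count: 1 },
              ],
            },
            doc_count: 21,
          },
        },
        {
          key: "Honda",
          doc_count: 7,
          nested_3: { "3": { buckets: [{ key: "V6", doc_count: 6 }] }, doc_count: 17 },
        },
      ],
    },
    doc_count: 31,
  },
};

console.log(JSON.stringify(tabify(aggregations, ["2", "3"])));
// [["Acura","Leather",5],["Acura","Acura Special",1],["Honda","V6",6]]
```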

waynehamadi commented 3 years ago

Thanks! No worries! I have a question: what's the goal of agg_response/tabify/_buckets.js? It looks like the same code.

ppadovani commented 3 years ago

I believe that was a clone in order to get around a weird access issue I had. So I cloned and owned it.

waynehamadi commented 3 years ago

OK, thanks for the information. I am on build_hierarchical_data and it really looks different.

It looks like the most important code in build_hierarchical_data is here:

const aggData = resp.aggregations[firstAgg.id] || resp.aggregations['nested_' + firstAgg.id][firstAgg.id];
const split = buildSplit(agg, metric, bucket[agg.id] || resp.aggregations['nested_' + firstAgg.id][firstAgg.id]);

but the buildSplit function has disappeared... At any time, you can see where I am here: https://github.com/MerwaneHAMADI/OpenSearch-Dashboards/pull/1/commits

So don't hesitate to tell me if you see something incorrect.

ppadovani commented 3 years ago

So Kibana named the aggs with numbers, based on the order in which they were defined. What I did was prefix nested_ to the levels that were, well, nested, so I could pull them back out of the response correctly.

I'll try and grab some time first thing in the morning and look at your branch.

waynehamadi commented 3 years ago

I just realized what's wrong: the ui folder got removed. It was here in 6.8: https://github.com/elastic/kibana/tree/6.8/src/ui, and in 7.0 it's gone: https://github.com/elastic/kibana/tree/7.0/src/ui

ppadovani commented 3 years ago

Looks like it got moved to here: https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/plugins/data/common/search/aggs

For all of the code I overrode, I used the same file names... so searching for date_histogram allowed me to locate the code that needs to be modified.

waynehamadi commented 3 years ago

const aggData = resp.aggregations[firstAgg.id] || resp.aggregations['nested_' + firstAgg.id][firstAgg.id];
const split = buildSplit(agg, metric, bucket[agg.id] || resp.aggregations['nested_' + firstAgg.id][firstAgg.id]);

For these 2 lines, it's not obvious where to attach them in build_hierarchical_data.ts:

resp is no longer used as a variable name, and neither is aggData.
I am searching for buildSplit across the whole app, and it's only mentioned in the legacy plugins.

I am debugging build_hierarchical_data to see where to add the 'nested_' prefix.

waynehamadi commented 3 years ago

@ppadovani I still haven't added build_hierarchical_data, but it's already working with tabify buckets: I managed to get a chart, and I checked that it is correct.

So what use case does build_hierarchical_data enable?

@ahopp is there some kind of documentation on building plugins for OSD? I have tried this boilerplate: https://github.com/spalger/kibana-plugin-boilerplate, replacing kibana with osd. It doesn't seem to console.log hello world.

waynehamadi commented 3 years ago

Are hacks still possible in OpenSearch Dashboards? It looks like that's what we need to build the plugin.

Is there some kind of boilerplate for OpenSearch Dashboards where a hack is implemented and a method is overridden?

waynehamadi commented 3 years ago

To give more context, the plugin for nested support is defined like this:

return new kibana.Plugin({
  require: ['elasticsearch'],
  name: 'nested-fields-support',
  uiExports: {
    docViews: ['plugins/nested-fields-support/nested_support/doc_view/structure'],

    managementSections: [
      'plugins/nested-fields-support/index_pattern/management',
      'plugins/nested-fields-support/discover/management'
    ],

    hacks: [
      'plugins/nested-fields-support/nested_support'
    ]
  },

  config(Joi) {
    return Joi.object({
      enabled: Joi.boolean().default(true),
      index: Joi.string().default('.kibana')
    }).default();
  },

  // Update the .kibana index-pattern type to include a new nested flag
  init(server, options) {
    const { callWithInternalUser } = server.plugins.elasticsearch.getCluster('admin');

    updateIndexSchema(callWithInternalUser, server);

    server.route({
      path: '/api/nested-fields-support/mappings/{name}',
      method: 'GET',
      handler(req, reply) {
        const { callWithRequest } = server.plugins.elasticsearch.getCluster('admin');
        callWithRequest(req, 'indices.getMapping', {
          index: req.params.name
        }).then(function (response) {
          reply(response);
        });
      }
    });
  }
});

But I can't find uiExports or hacks in OpenSearch Dashboards. Have they been replaced by something else?

I tried to look in this repo for plugins that hack OpenSearch Dashboards, but it looks like they all ADD methods and don't replace existing ones. Do you know of any OpenSearch Dashboards plugin that actually overrides methods?

ppadovani commented 3 years ago

You don't need to do any of this I believe. This code was all about turning on nested fields in an index pattern and loading the nested paths. This is already done in the base kibana code. We just need to make sure the visualizations/aggregations support detecting and handling nested fields.

In terms of hacking via plugins, I think that is the wrong approach. The support for nested aggregations must be baked into the main code itself and not hacked in with a plugin. This aligns with the nested path support in the index pattern.

Support for nested fields in Discover will take more effort and UI work, as representing a nested structure in the existing layout is challenging.

waynehamadi commented 3 years ago

> You don't need to do any of this I believe. This code was all about turning on nested fields in an index pattern and loading the nested paths. This is already done in the base kibana code. We just need to make sure the visualizations/aggregations support detecting and handling nested fields.

Ok, good to know.

> In terms of hacking via plugins, I think that is the wrong approach. The support for nested aggregations must be baked into the main code itself and not hacked in with a plugin. This aligns with the nested path support in the index pattern.

After reading about all the issues you had merging this feature into Kibana, I thought building a plugin was the right way. But now I realize most of the work has already been done: the index pattern already provides everything we need. So I agree with you: implementing a plugin that hacks OSD is no longer necessary.

So now the question is: what is the minimum viable implementation that aligns with the vision of OpenSearch Dashboards and lets us build momentum on this issue?

Option 1: We don't merge anything into the code until all visualizations are supported. Getting a really clean implementation might take more time. I have looked a bit at all the visualizations, and I think some edge cases would need time to explore before we could say they are officially supported. I think this option might be too ambitious.

Option 2: We choose one visualization (the pie chart, for example) and support it fully. This means users will be able to choose nested fields in the pie chart visualization, but not, for example, in the table visualization. Then we add visualizations one by one.

@ahopp @stockholmux, what do you think?

I am going to drill down into option 2. If we go for it, this branch is working, but this is the remaining work:

Nested fields should only be displayed for pie charts.
When you drag and drop slices of a pie chart, this should give a correct result, particularly when combined with filters (cf. this comment).
When you click on the pie chart, it should filter correctly. The current behavior on my branch is that it tries to filter on the element at the root of the document. This means it will only work if your nested objects use include_in_root. Otherwise it displays no results, which is pretty confusing.

For the last item we have several options:

Option 1: When the user clicks on a pie chart, it creates a nested filter.

Option 2: We know the vision is option 1, but in the meantime we don't allow the user to filter on the pie chart. To be clear, we only prevent the user from filtering if they're filtering on a nested field. If the field is "normal", it should behave as before; otherwise this would be a regression. Option 2 is a strict improvement over the current visualizations, since pie charts on nested fields aren't even possible right now.

Option 3: Same as option 2, but we only prevent the user from filtering on the pie chart IF the field is not included in root (maybe too complex to implement).

Option 4 (in an ideal world): Clicking on the pie chart not only creates a nested filter, but also modifies the pie chart aggregation itself and adds a filter, because a nested filter alone still won't display the pie chart correctly. When the user clicks on a slice, they want that slice to become the new whole of the pie chart. This is probably too hard for now.
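For Option 1, the filter created by a click would wrap the usual term query in a `nested` query on the field's path; a plain term query on a nested field only matches when the objects are also copied to the root. A minimal sketch of the query clause only, assuming the same hypothetical `subType.nested.path` field shape; `buildClickFilter` is hypothetical and does not produce the Dashboards filter-manager format.

```javascript
// Hypothetical helper: turn a clicked pie slice (field + bucket key) into
// an OpenSearch query clause, using a `nested` query for nested fields.
function buildClickFilter(field, value) {
  const term = { term: { [field.name]: value } };
  const nestedPath = field.subType && field.subType.nested && field.subType.nested.path;
  // Without the `nested` wrapper, a term query on a nested field only matches
  // if the objects are also copied to the root (include_in_root).
  return nestedPath ? { nested: { path: nestedPath, query: term } } : term;
}
```

Regular fields keep producing exactly the same term query as today, which is what avoids a regression for non-nested pie charts.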

I think it's between options 1 and 2.

stockholmux commented 3 years ago

@MerwaneHAMADI I'd like to hear more voices here, but my gut says that iterative is the best approach - so, support the pie chart first. Option 2 on the second question makes sense - if you are running include_in_root you are progressively enhanced, if not you have the same thing - right?

waynehamadi commented 3 years ago

> if you are running include_in_root you are progressively enhanced.

I'm not sure I made myself clear, so let me go into more detail. From what you're saying, it sounds like you want option 3.

If we go for option 2 and don't allow nested pie charts to be clickable, then even if your objects are include_in_root, you won't be able to filter on them, because they're just not clickable. But it's still a strict improvement, in the sense that right now nobody can create nested pie charts: non-filterable nested pie charts are better than no nested pie charts at all. My concern is the following:

If one part of the pie chart is nested and another part isn't, we need to prevent the nested part from being filterable while keeping the other part filterable.
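Under option 2 (and option 3) the decision has to be made per aggregation level, so that a chart mixing nested and regular split fields keeps its regular levels clickable. A minimal sketch, assuming each split level exposes its field with the hypothetical `subType.nested.path` shape plus an `includeInRoot` flag read from the mapping (also an assumption; it is not necessarily surfaced by Dashboards today). `filterableLevels` is hypothetical.

```javascript
// Hypothetical helper: decide, per aggregation level, whether clicking a slice
// may create a filter. `strategy` is "option2" (never filter on nested fields)
// or "option3" (allow it when the nested objects are include_in_root).
function filterableLevels(levels, strategy) {
  return levels.map(({ field }) => {
    const isNested = Boolean(field.subType && field.subType.nested);
    // Regular fields keep today's behaviour, so there is no regression.
    if (!isNested) return true;
    if (strategy === "option3") return Boolean(field.includeInRoot);
    return false;
  });
}
```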

@ppadovani I want to hear what you think on that one. Do we try and create one PR to try and merge every visualization into the code or just start with the pie chart ?

If we only implement pie charts, there are still a lot of things to define about how to communicate this to users. And if we prevent users from filtering on nested pie charts (and I insist: only nested pie charts, not regular pie charts, otherwise it's a regression), then they need to understand why. This is why I am still torn between options 1 and 2. It might be easier to just build a nested filter when the user clicks, instead of creating a workaround for it.

waynehamadi commented 3 years ago

Forget what I said above about nested filters on pie chart visualizations: it's actually working. I tested it on fields that are not included in root and it works. When someone clicks on a pie chart, it selects only the records that match the criteria of the pie chart, and it automatically uses the pie chart's filter. Obviously, it's still not smart enough to make the selected slice the new whole of the pie chart, but I don't think this is in the scope of what we're trying to do (cf. option 4 above).

This is pretty good news. I am still testing it on all the fields.

ahopp commented 3 years ago

@MerwaneHAMADI RE: viable implementation, in general my bias is "progress not perfection"! I think the incremental approach would be my preference. It seems to be the best path for feedback and iteration, and to learn what works well and what we'd want to change before expanding to all visualizations.

RE: Communicating to users, do you have a preference? One treatment could be a tag with qualifiers on the CX treatment (e.g. a beta tooltip) as we expand the feature. We would also include the changes in the release notes, but could qualify the feature scope. There is some nuance to add here, but that's one option.

This is exciting stuff - thanks for driving!

ppadovani commented 3 years ago

Just to be clear... the field support for nested is already baked in. Adding the support to the visualization controls for buckets/metrics and the updates to the agg_config.ts code will enable nested aggregations for more than the pie chart, as I believe the code works this way. So I don't think you can just turn on support for pie charts.

Personally I would add the basic support, and run through some testing to make sure it doesn't break existing aggs/charts/etc. Then I would start collecting issues/features that are broken/missing in the context of nested and start burning that list down.

waynehamadi commented 3 years ago

@ahopp ok, so you also lean towards starting with the pie chart. And I agree with what you said for the UX treatment. Fortunately, the biggest UX hurdle is gone, now that the user can actually filter on nested pie charts.

@ppadovani There is a way to hide nested fields for specific visualizations: in the field.ts React component, you can allow nested fields based on the query parameter ?type=pie (for example). So it's possible to turn on support for pie charts only, and even only for specific aggregations.

> Personally I would add the basic support, and run through some testing to make sure it doesn't break existing aggs/charts/etc. Then I would start collecting issues/features that are broken/missing in the context of nested and start burning that list down.

Do you mean collecting issues/features from people using it on the master branch? I am a bit worried about that, because I am not sure of the impact of this feature on the user experience. So far I haven't been able to find any regression. I can't even reproduce this.

But I don't know what can go wrong with this feature.

This might sound overly cautious, but what I am most comfortable doing right now is shipping nested pie charts with histogram aggregations only. This would give us a proof of concept that real users can try. Histograms in pie charts are not used as much as, for example, terms aggregations. So if something breaks (which I will make sure it doesn't, because I am testing it extensively), at least we only break one aggregation of a single visualization. Then, if we don't find any regression, we can unlock the full pie chart at once.

The logic I suggest we implement is:

If the user is in the pie chart and has selected a histogram aggregation, show all the nested fields. Otherwise, only show the regular fields.
I don't think adding this logic will add significantly more latency.
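The rule above can be sketched as a predicate applied when listing fields in the aggregation editor. The `visType`/`aggType` strings (`"pie"`, `"histogram"`), the `subType.nested` field shape and the `isFieldSelectable` name are assumptions for illustration, not the actual field.ts props.

```javascript
// Hypothetical allow-list: which aggregation types may use nested fields,
// per visualization type. Only pie + histogram is unlocked for now.
const NESTED_ENABLED = new Map([["pie", new Set(["histogram"])]]);

// Hypothetical predicate: should this index-pattern field be offered?
function isFieldSelectable(field, visType, aggType) {
  const isNested = Boolean(field.subType && field.subType.nested);
  if (!isNested) return true; // regular fields are always shown
  const allowedAggs = NESTED_ENABLED.get(visType);
  return Boolean(allowedAggs && allowedAggs.has(aggType));
}
```

Each check is a constant-time lookup per field, and widening the rollout (the full pie chart, then other visualizations) would only mean adding entries to the allow-list.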

Again, it could just be a matter of weeks to then start adding even more aggregations and visualizations, believe me, I am the first to need it.

waynehamadi commented 3 years ago

This bug happens in the main branch : https://github.com/opensearch-project/OpenSearch-Dashboards/issues/718

It's good to know it now, because we'll know it's not a regression from the nested field support.

ahopp commented 3 years ago

@MerwaneHAMADI Seems like you're making great progress here! Anything we can do to help / support?

EDIT: Didn't even realize you have https://github.com/opensearch-project/OpenSearch-Dashboards/pull/794 already. I'll ping.

nhirshler commented 2 years ago

MerwaneHAMADI What is the current status of nested fields in ELK? I installed Elasticsearch and Kibana 7.16.2 and I cannot access nested fields in visualizations. It worked in older versions of Elasticsearch and Kibana, but since this last update it is no longer possible to create a terms aggregation based on a nested field.

Jorgen-VikingGod commented 2 years ago

Any update or plan to support nested fields in visualizations?

heygambo commented 2 years ago

I also really need this

Mephilius commented 2 years ago

Hi,

I also really need this urgently. We are a large corporation that plans to use Elastic. Visualizing nested fields in Kibana is a key requirement for us.

fatihakafou commented 2 years ago

Hi,

This is also really needed for us. Any update on it?

zidludwig commented 2 years ago

I was searching for a long time for how to put my nested objects into the DB with Logstash to analyse them with Kibana. Now I found the solution: "DON'T DO IT"! I will try workarounds (splitting events, ...), but I'd really love to do it in a straightforward way!
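For the "split events" workaround, one common approach is to denormalize each parent document into one flat document per nested child before indexing, copying the parent fields onto each child. A minimal sketch in plain JavaScript; `splitEvents`, the `comments` field and the `comment_` prefix are hypothetical. Logstash's `split` filter performs a similar array split inside a pipeline.

```javascript
// Flatten one parent document into one event per element of a nested array.
// Parent fields are copied onto every event; child fields get a prefix.
function splitEvents(doc, arrayField, prefix) {
  const { [arrayField]: children, ...parent } = doc;
  // Keep childless parents as a single event so they don't disappear.
  if (!Array.isArray(children) || children.length === 0) return [{ ...parent }];
  return children.map((child) => {
    const event = { ...parent };
    for (const [key, value] of Object.entries(child)) {
      event[`${prefix}${key}`] = value;
    }
    return event;
  });
}
```

The trade-off is that parent-level values are repeated once per child, so counting parents needs a unique parent id (for example a cardinality aggregation on it) rather than a plain document count.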

kavilla commented 1 year ago

[Triage]

@seanneumann @ahopp, as we can see, this is a high ask from the community. Could we get this prioritized?

JuanAntonio03 commented 1 year ago

Hello,

This is something we desperately need in my team. Any update?

mik3fly commented 1 year ago

Hello, is there any update or roadmap to implement this? Thanks.

b-akhil commented 1 year ago

Hello. Is there any update regarding support for aggregations on nested fields within OpenSearch Dashboards?

AdaptiveStep commented 10 months ago

Hi! Does anyone know a good workaround for this, at least? I'm thinking of splitting the events (denormalizing), but I'd much rather avoid it. Any clues?

Fantus commented 10 months ago

Same here, we would love to have this feature

opensearch-project / OpenSearch-Dashboards

Nested field support in visualizations #657