opendistro-for-elasticsearch / sql

🔍 Open Distro SQL Plugin
https://opendistro.github.io/for-elasticsearch/features/SQL%20Support.html
Apache License 2.0
621 stars 186 forks source link

Proposal: extracting all fields names used in a query #996

Open FreCap opened 3 years ago

FreCap commented 3 years ago

Hi Devs!

in one of my use cases I need to parse the query to make sure a user is not accessing to fields he/she is not supposed to.

While the paid version has some per field ACL feature integrated, my understanding the opensource one is lacking of this feature.

For this reason I would like to implement an endpoing that can extract this information: given a query SELECT SUM(field1.subA), field3 WHERE field2.subA GROUP BY field3

I would like to return:

POST _opendistro/_sql/_fields
...query body...

result:
{
"select": ["field1.subA", "field3"],
"where": ["field2.subA"],
"groupBy":["field3"]
}

Alternative explored

The first thing I considered was simply running a query _opendistro/_sql/_fields. The challenge with this is that the more complex the query becomes, and the more painless code there is, the more complex-to-impossible the challenge becomes.

Conclusion

Before starting writing anything, I'd be glad to have a quick discussion if this feature might be accepted in the main branch because useful for many other devs or not.

Thank you for the feedback

penghuo commented 3 years ago

Hi,

Have you checked the https://opendistro.github.io/for-elasticsearch-docs/docs/security/access-control/field-level-security/, the ODFE release already supported field level security.

FreCap commented 3 years ago

Hi @penghuo,

The system is for sure many features but it has several challenges when trying to use and extend.

It abstract away and compounds authentication, authorization and masking into one component. This means that:

  1. authentication has to be replicated, and that's ok most times
  2. implementing complex authorization logic can be very challenging (have to change data model and store all permissions in ES, which in large systems is extensive and complex to sync)
  3. masking has limitations, e.g. you cannot mask fields are used in document level security. This means that any user can by default see all of the permission of all other users.
    Document-level security relies on Elasticsearch queries, which means that all fields in the query must be visible in order for it to work properly. If you use field-level security in conjunction with document-level security, make sure you don’t restrict access to the fields that document-level security uses.

Hence it for sure covers a lot of use cases, and it is great for Kibana users requiring security.

It would be nice having something that has components a little less decoupled too.

The feature I'm proposing it doesn't have to be coupled with the permissioning concept since it would be just an API representing the parsing tree.

E.g. /_explain returns the parsing tree encoded ES format /_explainTree returns the query tree before being converted to ES format.

E.g.

SELECT COUNT(*), object.field1 FROM index123 WHERE object.field1>123 AND object.field2<123 GROUP BY object.field1

var response = {
    "select": [
        {
            "type": "function",
            "functionType": "COUNT",
            "arguments": [
                {
                    "type": "referenceAllFields",
                    "value": "*"
                }
            ]
        },
        {
            "type": "fieldReference",
            "table": "index123",
            "fieldName": "object.field1"
        }
    ],
    "where": [
        {
            "type": "function",
            "functionType": "AND",
            "arguments": [
                {
                    "type": "function",
                    "functionType": ">",
                    "arguments": [
                        {
                            "type": "fieldReference",
                            "table": "index123",
                            "fieldName": "object.field1"
                        },
                        {
                            "type": "value",
                            "value": "123"
                        }
                    ]
                },
                 {
                    "type": "function",
                    "functionType": "<",
                    "arguments": [
                        {
                            "type": "fieldReference",
                            "table": "index123",
                            "fieldName": "object.field2"
                        },
                        {
                            "type": "value",
                            "value": "123"
                        }
                    ]
                }

            ]
        }
    ],
    "from": [
        {
            "type": "index",
            "name": "index123"
        }
    ],
    "groupby": [SAME_STYLE]
};

Do you have any advice?