[RFC] Aligning Access and Visibility in OpenSearch

peternied commented 7 months ago

Problem Statement

When an OpenSearch cluster admin, Alice, creates a Quarterly Sales dashboard and shares it with user Bill, Alice does not know whether Bill will see the same data.

How can this be?

In the Security Plugin, several permission rules are computed based on the user sending the query. These rules alter the query results and are invisible to the user Bill.

1) Permissions for indexes are additive: to read an index, a user needs a role that grants them permission to that index. Sales data could be in indices named sales-na-YYYY-MM and sales-eu-YYYY-MM, and Bill might only have access to the EU region.
2) Document level security features (incl. row, column, and field-masking rules) are additive: if a user has a role with these features enabled, it adds a restriction on the data. For example, sensitive sales data is filtered out by a DLS rule, customer-name == Frodo Baggins.
3) Security supports operating in a mode where queries are edited in flight to remove indexes the user does not have permission to access. Bill does not have access to the NA region sales data, and those results are silently filtered out.

Bill's view of the dashboard would be missing data and he would be unaware of it. Even if he brought this to Alice's attention, she does not have a straightforward way to know what data Bill is missing, nor whether he needs more permissions to sales-na-* or fewer permissions so the sensitive sales data filter is not applied.
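A minimal sketch of how the three rules above combine, in plain Python rather than the Security Plugin's actual code. The index names, documents, amounts, and the `USERS` and `run_query` names are all hypothetical and exist only to illustrate the silent divergence between Alice and Bill.

```python
from fnmatch import fnmatch

# Toy data: index name -> documents. All names and values are illustrative.
INDICES = {
    "sales-na-2024-01": [{"customer-name": "Samwise Gamgee", "amount": 120}],
    "sales-eu-2024-01": [
        {"customer-name": "Frodo Baggins", "amount": 300},
        {"customer-name": "Merry Brandybuck", "amount": 80},
    ],
}

# Hypothetical per-user rules: readable index patterns and an optional
# document-level predicate standing in for a DLS rule.
USERS = {
    "alice": {"allowed": ["*"], "dls": None},
    "bill": {
        "allowed": ["sales-eu-*"],
        "dls": lambda doc: doc["customer-name"] != "Frodo Baggins",
    },
}


def run_query(user, pattern):
    """Return (indices actually searched, documents returned) for a user."""
    rules = USERS[user]
    # 1) Expand the index pattern to concrete index names.
    expanded = sorted(name for name in INDICES if fnmatch(name, pattern))
    # 2) Silently drop indices the user has no permission to read.
    permitted = [
        name for name in expanded
        if any(fnmatch(name, allowed) for allowed in rules["allowed"])
    ]
    # 3) Apply the document-level filter, if the user has one.
    docs = [doc for name in permitted for doc in INDICES[name]]
    if rules["dls"] is not None:
        docs = [doc for doc in docs if rules["dls"](doc)]
    return permitted, docs


alice_indices, alice_docs = run_query("alice", "sales-*")
bill_indices, bill_docs = run_query("bill", "sales-*")

alice_total = sum(doc["amount"] for doc in alice_docs)
bill_total = sum(doc["amount"] for doc in bill_docs)
print("alice", alice_indices, alice_total)
print("bill", bill_indices, bill_total)
```

Both calls succeed with the same `sales-*` pattern, and nothing in Bill's result signals that an index was dropped or that documents were filtered. The dashboard total is simply different.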

flowchart TB
    A["Quarterly Sales Dashboard Query: sales-*"] -->B1["Alice: Expand Query Indices<br/>sales-na-jan, sales-na-feb, sales-eu-jan, sales-eu-feb"]
    A -->B2["Bill: Expand Query Indices<br/>sales-na-jan, sales-na-feb, sales-eu-jan, sales-eu-feb"]

    subgraph Alice ["Alice's Flow"]
    B1 --> C1{"Resolve User Permissions"}
    C1 -->|No Filters Applied| E1["Filtered Indices:<br/>sales-na-jan, sales-na-feb, sales-eu-jan, sales-eu-feb"]
    E1 --> F1{Apply DLS Rules}
    F1 -->|No DLS Rules Applied| G1["Final Query:<br/>sales-na-jan, sales-na-feb, sales-eu-jan, sales-eu-feb"]
    G1 --> H1["Run Query"]
    end

    subgraph Bill ["Bill's Flow"]
    B2 --> C2{"Resolve User Permissions"}
    C2 -->|Filters Applied| E2["Filtered Indices:<br/>sales-eu-jan, sales-eu-feb"]
    E2 --> F2{Apply DLS Rules}
    F2 -->|Exclude 'Frodo Baggins'| G2["Final Query:<br/>sales-eu-jan, sales-eu-feb Exclude 'Frodo Baggins'"]
    G2 --> H2["Run Query"]
    end

    H1 --> I["All Indices Data"]
    H2 --> I

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#bff,stroke:#333,stroke-width:2px

Proposal

There should be a resource that combines these different access control rules together so that if access is granted to that resource, Alice has confidence that Bill sees exactly the same view of the data. If Bill does not have access, there is a clear path to resolve the access question - "Alice could you grant me access to 'Company wide sales data' resource so I can see the Quarterly Sales dashboard?"

This resource would be fundamentally different from the existing index/index pattern/alias/datastream resources, which require explicit permissions. Administering one of these resources would require permissions on the 'targeted' indexes.

This resource would need a new way to be granted / reviewed since the existing permissions model only allows for cluster-wide, index, and tenant permissions.

I suggest this is called a View, as in "Alice could you grant me access to the 'Company wide sales view'".
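To make the idea concrete, here is a rough sketch in Python of what such a resource could look like. This is not an existing OpenSearch or Security Plugin API; `View`, `ViewAccessDenied`, `target_patterns`, and `granted_users` are hypothetical names, and the sketch assumes that access is all-or-nothing at the view level.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


class ViewAccessDenied(Exception):
    """Raised instead of silently returning a reduced result set."""


@dataclass
class View:
    # Hypothetical shape of a view: a name, the index patterns it targets,
    # and the users who have been granted access to it.
    name: str
    target_patterns: list
    granted_users: set = field(default_factory=set)

    def resolve(self, user, available_indices):
        """Return the indices searched through this view, or fail loudly."""
        if user not in self.granted_users:
            raise ViewAccessDenied(
                f"{user} needs access to the '{self.name}' view"
            )
        # Every granted user resolves to the same set of indices.
        return sorted(
            name for name in available_indices
            if any(fnmatch(name, pattern) for pattern in self.target_patterns)
        )


available = ["sales-na-2024-01", "sales-eu-2024-01", "hr-2024-01"]
sales_view = View(
    name="Company wide sales view",
    target_patterns=["sales-*"],
    granted_users={"alice", "bill"},
)

alice_result = sales_view.resolve("alice", available)
bill_result = sales_view.resolve("bill", available)

try:
    sales_view.resolve("carol", available)
    carol_error = None
except ViewAccessDenied as err:
    carol_error = str(err)
```

The design choice being illustrated is that a user either sees the same data as everyone else with access to the view or gets an explicit error naming the view to request, rather than a partial result.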

Additional Context

Related issues

varun-lodaya commented 7 months ago

Are you trying to just abstract all the permissions into a common logical permission and then apply it, or will this work differently than existing permissions? Also, this sounds like coarse-grained access control, right? How will you customize this and let users configure fine-grained controls here?

peternied commented 7 months ago

@varun-lodaya Thanks for taking a look - the utility of fine vs coarse is dependent on what resource is being permissioned. The problem raised by this issue is that it can be unclear how different levels of fine-grained control operate. I don't think there is a requirement for a single permissions model; the existing model creates these problems and the question is whether we should address it.

We've had many issues reported to back up this problem statement and I'll collect more and update the description to include them.

Do you agree with the premise of the problem? It sounds like you might have an alternative proposal in mind.

derek-ho commented 7 months ago

[Triage] It sounds like this issue is trying to find a path forward for an issue that comes up often regarding how we handle access for certain data sets.

msfroh commented 6 months ago

I've been trying to think about how to use the "views" concept to support multi-tenant clusters.

I wrote up a straw API call example to create a view, specify who can access it, and provide a series of processors that limit the requests/responses on the views: https://docs.google.com/document/d/1AUEPXw5-P7UxW_UhdLdXw_LZTkCVyQdYEnh676WGlC4/edit?usp=sharing

The analogy I gave on another issue is that your cluster is an amusement park and a view is an entrance. The entrance you use determines which credentials are required and what wrist-band you're issued (which determines what rides/amenities you can access).

@peternied -- I don't know if this is in line with what you had in mind, but it makes sense to me.

peternied commented 6 months ago

@msfroh Thanks for writing up that doc. I like many of the ideas you proposed. As 'views' has an implication in the DB space, I wouldn't mind an alternative, along the lines of data_park_entrance_gate_name 🤣

opensearch-project / security