rubensworks / article-privacy-decentralized

📜 Vision/position article 2020
https://rubensworks.github.io/article-privacy-decentralized/

How should access keys relate to access control (policies?) #3

Open simonstey opened 4 years ago

simonstey commented 4 years ago

I'm currently struggling to figure out how to intertwine (i) access keys, as currently described in the paper, with (ii) access control policies.

I tried to wrap my head around it by drawing the current framework the way I understood it here (also posted on Slack) -> https://slack-files.com/TPDH04UJ1-FT43WFZRT-5ad7c0b578

[image: drawing of the current framework]

Here, AC happens at the very end, when the CSQE wants to execute q against a user's data pod. The data pod matches the request perm(i, acl:Read, q), with i being the WebID of the requester and q a quad pattern query, against its set of applicable AC policies. Whether a policy is applicable could, e.g., be determined by checking whether the request (i.e., perm(i, acl:Read, q)) matches the shape the policy was specified against (e.g., "all foaf:members of ex:Company1 which have at least 1 vcard:hasEmail are permitted to acl:Read all quads of F1. Everyone is permitted to read rdf:type quads of F1 and F2"); the pod then responds to the CSQE accordingly.
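A minimal sketch of the request matching described above, in Python. It is an illustration under assumptions, not the paper's implementation: `Request`, `Policy`, and `is_permitted` are hypothetical names, the shape-based applicability test is stood in for by plain Python predicates rather than SHACL validation, and the WebIDs are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A quad pattern (subject, predicate, object, graph); None acts as a wildcard.
QuadPattern = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


@dataclass(frozen=True)
class Request:
    """perm(i, mode, q): requester WebID i, access mode, quad pattern q."""
    requester: str
    mode: str
    pattern: QuadPattern


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: predicates stand in for the shape it was specified against."""
    mode: str
    applies_to_requester: Callable[[str], bool]
    applies_to_pattern: Callable[[QuadPattern], bool]


def is_permitted(request: Request, policies: List[Policy]) -> bool:
    """Grant access if at least one policy is applicable to the request."""
    return any(
        p.mode == request.mode
        and p.applies_to_requester(request.requester)
        and p.applies_to_pattern(request.pattern)
        for p in policies
    )


# Assumed result of evaluating "foaf:member of ex:Company1 with >= 1 vcard:hasEmail".
COMPANY1_MEMBERS_WITH_EMAIL = {"https://alice.example/#me"}

policies = [
    # Qualifying members of ex:Company1 may read all quads of F1.
    Policy("acl:Read", lambda i: i in COMPANY1_MEMBERS_WITH_EMAIL, lambda q: q[3] == "F1"),
    # Everyone may read rdf:type quads of F1 and F2.
    Policy("acl:Read", lambda i: True, lambda q: q[1] == "rdf:type" and q[3] in {"F1", "F2"}),
]

alice = "https://alice.example/#me"
bob = "https://bob.example/#me"
print(is_permitted(Request(alice, "acl:Read", (None, "foaf:name", None, "F1")), policies))
print(is_permitted(Request(bob, "acl:Read", (None, "foaf:name", None, "F1")), policies))
print(is_permitted(Request(bob, "acl:Read", (None, "rdf:type", None, "F2")), policies))
```

In this sketch the first and third requests are granted and the second is denied, because Bob only matches the "everyone may read rdf:type" policy.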

Since we utilize AMF to prevent guessing a pod's content, do we still need all those individual access keys to prevent unauthorized reading? An attacker would only know that a particular source F1 might have results anyway. When he then tries to execute q against F1, he has to go through access control first.

Or would a user/app fetch access tokens from all (=?) data pods at the very beginning by authenticating to them? In that case, an app wouldn't be able to discover "new" data sources, as it would have to fetch access keys from all potential data pods before querying, right?

simonstey commented 4 years ago

Copy-paste of @skirrane's email:

Here, AC happens at the very end, when the CSQE wants to execute q against a user's data pod. The data pod matches the request perm(i, acl:Read, q), with i being the WebID of the requester and q a quad pattern query, against its set of applicable AC policies. Whether a policy is applicable could, e.g., be determined by checking whether the request (i.e., perm(i, acl:Read, q)) matches the shape the policy was specified against (e.g., "all foaf:members of ex:Company1 which have at least 1 vcard:hasEmail are permitted to acl:Read all quads of F1. Everyone is permitted to read rdf:type quads of F1 and F2"); the pod then responds to the CSQE accordingly.

Yip, this is the way it should work

Since we utilize AMF to prevent guessing a pod's content, do we still need all those individual access keys to prevent unauthorized reading? An attacker would only know that a particular source F1 might have results anyway. When he then tries to execute q against F1, he has to go through access control first.

The AMF serves two purposes: (i) to ensure that aggregators and query engines can't see the content; (ii) to allow requesters to know if a particular source can answer their query (i.e., (a) does it have the data and (b) do they have permission to access it).

For this we need to have a key that denotes access. So essentially (i) this key is for access control at query-planning time and (ii) SHACL is for access control at query time. We could sell it as a two-factor access control strategy.
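To make the role of the key concrete, here is a rough sketch of an AMF whose entries are hashed with an access key. It assumes the AMF is a Bloom filter and that keying is done with HMAC-SHA256; both are illustrative choices, and `KeyedBloomFilter` with its parameters is a hypothetical name rather than anything defined in the paper.

```python
import hashlib
import hmac


class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on an access key (assumed design)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bitset

    def _positions(self, key: bytes, quad: str):
        # Derive num_hashes positions from HMAC(key, "<index>:<quad>").
        for i in range(self.num_hashes):
            digest = hmac.new(key, f"{i}:{quad}".encode("utf-8"), hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: bytes, quad: str) -> None:
        for pos in self._positions(key, quad):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes, quad: str) -> bool:
        # True means "possibly present"; False means "definitely not added under this key".
        return all((self.bits >> pos) & 1 for pos in self._positions(key, quad))


owner_key = b"key-shared-with-authorized-requesters"
quad = "<ex:alice> <foaf:knows> <ex:bob> <F1>"

summary = KeyedBloomFilter()
summary.add(owner_key, quad)

print(summary.might_contain(owner_key, quad))
```

A requester holding `owner_key` always gets a positive answer for a quad that was added. A requester probing with a different key will almost always get a negative, although false positives remain possible, as with any approximate membership function.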

Clearly, there should be a direct correlation between SHACL access control rules and keys (which could simply be access tokens indicating if access is permitted or not).

We should have one access key per quad, as otherwise we would have a blowup in the summaries, which would be bad. This means that each requester will need to be given multiple access keys, or a super key that is derived from the individual access keys. Again, we have a key management issue...

Or would a user/app fetch access tokens from all (=?) data pods at the very beginning by authenticating to them? In that case, an app wouldn't be able to discover "new" data sources, as it would have to fetch access keys from all potential data pods before querying, right?

Yes, this is another issue: how do requesters get their hands on the keys in the first place? Here we assume that there is some agreement to collaborate (e.g., if you want to see events in my Google Calendar, I would have to share my Google Calendar with both of you).

The problem in our scenario is that I may need to give you a new key every time I create a new quad.

simonstey commented 4 years ago

(ii) to allow requesters to know if a particular source can answer their query (i.e., (a) does it have the data and (b) do they have permission to access it).

For this we need to have a key that denotes access.

Why, though? Regardless of whether quads were hashed with some key, a requester will only ever know that a source might be able to answer their query. The number of true negatives (TN) would rise though, so a CSQE would have to query fewer sources (the ones it doesn't have access to are now TN).

So essentially (i) this key is for access control at query-planning time and (ii) SHACL is for access control at query time. We could sell it as a two-factor access control strategy.

Why would we need another round of AC at query time, though? Unless we allow query-time AC to potentially overrule access tokens and/or let them operate on different levels of granularity (e.g., token-based AC for pod root access and shape-based AC for files etc.), as discussed in:

https://github.com/solid/specification/issues/69#issuecomment-578447076: The ACLs will be applied as follows: First, the container's ACL will be applied. If the client has read access to the container, an internal redirect is made. If the index.html has its own ACL, then that too will need to indicate that read is authorized for the content to be returned.

they will verify the very same thing and, more importantly, would have to be kept in sync.

We should have one access key per quad, as otherwise we would have a blowup in the summaries, which would be bad. This means that each requester will need to be given multiple access keys, or a super key that is derived from the individual access keys. Again, we have a key management issue...

This is different from how we currently envision it -> https://github.com/rubensworks/article-adecentweb2020-privacy-decentralized/blob/2c6595e6e75dcc5b02daf7ff7a61e000b3077019/content/code/summarization-algorithm.txt#L10-L13 and [image]

The problem in our scenario is that I may need to give you a new key every time I create a new quad.

Not only that, but also whenever access is revoked (or quads are deleted?), in which case all aggregators that include the updated source have to redo their aggregates too, right?


Proposal: do token-based AC for pod root access and shape-based AC for files etc.

This would allow for both (i) fine-grained AC using shapes at query time, and (ii) discoverability in case the data owner wants to be discovered (by simply omitting the root-level token policy).

skirrane commented 4 years ago

(ii) to allow requesters to know if a particular source can answer their query (i.e., (a) does it have the data and (b) do they have permission to access it).

For this we need to have a key that denotes access.

Why, though? Regardless of whether quads were hashed with some key, a requester will only ever know that a source might be able to answer their query. The number of true negatives (TN) would rise though, so a CSQE would have to query fewer sources (the ones it doesn't have access to are now TN).

The whole idea was to have policy-aware query planning.

Assuming lots of sources, we don't want to query sources that we don't have access to.
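As a rough illustration of policy-aware source selection, the sketch below prunes sources during query planning based on which keys the requester holds. It is a simplification under stated assumptions: each summary is an exact set of HMAC-SHA256 tags rather than an approximate membership function, lookups are for fully specified quads rather than quad patterns, and `tag`, `build_summary`, and `select_sources` are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Set


def tag(key: bytes, quad: str) -> bytes:
    """Keyed fingerprint of a quad; without the key it cannot be recomputed."""
    return hmac.new(key, quad.encode("utf-8"), hashlib.sha256).digest()


def build_summary(key: bytes, quads: Iterable[str]) -> Set[bytes]:
    """Summary a source could publish (exact set used here as a stand-in for an AMF)."""
    return {tag(key, q) for q in quads}


def select_sources(
    quad: str,
    summaries: Dict[str, Set[bytes]],
    held_keys: Dict[str, bytes],
) -> List[str]:
    """Keep only sources whose summary matches under a key the requester holds."""
    selected = []
    for source, summary in summaries.items():
        key = held_keys.get(source)
        if key is not None and tag(key, quad) in summary:
            selected.append(source)
    return selected


quad = "<ex:alice> <foaf:knows> <ex:bob>"
k1, k2 = b"key-for-F1", b"key-for-F2"
summaries = {
    "F1": build_summary(k1, [quad]),
    "F2": build_summary(k2, [quad]),
}

# The requester was only given the key for F1, so F2 is pruned during planning.
print(select_sources(quad, summaries, {"F1": k1}))
```

Both sources contain the quad, but only F1 is selected because the requester cannot produce a matching tag for F2. A query-time check at the pod would still be needed on top of this, since a summary may be stale.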

skirrane commented 4 years ago

Why would we need another round of AC at query time, though? Unless we allow query-time AC to potentially overrule access tokens and/or let them operate on different levels of granularity (e.g., token-based AC for pod root access and shape-based AC for files etc.), as discussed in:

Yes, different levels of granularity could come into play. Plus, summaries may get stale, and the data provider might since have revoked the requester's access.

skirrane commented 4 years ago

Based on your comments I get the feeling you would like to remove the key from the summaries and let SHACL take care of the AC.

This is fine if you want to have more fine-grained access control for Solid.

However, it doesn't support "policy aware federated query processing", where summaries are used to tell me if the source can potentially answer my query, while at the same time respecting the access control rules that the pod owner has placed on their data.

simonstey commented 4 years ago

Based on your comments I get the feeling you would like to remove the key from the summaries and let SHACL take care of the AC.

no, I want to avoid redundancies 😛

What about having them layered like so: [image: 20200126_115843 (1)]

where key-based AC is used on pod or file level (just realised it should be one summary per file and not one per pod) to allow for:

"policy aware federated query processing", where summaries are used to tell me if the source can potentially answer my query

and shape-based AC on quad level to cater for

while at the same time respecting the access control rules that the pod owner has placed on their data.

Edit: having them layered like this would also mitigate the inconsistencies mentioned before, since a token/key only tells you whether you have access to at least some parts of the source, which is true as long as the underlying policy/shape doesn't prohibit you from accessing any data at all. This is in contrast to quad-level keys, where you could hold a key for a quad q but be prohibited from accessing q by one of the pod's policies, leading to a conflict.
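The layering could be sketched roughly as follows. This is only an illustration under assumptions: `PodFile` and `read` are hypothetical names, the token is a plain shared string, and the quad-level rule is a Python predicate standing in for a shape-based policy.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


@dataclass
class PodFile:
    quads: List[Quad]
    # Layer 2: quad-level rule (stand-in for a shape-based policy).
    quad_rule: Callable[[str, Quad], bool]
    # Layer 1: file-level token; None means the file is open for discovery.
    access_token: Optional[str] = None


def read(file: PodFile, requester: str, token: Optional[str]) -> Optional[List[Quad]]:
    """Return the quads the requester may read, or None if the file-level check fails."""
    if file.access_token is not None:
        # Constant-time comparison of the presented token with the expected one.
        if token is None or not hmac.compare_digest(
            token.encode("utf-8"), file.access_token.encode("utf-8")
        ):
            return None
    return [q for q in file.quads if file.quad_rule(requester, q)]


ALICE = "https://alice.example/#me"
BOB = "https://bob.example/#me"

quads = [
    ("ex:doc", "rdf:type", "ex:Report", "F1"),
    ("ex:doc", "ex:salary", "1000", "F1"),
]


def rule(who: str, q: Quad) -> bool:
    # Alice may read everything; everyone else only rdf:type quads.
    return who == ALICE or q[1] == "rdf:type"


protected = PodFile(quads, rule, access_token="t1")
discoverable = PodFile(quads, rule)  # no token policy, so anyone can discover it

print(read(protected, BOB, None))
print(read(protected, BOB, "t1"))
print(read(discoverable, BOB, None))
```

Holding the token never grants more than the quad-level rule allows, so the two layers cannot contradict each other in this sketch: the token only gates whether the quad-level rule is evaluated at all.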

skirrane commented 4 years ago

This is one approach. The benefit is that key management is simplified and it goes towards "policy aware federated query processing".

Personally I'm interested in exploring if we can further optimise the "policy aware federated query processing", by having more granular keys, and what are the tradeoffs that come with such optimisations.

However, for this paper, it is probably enough to set the scene and state that investigations into such optimisations are part of future work.