hypothesis / product-backlog

Where new feature ideas and current bugs for the Hypothesis product live
118 stars 7 forks source link

Determine what we're reading from groups and define what we write into our search index #92

Closed ajpeddakotla closed 7 years ago

ajpeddakotla commented 7 years ago

We need to determine what we're reading from groups, and make sure this respects the defined ACL.

Once we've completed this work, we can define what we write into our search index.

This issue tracks this work, and fits into the larger epic here.

@chdorner can you add more detail to this issue?

chdorner commented 7 years ago

tldr: I will summarize what @nickstenning and me were talking about today and create three new (smaller) cards to track each individual task (#95, #96, #97).


Making sure we load the correct set of annotations given a request we have to take the ACL of an annotation and the ACL of the group into account. An annotation can be public or private (with the shared column), and the group can have readable_by set to members or world. These two settings influence which annotations a request is able to see and render.

The access control to an annotation can be seen as the following pseudo-code:

def has_read_access?(request, annotation):
    if annotation.shared is False:
        # The annotation is private, read access is only allowed when the
        # logged-in user is the same as the annotation's user.

        if request.authenticated_userid == annotation.userid:
            return True
        else:
            return False

    # The annotation is shared, read access to the annotation depends on the
    # read access of the group.
    group = fetch_group(annotation.group)
    return request.has_permission('read', context=group)

There are three distinct work packages necessary to make sure that:


Search returns only readable annotations (#95)

Memex currently adds an auth filter on every query, this is implemented by adding a terms filter on the permissions.read field with the effective principals of the current request (and group:__world__). See here for the code.

We could continue doing exactly that and making sure that the read permissions in the index are set correctly. But this means that if the group's readable_by value changes, we would need to reindex all of the annotations. Which is not a feasible option.

As described above, the access control of an annotation is split up into two parts, the access control of the annotation itself with the value of the shared field, and the access control of the group with its readable_by value. These two parts live in different parts of the code, the annotation's access control lives in memex, while the groups access control is desine in h. We can use this way of thinking as the solution to this problem.

If we change the AuthFilter inside memex to only filter out private annotations that are not owned by the current user, we handle the annotation-part of the access control inside memex. This means that we will have to add the shared field to the search index.

And to apply the group access control to search queries we can hook a search filter into memex (as we already do for nipsa), which will add a terms filter on the group field and filter by all groupids the user is allowed to see. This list contains all the groups the user is a member off, as well as every group in the database that has readable_by set to world. A word of caution here is that this will probably work until we have a significant number of publisher groups in our database (possibly a few hundred, or thousands).

With these two changes we are only returning readable annotations from ElasticSearch back to Python and then back to the client. In addition, we can remove the permission field from the search index, because queries only use the (new) shared field, and the existing group field.


Restrict access to annotation when loading by id (#96)

We don't query ElasticSearch when we access a single annotation by its id, which is the case when an annotations gets read, updated, or deleted. We don't need to care about updating or deleting an annotation since access to these operations are purely based on the annotation data itself (allowed when the request's userid is the same as the annotation's userid).

But when reading a single annotation (GET /api/annotations/{id} and GET /api/annotations/{id}.jsonld), we need to make sure to restrict access to the annotation based on its own access control (shared column), and on the access control of the group (readable_by).

Restricting access based on the value of the shared column, is again the easy part of this solution. Since that is all code living in memex. The annotation's __acl__ implementation will change to only give the annotation's userid the read permission, any other principal will not be allowed to read.

We will then implement a custom authorization policy (based on pyramid.authorization.ACLAuthorizationPolicy which handles context objects of type memex.models.annotation.Annotation differently from others. When the context object is an annotation it will check the annotation ACL and the annotation's group ACL and then permit or restrict access by looking at both ACLs.

With both of these changes implemented we can be sure that the two annotation read endpoints will only allow access to the appropriate requests.


API response renders read permissions correctly (#97)

Currently we render only the groupid in the read permissions if the annotation is shared (see here. This will have to change as it depends on the value of the Group.readable_by property.

The memex annotation presenter will have to take the value of the Group.readable_by property into account by using the groupfinder then rendering the appropriate value (group:{pubid}, or group:__world__).

Some changes need to happen so we get access to the request for using the groupfinder, or the result of the groupfinder gets passed into the presenter, either way.

ajpeddakotla commented 7 years ago

Closing this as we've broken it up into separate cards.