Provider API access tokens

alexcouper commented 10 years ago

Note: This still needs some thought and discussion.

Problem

A provider needs to be able to request data (e.g. features) from core, with each request restricted to a different user or team.

Proposal 1

(@txels, please feel free to reword this correctly)

A token is included in each request to the core API (in the URL or in a header?) and determines which team the results are filtered by.

txels commented 10 years ago

My thoughts on implementing API "tokens":

The simplest way to think of a token is as an alternative mechanism of authentication, i.e. of properly identifying a user by a means different than username and password. In its simplest form, a token can only be created or deleted. In more complex implementations, it can have associated information of "scope", which can include validity dates and areas of functionality that the token gives access to (which then ties together authentication and permissions).

I would advocate for implementing tokens in their simplest form: a foreign key to user, and a token string. Then we only need to implement a custom authentication backend for DRF (http://www.django-rest-framework.org/api-guide/authentication#custom-authentication) that will fetch the user from the token. After that, it will be as if the user had authenticated normally, and we will only allow access to data that belongs to that user (by using the for_user methods).
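To make the "foreign key to user plus a token string" idea concrete, here is a minimal, framework-free sketch. It is not the DRF backend itself: `TokenStore`, `User` and `authenticate` are hypothetical names, the dict stands in for a Token table, and I am assuming the token travels in an `Authorization: Token <key>` header (the url-vs-headers question above is still open).

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    username: str


class TokenStore:
    """In-memory stand-in for a Token table with a foreign key to user."""

    def __init__(self) -> None:
        self._user_by_token: Dict[str, User] = {}

    def create(self, user: User) -> str:
        # Tokens can only be created or deleted, never edited.
        token = secrets.token_hex(20)
        self._user_by_token[token] = user
        return token

    def delete(self, token: str) -> None:
        self._user_by_token.pop(token, None)

    def get_user(self, token: str) -> Optional[User]:
        return self._user_by_token.get(token)


def authenticate(headers: Dict[str, str], store: TokenStore) -> Optional[User]:
    """Return the user for an 'Authorization: Token <key>' header, else None."""
    parts = headers.get("Authorization", "").split()
    if len(parts) != 2 or parts[0] != "Token":
        return None
    return store.get_user(parts[1])
```

Once `authenticate` returns a user, everything downstream can treat the request as belonging to that user, which is where the for_user filtering would come in.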

This deviates a bit from having a team token, but makes things much simpler to implement. And a user that belongs to multiple teams can use a single token instead of requiring one per team. It is also a backend that can become fairly reusable across any projects that include a DRF API.

If you find this too limiting, further extensions/variations of this may be:

- Include a "scope" in tokens (in our case this could be e.g. the provider). That way we limit what one can do with a valid token. This provides more granular security (at some usability cost, as users will need to manage and set up multiple tokens). We can easily implement this by attaching the "scope" to the request in the authentication step and implementing an authorization (aka permissions) backend (http://www.django-rest-framework.org/api-guide/permissions#custom-permissions).
- If we still want to go for team tokens, I would create a default "api user" account for each team that gets created (to avoid creating a temporary one per request, although that is also feasible without requiring DB access). This user then gets associated to the request by the authentication backend once the token is successfully matched to the team. In any case, I would then use the same Token model, with an FK to user rather than to team.
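A rough sketch of the "scope" variation, again without any framework code. `ScopedToken` and `has_permission` are hypothetical names, and treating a missing scope as "unrestricted" is my assumption, not something we have agreed on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedToken:
    key: str
    username: str
    # Assumption: the scope is a provider name; None means "no restriction".
    scope: Optional[str] = None


def has_permission(token: ScopedToken, provider: str) -> bool:
    """Allow the request if the token is unscoped or scoped to this provider."""
    return token.scope is None or token.scope == provider
```

The authentication step would attach the token (and so its scope) to the request, and a check like this would run in the permissions step.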

alexcouper commented 10 years ago

I agree with implementing the simple case initially - with user-based tokens.

So what that means for now is that the provider will be accessing the api using a user's token. I think realistically we'll need to then go for your second bullet point of creating a default api user for each team. Let's remind ourselves why:

- A commit is pushed to repo R
- The post-receive hook is fired for user U, calling the provider API
- The provider calls core, acting as user U, and accesses all features related to that user
- Branches that look like they match these features have their info updated, etc.

For those last two steps to make sense, the features retrieved should only be those accessible by a team, rather than by a user of a team, as that user could be a part of many teams.

Agreed?

txels commented 10 years ago

So the question is: when a user sets up a hook URL in github, what should it contain besides the token, to narrow the data to be fetched/updated down to the relevant subset?

From the sample data in your github hook payload test, I know we have repo information in the payload, meaning at this stage we will have at least user (via token) and repo. So we need to find instances of Repository that match that repo URL (or owner.name/name), and are accessible to the user, e.g. (for simplicity I'm writing code as if provider had access to core DB, all those ORM calls will have to be client methods):

repos = Repository.objects.for_user(user).filter(url=url)
# ^ as HTTP API call, something like: /repos?url=url&token=1234
# fetched repos contain project ID

Multiple instances of repo with that URL will, I imagine, be the same repo as part of different projects (we haven't set up unique constraints but I expect this may be the only case where it may make sense). From each repo instance, we will identify the project it is linked to and retrieve features for that project. We use those features to identify and match branches. Very simplified:

for repo in repos:
    features = Feature.objects.for_user(user).filter(project=repo.project)
    # ^ as HTTP API call: /features?project=123&token=1234
    update_branch_info(repo, features, payload)
    # ^ will likely become multiple calls...

So calls to core are not "all features for a user" but features for a project.
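For illustration, the two HTTP calls mentioned in the comments above could be built like this. Only the `/repos` and `/features` paths and their query parameters come from the snippets above; the function names and the base URL are placeholders.

```python
from urllib.parse import urlencode


def repos_url(base: str, repo_url: str, token: str) -> str:
    """URL for the repos visible to the token's user that match repo_url."""
    query = urlencode({"url": repo_url, "token": token})
    return f"{base}/repos?{query}"


def features_url(base: str, project_id: int, token: str) -> str:
    """URL for the features of a single project."""
    query = urlencode({"project": project_id, "token": token})
    return f"{base}/features?{query}"
```

`urlencode` takes care of escaping the repo URL, which matters because it is itself a URL passed as a query parameter.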

To clarify with a silly example, if you and I are working on pretenders and ployst and both needed to use and extend supermutes, we would have:

Project ployst [id 1] with:
- Repository ployst [id 1]
- Repository supermutes [id 2]
- Features x, y, z...
Project pretenders [id 2] with 2 Repository instances:
- Repository pretenders [id 3]
- Repository supermutes [id 4]
- Features a, b, c...

Both supermutes repos would be the same git repo, but they would have multiple instances of the Repository model. Relevant branch info etc is isolated. It would be up to us how we decide to name branches in the shared repo to identify they are for a feature in ployst or pretenders, if feature identifiers are not enough. The branch naming rules would be independent for both repo instances, as they would ultimately be tied to team-provider settings.

So, to answer my own question, the repo's project provides the context, so repo and user are enough for what we need. We do not need team context.
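The silly example can be modelled with plain dataclasses to show why the repo alone gives enough context. This is only a sketch: the URLs are placeholders and `features_by_repo_instance` is a hypothetical helper, not a core API method.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    id: int
    name: str
    features: List[str] = field(default_factory=list)


@dataclass
class Repository:
    id: int
    name: str
    url: str
    project: Project


ployst = Project(1, "ployst", ["x", "y", "z"])
pretenders = Project(2, "pretenders", ["a", "b", "c"])

SUPERMUTES_URL = "https://example.com/supermutes.git"  # placeholder URL

REPOSITORIES = [
    Repository(1, "ployst", "https://example.com/ployst.git", ployst),
    Repository(2, "supermutes", SUPERMUTES_URL, ployst),
    Repository(3, "pretenders", "https://example.com/pretenders.git", pretenders),
    Repository(4, "supermutes", SUPERMUTES_URL, pretenders),
]


def features_by_repo_instance(url: str) -> Dict[int, List[str]]:
    """Map each Repository instance matching url to its project's features."""
    return {r.id: r.project.features for r in REPOSITORIES if r.url == url}
```

A push to supermutes matches Repository instances 2 and 4, and each one leads to its own project's features, so nothing about the team is needed to pick the right data.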

alexcouper commented 10 years ago

Following our discussion on Skype, this has been moved to http://txels.tpondemand.com/entity/312

The outline of the conclusion was:

Let's have a single "ployst" api hook added to a repo, which is user/team agnostic. In the long run, the url for the hook will include some salted version of the repo in which it lies - to be compared with the payload upon receiving a POST.
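One possible reading of "some salted version of the repo" is a keyed hash of the repo name embedded in the hook URL. The sketch below uses HMAC-SHA256 for that, which is my choice rather than anything we decided; `SECRET_KEY`, the `/hooks/<signature>` layout and the function names are hypothetical, and the payload shape assumes the `owner.name`/`name` fields mentioned earlier in the thread.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical server-side secret (the "salt")


def repo_signature(repo_full_name: str) -> str:
    """Keyed hash of 'owner/name' that only the server can reproduce."""
    message = repo_full_name.encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def hook_path(repo_full_name: str) -> str:
    """Path to register as the hook URL for this repo (hypothetical layout)."""
    return "/hooks/" + repo_signature(repo_full_name)


def is_valid_post(signature_from_url: str, payload: dict) -> bool:
    """Check that the signature in the URL matches the repo in the payload."""
    owner = payload["repository"]["owner"]["name"]
    name = payload["repository"]["name"]
    expected = repo_signature(owner + "/" + name)
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(signature_from_url, expected)
```

With this, a hook URL copied from one repo to another is rejected, because the payload's repo no longer matches the signature in the URL.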

Upon receiving a valid POST, a task will be kicked off which will do essentially the same as shown above, without filtering on User.

We'll need to take care when authenticating the two different kinds of request coming in to core, some originating from unsafe sources and some from trusted ones. The main thought seems to be to use authentication headers to verify the "trusted" sources. An alternative is to create an additional UI app that the angular app talks to, and to allow core access only from particular hosts.
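A minimal sketch of the authentication-header option, assuming trusted callers (e.g. providers) share a secret with core. The header name `X-Ployst-Auth` and the secret value are made up for illustration.

```python
import hmac
from typing import Dict

TRUSTED_SECRET = "provider-shared-secret"  # hypothetical shared secret
AUTH_HEADER = "X-Ployst-Auth"  # hypothetical header name


def is_trusted(headers: Dict[str, str]) -> bool:
    """True if the request carries the shared secret in the auth header."""
    supplied = headers.get(AUTH_HEADER, "")
    return hmac.compare_digest(
        supplied.encode("utf-8"), TRUSTED_SECRET.encode("utf-8")
    )
```

Requests that fail this check would be treated as coming from an unsafe source and go through the normal user authentication instead.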

txels commented 10 years ago

Done in https://github.com/pretenders/ployst/pull/18
