Alex/github cloner

alexcouper commented 10 years ago

@txels:

This is the beginning of handling cloning behaviour.

There are some related TODOs in ensure_clones_for_project.

I believe that we are missing a model in our architecture.

ProjectRepo: A many-to-many table between Project and Repository. This would make it easier to look up "have we already cloned this repo".
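To make the proposal concrete, here is a minimal sketch of the layout using the standard-library sqlite3 module rather than the project's actual Django models. All table names, column names and helper functions are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the project's real models.
SCHEMA = """
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE repository (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,   -- one row per GitHub repo
    clone_path TEXT             -- NULL until the repo has been cloned
);
CREATE TABLE project_repo (
    project_id INTEGER NOT NULL REFERENCES project(id),
    repository_id INTEGER NOT NULL REFERENCES repository(id),
    PRIMARY KEY (project_id, repository_id)
);
"""


def attach_repo(conn, project_id, url):
    """Link a project to a repo, creating the repository row only once."""
    conn.execute("INSERT OR IGNORE INTO repository (url) VALUES (?)", (url,))
    repo_id = conn.execute(
        "SELECT id FROM repository WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO project_repo VALUES (?, ?)", (project_id, repo_id)
    )
    return repo_id


def already_cloned(conn, url):
    """Answer "have we already cloned this repo?" with a single lookup."""
    row = conn.execute(
        "SELECT clone_path FROM repository WHERE url = ?", (url,)
    ).fetchone()
    return row is not None and row[0] is not None
```

With this shape, the URL and clone location of a repo live in exactly one row, however many projects reference it.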

Additionally, I'm unsure how we want to clone the repos in terms of deploy keys that we set up on the repositories.

Do we have a single pub+private key that we use for all repos? (seems bad to me)
Do we have one per repo?
Where should these be stored? In the DB, writing the private key out to a temp file when we need it? Or on disk?
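As a rough sketch of the "store in the DB, write to a temp file when needed" option: the helper names below are hypothetical, and the approach assumes a git version that honours the GIT_SSH_COMMAND environment variable (2.3 or later).

```python
import os
import shlex
import tempfile
from contextlib import contextmanager


@contextmanager
def private_key_file(private_key_text):
    """Write a private key to a temp file for the duration of the block."""
    # mkstemp creates the file readable and writable only by the current
    # user, which is what ssh expects of a private key file.
    fd, path = tempfile.mkstemp(prefix="deploy_key_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(private_key_text)
        yield path
    finally:
        # Always remove the key from disk, even if the clone fails.
        os.remove(path)


def git_ssh_env(key_path):
    """Build an environment that makes git use only the given key."""
    env = dict(os.environ)
    env["GIT_SSH_COMMAND"] = "ssh -i {} -o IdentitiesOnly=yes".format(
        shlex.quote(key_path)
    )
    return env
```

A clone would then look something like `subprocess.check_call(["git", "clone", url, dest], env=git_ssh_env(path))` inside the `with private_key_file(...) as path:` block, so the key only exists on disk while git is running.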

txels commented 10 years ago

Will try to answer questions:

ProjectRepo: not sure we need that - I seem to recall having had a similar discussion before. A repo currently exists inside a project, so the Repository model already represents a many-to-many between a github repository and a project, if you will. Do we want to share repo (model) instances between projects? Are we going to provide a UI for that? If the purpose is to reuse clones, we can do that by ensuring every github repo has a single clone in the system. Or is the purpose to share branch/build info etc?
Pub/Private key: I had the idea that we create one per user (or maybe per team, as repos are ultimately tied to teams) when they are created, and give those keys access to the repo. It's easy to create these on signup / team creation.
Where to store: we can use file storage. Django has nice pluggable backends for this, and we can later switch to something else in production in a load-balanced environment (S3 or equivalent cloud storage).

alexcouper commented 10 years ago

> ProjectRepo: not sure we need that - I seem to recall having had a similar discussion before. A repo exists now inside a project, so the Repository model already represents a many to many between a github repository and a project if you will. Do we want to share repo (model) instances between projects? Are we going to provide a UI for that? If the purpose is to reuse clones, we can reuse clones by ensuring every github repo has a single clone in the system. Is the purpose to share branch/build info etc?

We only need one clone of any repo to be able to do things with it. If 10 people add django to their project, we should have a single clone that we give their regexes to. We can achieve this with the current model layout, so I'll carry on as is. My concern was that there would be some duplication in the database, e.g. 10 rows all stating the url and name of the repo.
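One way to get a single clone per repo without changing the models is to derive the clone location from the repo URL alone. A minimal sketch, where the root directory and function names are hypothetical and the normalisation is deliberately naive (it does not unify the SSH and HTTPS forms of the same repo):

```python
import hashlib
import os

CLONE_ROOT = "/var/lib/ployst/clones"  # hypothetical location


def normalise_url(url):
    """Collapse trivial spelling differences of the same repo URL."""
    url = url.strip()
    if url.endswith("/"):
        url = url[:-1]
    if url.endswith(".git"):
        url = url[:-4]
    return url


def clone_path(url, root=CLONE_ROOT):
    """Map a repo URL to the one directory its shared clone lives in."""
    digest = hashlib.sha1(normalise_url(url).encode("utf-8")).hexdigest()
    return os.path.join(root, digest)
```

Every project that adds the same URL resolves to the same directory, so checking whether that directory exists is enough to decide whether a clone is still needed.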

> Pub/Private key: I had the idea that we create one per user (or maybe team, as repos are ultimately tied to teams) when they are created, and give those access to the repo. It's easy to create these on signup / team creation.

The deploy keys are applied per repo from a github point of view. If we were to apply per team or user we could have many keys against the same repository. I'm not sure this is the behaviour we want.

txels commented 10 years ago

> The deploy keys are applied per repo from a github point of view. If we were to apply per team or user we could have many keys against the same repository. I'm not sure this is the behaviour we want.

Normally, in the usage model we follow, users will only use ployst for repos they (or their organisations) own. So if team === organisation, a team key looks OK to me; there should be no duplication.

In the case of public repos, we do not need a repo key to clone, only a valid github key, so a user's key should work.

I am a bit unclear now though.

I was thinking initially (and naively) that the approach to cloning could be: each ployst user gets an SSH key on signup, which is then added to their personal github account. Then upon cloning we could use that key, as they will have permissions to clone the repo. But of course that doesn't handle multiple users with access to the same repo very cleanly, as we would have to figure out which user key to use for a given repo (we can find the first matching user quite easily, but it doesn't feel very clean). If we go for team (team may === github organisation in general) it feels a bit cleaner, but we can still have more than one team for a given github repo, so we may have various choices of which key to use. So we can go up one more level and decide to use a global "ployst" key, with ployst as an app with access to your repos. You had objections against that. Can you make them a bit more explicit?

You suggest that, instead of going up, we go down and actually use a different SSH key per repo URL. That seems to me like a lot of keys to manage eventually. Can we have a chat about this later?

alexcouper commented 10 years ago

A chat sounds good. Will you be online this evening?

txels commented 10 years ago

I am online. Checking flights as well.

alexcouper commented 10 years ago

> If we go for team (team may === github organisation in general) it feels a bit cleaner but still we can have more than one team for a given github repo, so we may have various choices of which key to use. So we can go up one more level and decide to use a global "ployst" key, with ployst an app with access to your repos. You had objections against that. Can you make them a bit more explicit?

My objection is simply that if we have x repos active across y projects, it would seem a little insecure to only offer a single key pair to access all x repos. A leak of one private key would provide access to (hopefully) hundreds of repositories.

So what I was attempting to get at was the idea of partitioning the use of these keys. We could do this with one key per repo, or we could have n keys for m repos, say a new key for every 10 repos or so. Either way, we'll need to track which key is used against which repo.
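The partitioning idea can be sketched as a small allocator that tracks which key each repo uses. The class and method names are hypothetical, and key ids stand in for real key pairs, which a real system would generate at the marked point.

```python
from collections import defaultdict


class KeyAllocator:
    """Assign repos to keys, with at most `repos_per_key` repos per key."""

    def __init__(self, repos_per_key=1):
        if repos_per_key < 1:
            raise ValueError("repos_per_key must be at least 1")
        self.repos_per_key = repos_per_key
        self.key_for_repo = {}                  # repo URL -> key id
        self.repos_for_key = defaultdict(list)  # key id -> repo URLs
        self._latest_key_id = 0                 # 0 means no key allocated yet

    def key_for(self, repo_url):
        """Return the key id for a repo, allocating one on first use."""
        if repo_url in self.key_for_repo:
            return self.key_for_repo[repo_url]
        current = self._latest_key_id
        if current == 0 or len(self.repos_for_key[current]) >= self.repos_per_key:
            # A real system would generate and store a new key pair here.
            current += 1
            self._latest_key_id = current
        self.repos_for_key[current].append(repo_url)
        self.key_for_repo[repo_url] = current
        return current

    def exposed_by_leak(self, key_id):
        """List the repos that a leak of the given key would expose."""
        return list(self.repos_for_key.get(key_id, []))
```

With `repos_per_key=1` this is one key per repo; with `repos_per_key=10` a leaked key exposes at most 10 repos rather than all of them.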

As an aside, this is where my thought of needing a new ProjectRepo model came from: a repo is accessed with a particular key regardless of the project that it happens to be in. In the current schema we would add a column to the Repository table for "key" or something, and that would be duplicated across all matching repos.

alexcouper commented 10 years ago

I'm unsure what the outcome of our discussions was regarding this.

txels commented 10 years ago

My latest thought was leaning towards this https://github.com/pretenders/ployst/pull/54#discussion_r11751265 - one key per repo, AFAIR you were pushing for that and I was convinced.

pretenders / ployst

Alex/github cloner #54