spring-io / pivotal-cla

Apache License 2.0
10 stars 16 forks source link

Update all pull requests after signing #116

Open mp911de opened 8 years ago

mp911de commented 8 years ago

Currently, only one PR is updated once a user signs the CLA. It's the PR from which the user triggered CLA signing. If a user submits multiple PRs the commit status remains on failure. It's possible to trigger a manual sync but I think we could improve here. Maybe we could check for all open pull requests for a particular user and update all of these?

rwinch commented 8 years ago

@mp911de The problem is that GitHub doesn't provide a good API for finding all Pull Request from a particular user. We would need to:

This seems like a lot of data access and might make us hit the limit. If we can do all of this with the user's Access Token (except for the write to update the status if it needs changed) I think this would be doable, but I don't know if we can do this with the current OAuth Scope assigned to the token (I also don't want to require additional permissions on the access token)

Shredder121 commented 8 years ago

Idea time. A pull request is an issue, so the issues API can be used to find the PRs. https://api.github.com/repos/spring-projects/spring-boot/issues?creator=Shredder121 Defaults to open issues as well, so it's a lot less data, and probably what you want.

Shredder121 commented 8 years ago

The problem is that GitHub doesn't provide a good API for finding all Pull Request from a particular user

This is however still an issue.

mp911de commented 8 years ago

I'm always coming back to the same point: Currently, GitHub is our primary data source for PRs. We determine some state (whether to add comments or to update a particular PR) based on the PR or the details within the PR. This is fine but also comes with some limitations like the "get all open PR's for a particular user".

I think the mentioned issue (multiple open PRs, user signs CLA and only one is updated) is an edge-case. It's still possible to manually perform a sync to make the status green. Maybe it would be still an idea to consider a supporting storage of data so we might optimize things.

Storing more data on our side introduces complexity and the need for maintenance, so I'm torn about the issue and not fully convinced this is a good idea. I mostly wanted to express my thoughts and hear what you guys think about it.

rwinch commented 8 years ago

@mp911de Thanks for following up.

This is certainly an edge case as most of the time a user will likely sign the Pull Request right away. The time this is more likely to happen is if the user created multiple Pull Requests prior to the CLA tooling being setup on the repository.

I agree that we probably don't want to keep track of the pull requests on our side. One major reason for this is that it doesn't solve for the primary use case we need. That is it wouldn't help if the pull request was created prior to the CLA tooling being linked to the repository. At that point we have to scan the repository for the Pull Request anyways.

Shredder121 commented 8 years ago

One major reason for this is that it doesn't solve for the primary use case we need.

Makes sense, and cuts down on unnecessary complexity, although this is also a good point:

Maybe it would be still an idea to consider a supporting storage of data so we might optimize things.

But I am all for leaving it as it is.

Side note, I also have a project where I built in support for coming back later to a PR. I implemented it as paying a few hits upfront, but after that all remaining data is delivered via the webhook system. (yes yes, it's now multiple repositories.

But at startup time, we know about the access tokens right?

rwinch commented 8 years ago

But at startup time, we know about the access tokens right?

We don't have an access token used for updating the status until the repository is linked. This is because we don't want an access token to "rule them all". It also helps prevent hitting the rate limit because we are using different access tokens for different repositories.

If we are to do anything, I think it makes sense to see if we can query using the logged in user's access token to query all the repositories. It might take a little bit of time to process, but we wouldn't cause rate limiting issues. Once we find the PRs that need updated, we would use the access token associated for the repository to update the statuses (if needed).