sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.11k stars 1.27k forks source link

Proposal: remove github-proxy #9663

Closed slimsag closed 1 year ago

slimsag commented 4 years ago

We don't use a similar service for other code hosts, and AFAIK this service came from a time when we had major issues enforcing GitHub rate limiting in our application. Is there a reason we should keep it around in today's world?

@sourcegraph/core-services

unknwon commented 4 years ago

I don't have context on why/how we developed the idea to have github-proxy, but I think we're adding rate limiting (and can be controlled by site admins) for all syncers/processes spawned in repo-updater cc @ryanslade. Not sure if the proxy is still required.

tsenart commented 4 years ago

github-proxy is meant to prevent us getting abuse rate limited which happens when concurrent request from the same token are executed. That’s a different rate limit than the normal one.

tsenart commented 4 years ago

https://developer.github.com/v3/#abuse-rate-limits

slimsag commented 4 years ago

How does github-proxy bypass that? Admins only provide one token, so I don't see how it could currently -- and there is only one replica of it so it can't avoid any IP limiting either.

Regardless, it doesn't seem good that we would be relying on a hack to get around abuse rate limiting?

keegancsmith commented 4 years ago

It has a transport with a mutex wrapped around it ensuring we only have one request at a time. The tricky part is porting that behaviour into repo-updater since the current model is to not have long lived clients. But we can probably use the same approach that @ryanslade is using for a rate limiter registry.

I agree that it would be nice to get rid of github-proxy. Last time I looked at this it wasn't as quick to do as I hoped so I just moved on.

The historical context was we used to make requests directly from sourcegraph-frontend. It does help bypassing IP limiting, since when we ran into those issues we would scale up the number of replicas. Additionally it was a simple place to make the logic more complicated (eg round robin tokens/etc if we needed to). It should be deleted though.

slimsag commented 4 years ago

It does help bypassing IP limiting, since when we ran into those issues we would scale up the number of replicas.

FWIW I am not aware of any customer ever having done this, so it likely only applied to Sourcegraph.com -- and we've been using a single instance there for a while.

If we can do this in 3.16, it could reduce some friction of implementing https://github.com/sourcegraph/sourcegraph/issues/9785 (not required, but would be a good time to do it)

keegancsmith commented 4 years ago

If we can do this in 3.16, it could reduce some friction of implementing #9785 (not required, but would be a good time to do it)

Agreed. Sounds like its worth looking into.

FYI here is the last time you asked about this https://github.com/sourcegraph/sourcegraph/issues/574 :)

keegancsmith commented 4 years ago

So after some more consideration it is worth keeping. The only downside is its an extra service. The reason to keep it is it isn't just repo-updater that communicates with github.com. Campaigns does it from the frontend. We do requests from the frontend for sourcegraph.com. Additionally if we do ever run into rate limit issues on Sourcegraph.com it will give us flexibility to respond to it.

We could consider only keeping it for Sourcegraph.com, but I think the effort of removing it doesn't really justify that.

Next steps: If you agree, should we close this out and document the justification somewhere?

slimsag commented 4 years ago

I am not convinced that is the right call.

We need to simplify distribution of Sourcegraph and reduce the number of services it needs because it is rapidly exploding and becoming much more difficult to monitor/reason about, I am not convinced that if we were considering how to solve these problems today that we would introduce github-proxy and so that is the main reason it is "in my sights" for axing soon.

Additionally if we do ever run into rate limit issues on Sourcegraph.com it will give us flexibility to respond to it.

This hasn't happened in the last year at least, and I believe even the last 2 years. It doesn't seem worth keeping around an additional service (which has substantial overhead) just for added flexibility. It also doesn't scale well to other code hosts: will we eventually need github-proxy, bitbucket-proxy, gitlab-proxy`, and more?

The reason to keep it is it isn't just repo-updater that communicates with github.com. Campaigns does it from the frontend.

Then I see two valid options:

  1. Campaigns should communicate with GitHub.com through codehost-syncer (repo-updater). This is the logical place for this interaction to live in my view.
  2. If not #1 above, this logic could be per-frontend replica.
ryanslade commented 4 years ago

We could potentially use the new rate limiter code to achieve this. Once https://github.com/sourcegraph/sourcegraph/issues/9953 is done we could make it so that the rate limiter for GitHub only allows one concurrent request. This would essentially be point 2 above.

Later, we could instead make frontend requests go via repo-updater in which case all requests would share one global rate limiter that lives in the repo-updater instance.

github-actions[bot] commented 2 years ago

Heads up @jplahn - the "team/repo-management" label was applied to this issue.

ryanslade commented 2 years ago

@indradhanush This might be something to consider when thinking about the new rate limiter design. Could we use it to replace the need for github-proxy?

indradhanush commented 1 year ago

@ryanslade Reading through the comments on this issue and I totally missed your question. I think its a great callout and if I pick up the ball again on the new rate limiter design I will keep this in mind and see if we can phase out GitHub proxy (it should be doable).

eseliger commented 1 year ago

Tracking in https://github.com/sourcegraph/sourcegraph/issues/55290, closing this one.