sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Index GitHub issues, Pull Requests, and Branch metadata #50884

Open johnwesonga opened 1 year ago

johnwesonga commented 1 year ago

The customer tracked in https://github.com/sourcegraph/accounts/issues/6716 would like to use Sourcegraph for use cases that involve querying GitHub metadata, including issues, pull requests, and branches.

Right now, their users query GHE directly for this information. However, those queries are a burden on the GHE system, which cannot be scaled out to support more of them. They would like to run the reports every day at least. But, the GHE administrator is reluctant to let them run that frequently due to the extra load it puts on the GitHub environment.

Metadata of interest:

Pull requests: creation, closure, comments, emoji, labels added

Issues: creation, closure, comments, emoji, labels added

Branches: creation, closure

Commits: creation
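
To make the list above concrete, here is a minimal sketch of how those events might be stored so that a daily report reads from a local index instead of GHE. Every name in it (`Kind`, `Action`, `MetadataEvent`, `daily_counts`) is hypothetical and not an existing Sourcegraph type; the allowed actions per object simply mirror the list above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Kind(Enum):
    PULL_REQUEST = "pull_request"
    ISSUE = "issue"
    BRANCH = "branch"
    COMMIT = "commit"


class Action(Enum):
    CREATED = "created"
    CLOSED = "closed"
    COMMENTED = "commented"
    REACTED = "reacted"  # an emoji reaction
    LABELED = "labeled"


# Which actions the request lists for each kind of object (hypothetical mapping).
ALLOWED = {
    Kind.PULL_REQUEST: {Action.CREATED, Action.CLOSED, Action.COMMENTED,
                        Action.REACTED, Action.LABELED},
    Kind.ISSUE: {Action.CREATED, Action.CLOSED, Action.COMMENTED,
                 Action.REACTED, Action.LABELED},
    Kind.BRANCH: {Action.CREATED, Action.CLOSED},
    Kind.COMMIT: {Action.CREATED},
}


@dataclass(frozen=True)
class MetadataEvent:
    repo: str
    kind: Kind
    action: Action
    occurred_at: datetime  # expected to be timezone-aware

    def __post_init__(self):
        # Reject combinations the request does not mention, e.g. a "closed" commit.
        if self.action not in ALLOWED[self.kind]:
            raise ValueError(f"{self.action.value} is not tracked for {self.kind.value}")


def daily_counts(events):
    """Count events per (UTC date, kind, action), the shape a daily report might use."""
    counts = Counter()
    for event in events:
        day = event.occurred_at.astimezone(timezone.utc).date()
        counts[(day, event.kind, event.action)] += 1
    return counts
```

A report would then call `daily_counts` over the locally stored events, so the reporting query itself never reaches GHE. How the events get into that store is the open design question in this issue.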

See: https://sourcegraph.slack.com/archives/C01D50MSA7L/p1677259275183599

sanderginn commented 1 year ago

I'm struggling to think which service of Sourcegraph this should fit in - I guess it would fall under repo management, but it feels to me like there's a decent amount of design involved before this can be implemented. cc @sourcegraph/repo-management for input (sorry, I guess this is an outdated handle?)

mrnugget commented 1 year ago

Yeah, this is a huge thing. This needs product and design input.

Also, if the problem is this:

> They would like to run the reports every day at least. But, the GHE administrator is reluctant to let them run that frequently due to the extra load it puts on the GitHub environment.

i.e. that the reports put too much load on GHE, then I'm not sure us syncing all of that data will help that much. Webhooks help, but last time we did this for batch changes, the data in the webhook payloads wasn't complete, so you always had to do another request.
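
A rough sketch of why incomplete webhook payloads still cost requests, assuming a handler that receives the `X-GitHub-Event` header value and the raw JSON body. The thread does not say which fields were missing, so `REQUIRED_PR_FIELDS`, `handle_webhook`, and `refetch_queue` are hypothetical names chosen for illustration only.

```python
import json
from collections import deque

# Hypothetical: fields a report would need for a pull request. Which fields were
# actually missing from the payloads is not stated in this thread.
REQUIRED_PR_FIELDS = ("number", "state", "labels", "comments")

# Pull requests that still need a follow-up API request against the code host.
refetch_queue = deque()


def handle_webhook(event_type, body):
    """Record what the payload provides and queue a refetch for what it lacks."""
    payload = json.loads(body)
    repo = (payload.get("repository") or {}).get("full_name")
    record = {"event": event_type, "action": payload.get("action"), "repo": repo}

    if event_type == "pull_request":
        pr = payload.get("pull_request") or {}
        missing = [field for field in REQUIRED_PR_FIELDS if field not in pr]
        record["number"] = pr.get("number")
        record["missing"] = missing
        if missing:
            # Each queued entry becomes one more request to GHE, so webhooks
            # reduce the load but do not remove it.
            refetch_queue.append((repo, pr.get("number")))

    return record
```

In this sketch, every payload that lacks a required field adds one entry to `refetch_queue`, which is the extra request described above.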