sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.11k stars 1.29k forks source link

GraphQL API: Repository-level actions result in many thousands of repetitive git operations #5715

Open slimsag opened 5 years ago

slimsag commented 5 years ago

The following query takes several minutes to complete because I ask for the list of contributors for each search result. This translates into us running git shortlog -sne once for every single result, despite the fact that they all came from the same repository:

bad query

This is one example of how making GraphQL APIs performant is quite difficult in my view. What is the way to fix this? I see a few options:

  1. Aggressively cache git operations (this would be a good idea regardless of this issue, but doesn't address the main problem here such as if this query didn't involve git operations).
  2. Remove GraphQL fields that allow for creating poor queries -- i.e. why should I be able to query the contributors for a search result's repository, instead of being forced by the API to make a separate query for that information which is more correct?
keegancsmith commented 5 years ago

The aggressive caching is something I have been wanting to do for a while. I have some ideas around designs for this so the cache is per request which should solve most of our issues without needing to worry about ACLs/invalidation/etc.

Remove GraphQL fields that allow for creating poor queries

I think this is the wrong direction, that is the nice thing about graphql. We should instead improve our graphql backend layer.

slimsag commented 5 years ago

Fix bad uses as they come up?

The problem with this is just how easy it is to do today. I intend to document some of these, but it's incredibly easy to construct such queries today that have detrimental side effects. I anticipate this happening frequently.

Rate limits? Maybe ones you can opt out of, for the legitimate use case.

This SGTM.

I think this is the wrong direction, that is the nice thing about graphql. We should instead improve our graphql backend layer.

Makes sense, this is helpful I appreciate you saying this.

slimsag commented 5 years ago

See also https://github.com/sourcegraph/sourcegraph/issues/5705 which is the catch-all issue for this problem with our GraphQL API. This issue is specifically about the bad query linked in the description.