sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.1k stars 1.28k forks source link

Enable indexed search by default on sourcegraph/server #2176

Closed sqs closed 5 years ago

sqs commented 5 years ago

Indexed search is currently not enabled by default for sourcegraph/server because it has unbounded resource requirements that @keegancsmith (rightly) deemed likely to overwhelm single-node instances. It is enabled by default on local dev and on cluster deployments. See https://sourcegraph.slack.com/archives/C07KZF47K/p1549416905712500?thread_ts=1549416905.712500 for context.

It would be good to enable it by default on sourcegraph/server (and do the requisite work necessary to make that not overwhelm a server).

Related to https://github.com/sourcegraph/sourcegraph/issues/2165 (that is about docs).

cc @slimsag

Customers: https://app.hubspot.com/contacts/2762526/company/887065580

slimsag commented 5 years ago

Note: Enabling indexed search on a sourcegraph/server instance and gradually adding repositories is mostly OK because it then consumes resources gradually. However, if you enable it after adding a lot of repositories to your instance, it can quickly consume all resources and effectively take down the instance or render it unusable.

keegancsmith commented 5 years ago

Why would zoekt-sourcegraph-indexserver have a hard time if indexing is enabled after thousands of repos have been added to an instance?

zoekt-sourcegraph-indexserver is bounded in the amount of resources used. In particular it has a tuneable maximum amount index jobs it runs at one time (related to the number of cores). Each job tries to use as much CPU as possible, but is bounded in the amount of RAM it will use. Essentially it will shard a repo into 100mb chunks, where 100mb is the output. So it should use roughly 100mb * jobs when indexing at capacity. Note: I may be remembering the number incorrectly for maximum shard size.

The webserver is problematic after indexing is complete because it is bounded by the size of the working copy of all cloned repositories. It keeps in memory the index (trigram -> posting lists), and it builds result sets in memory (results sets are bounded though). I can't remember what the usual factor is offhand, but the number in my head is 0.7 the size of the working copy. Note: it excludes a lot of files like binary files / large generated files / etc so the working copy size can actually be quite reasonable compared to what du -h tells you locally.

Now the issue boils down to this: The top of our funnel is people trying out Sourcegraph on there dev laptop in Docker (possibly on a Mac). A laptop getting pummelled by CPU intensive index jobs can leave a bad taste / make a laptop unuseable. Once the indexing is done, the docker container may just continually OOM.

cc @ijt

ijt commented 5 years ago

Now the issue boils down to this: The top of our funnel is people trying out Sourcegraph on there dev laptop in Docker (possibly on a Mac). A laptop getting pummelled by CPU intensive index jobs can leave a bad taste / make a laptop unuseable. Once the indexing is done, the docker container may just continually OOM.

@keegancsmith, would it prevent OOMing if we enabled indexing from the start of the instance?

keegancsmith commented 5 years ago

No. The webserver memory usage has to do with the size of all cloned repos working copy. So the issue is in resource constrained environments this is not acceptable and is the only example in Sourcegraph were we use that much memory without any related actions. I think the solution here is to either change zoekt such that it can rely on FS paging (maybe mmap the indexes?) or somehow make zoekt adapt to the amount of memory used to avoid it using too much. If it needs too much show the admin a warning.

sqs commented 5 years ago

Idea: With https://github.com/sourcegraph/sourcegraph/pull/3374, I think we could just enable indexed search by default on all instances, including single-node sourcegraph/server instances.

If there are > 100 repositories, it will show a perf warning. That perf warning is intended for non-indexed search right now, but it also makes sense for indexed search (because < 100 repositories is probably fine for single-node indexed search, but > 100 is probably getting into OOM danger zone).

The one change we should make is to ALSO show that alert at the top of the page (in a global alert) for all users if > 100 repositories are present (because indexed search can affect the system in the background even if the user isn't on the search page).

The argument for this is basically the same as https://github.com/sourcegraph/sourcegraph/pull/3262#issuecomment-481400215.

@nicksnyder @ijt What do you think?

slimsag commented 5 years ago

That logic makes sense to me.

I will later make a proposal / argument for introducing a third official deployment option between our Server and Kubernetes cluster offering, which is based on docker-compose. It would make these problems moot and address other issues, I haven't had a chance to do this yet though.

ijt commented 5 years ago

I buy the argument and I'm in favor of doing something like this.

At the same time, a risk is that this heuristic won't detect potential issues with a small number of large repositories. The index size grows with the number of locations where trigrams are found, not the number of repos, although those things are correlated.

slimsag commented 5 years ago

@ijt I think the same issue is true of non-indexed search: large repositories (which would have a large index size) would also cause searcher to consume more resources. The scaling is probably pretty similar between the twow

sqs commented 5 years ago

@ijt Now that there is a way forward here in the near term, this is higher priority than other longer-term search work. Would you be comfortable getting https://github.com/sourcegraph/sourcegraph/issues/2176#issuecomment-484762332 into 3.3.1 (ie doing this soon and getting it out in a patch release)? By "comfortable" I specifically want to make sure you feel it would not be a risky/big change to add on a patch release. I was hoping it is just a matter of showing the alert in another place, in https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/web/src/global/GlobalAlerts.tsx#L31:14, and no search backend changes.

Accounting for the upgrade path, here is what I would propose:

(The search.suppressPerformanceWarning setting would be how we let specific customers who we know to have provisioned appropriate hardware for a single node to avoid having their users be confused by the warning after the 3.3.1 upgrade.)

ijt commented 5 years ago

@sqs, sure, I'm fine with that. When would I need to check that in for 3.3.1?

nicksnyder commented 5 years ago

Patch releases are on demand so as soon as your change is merged into master then @slimsag (the release captain) can help you cherry-pick it onto the release branch and tag a release.

slimsag commented 5 years ago

Anybody can cut a patch release, it doesn't involve the release captain, you just need to follow the steps outlined here: https://github.com/sourcegraph/sourcegraph/blob/master/doc/dev/patch_release_issue_template.md

slimsag commented 5 years ago

I am of course happy to help / assist as needed

ijt commented 5 years ago

Sounds good.

On Fri, Apr 19, 2019, 12:43 PM Stephen Gutekanst notifications@github.com wrote:

I am of course happy to help / assist as needed

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/sourcegraph/sourcegraph/issues/2176#issuecomment-484996528, or mute the thread https://github.com/notifications/unsubscribe-auth/AAADZKSQ6S4Z2VR6VPHDOPTPRIOH5ANCNFSM4GUS7S5Q .