sourcegraph / scip-clang

Set up periodic Chromium indexing job #328

Open varungandhi-src opened 1 year ago

varungandhi-src commented 1 year ago

We have a Buildkite runner provisioned which is powerful enough to index Chromium in a reasonable time. See https://github.com/sourcegraph/infrastructure/pull/4910

The build machine is stateful, which is both good and bad. The good is that we:

The bad is that we may run into build dependency problems, since Chromium's build scripts try to install system dependencies.

The basic workflow will look like this:

If there is a failure at any step, we should send a Slack message to an internal channel with a link to the Buildkite job log.
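A minimal sketch of the failure notification described above, assuming a Slack incoming webhook is used. `BUILDKITE_BUILD_URL` and `BUILDKITE_JOB_ID` are environment variables that Buildkite sets for each job; the webhook URL, the function names, and the `<build URL>#<job ID>` link format are assumptions for illustration, not something this issue specifies.

```python
import json
import os
import urllib.request


def build_failure_message(step_name, env=os.environ):
    """Build a Slack payload that links to the Buildkite job log."""
    build_url = env.get("BUILDKITE_BUILD_URL", "(unknown build)")
    job_id = env.get("BUILDKITE_JOB_ID")
    # Assumption: appending "#<job ID>" to the build URL jumps to that job's log.
    link = f"{build_url}#{job_id}" if job_id else build_url
    return {"text": f"Chromium indexing failed at step '{step_name}': {link}"}


def notify_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook (URL is hypothetical)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Each step of the job could call `notify_slack(webhook_url, build_failure_message("ninja"))` from its failure path, with the webhook URL supplied as a secret.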

varungandhi-src commented 1 year ago

Some notes based on my convo with William:

  1. We can break the job into 3 steps.
    1. Starts the GCP instance (runs on stateless agent -- the stateless agent has the GCP CLI pre-installed)
    2. Runs the indexing job (on the stateful/powerful agent). The main caveat is that we need to pass in a secret which lets us upload the index to Sourcegraph.com, but we can figure out how to resolve that once we get to that stage.
    3. Stops the GCP instance (runs on stateless agent)
  2. The Buildkite UI has an 'Edit Steps' option which lets us modify the main Buildkite command, so we can point it at another pipeline file.

Example of non-trivial pipeline magickery: https://github.com/sourcegraph/sourcegraph/tree/wb/app/aws-macos
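The three-step split above could be generated as a dynamic pipeline, roughly like the sketch below. The instance name, zone, queue names, and the `./index-chromium.sh` script are all hypothetical placeholders. The `continue_on_failure` wait is an assumption about the desired behavior: it lets the stop step run even if indexing fails, so the instance is not left running.

```python
import json


def build_pipeline(instance, zone, stateless_queue, stateful_queue):
    """Return a Buildkite pipeline (as a dict) with start/index/stop steps."""

    def gcloud(action):
        return f"gcloud compute instances {action} {instance} --zone {zone}"

    return {
        "steps": [
            {
                "label": "Start GCP instance",
                "command": gcloud("start"),
                "agents": {"queue": stateless_queue},
            },
            "wait",
            {
                "label": "Index Chromium",
                # Hypothetical script name; the real entry point is not decided here.
                "command": "./index-chromium.sh",
                "agents": {"queue": stateful_queue},
            },
            # Continue past an indexing failure so the instance still gets stopped.
            {"wait": None, "continue_on_failure": True},
            {
                "label": "Stop GCP instance",
                "command": gcloud("stop"),
                "agents": {"queue": stateless_queue},
            },
        ]
    }


if __name__ == "__main__":
    pipeline = build_pipeline("chromium-indexer", "us-central1-a", "stateless", "stateful")
    print(json.dumps(pipeline, indent=2))
```

Since `buildkite-agent pipeline upload` accepts JSON on standard input, the main Buildkite command could pipe this script's output into it.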

dominiccooney commented 1 year ago

> Update depot_tools

gclient does this, and you should run gclient sync to pull updated dependencies anyway. IIRC depot_tools or gclient update (I forget exactly which) also fetches some Python environments from something called CIPD. CIPD's infra can be a bit flaky, and working around that is a lot of work.

> Q: Does gn need to be reinvoked here if any of the build files have changed? Or is re-running ninja sufficient?

In general, gn does not need to be reinvoked. That said, Chromium has a system called landmines for clobbering certain bots. So... YMMV? In case of repeated failures you might like to start from "scratch": you probably don't need to reclone, but you could blow away your build directory and run git reset --hard HEAD && git clean -ffxd && gclient sync --force, or something like that.
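One way to sketch that recovery path: try the incremental ninja build first, and only on failure wipe the build directory, reset the checkout, and rebuild. The function names and the retry-once policy are assumptions. The bare `gn gen` call is also a simplification, since a real job would need to restore its GN args after the build directory is deleted.

```python
import shutil
import subprocess
from pathlib import Path


def run_checked(cmd, cwd):
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, cwd=cwd, check=True)


def build_with_clean_retry(src_dir, build_dir, targets, run=run_checked):
    """Try an incremental build; on failure, reset the checkout and retry once."""
    src_dir = Path(src_dir)
    build_path = src_dir / build_dir  # e.g. src/out/Default
    ninja = ["ninja", "-C", str(build_path), *targets]
    try:
        run(ninja, src_dir)
        return "incremental"
    except subprocess.CalledProcessError:
        pass
    # Start from "scratch" without recloning the repository.
    shutil.rmtree(build_path, ignore_errors=True)
    run(["git", "reset", "--hard", "HEAD"], src_dir)
    run(["git", "clean", "-ffxd"], src_dir)
    run(["gclient", "sync", "--force"], src_dir)
    # Simplification: regenerates build files with default GN args.
    run(["gn", "gen", str(build_path)], src_dir)
    run(ninja, src_dir)
    return "clean"
```

The `run` parameter is injectable so the command sequence can be exercised without a Chromium checkout. If the second build also fails, the exception propagates and the job fails.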

Do you need to build trunk to have the index in a good state?

> Delete useless artifacts ...

What are the useful artifacts? I'm wondering if you can get away with building a lot less.

varungandhi-src commented 1 year ago

> Do you need to build trunk to have the index in a good state?

From the indexer's perspective, it doesn't matter which exact commit it is, but we'd like to regularly index newer commits rather than purely regression testing against a pinned commit.

> What are the useful artifacts?

Anything that's needed to type-check in-project C++ files. Largely this would be generated headers, but not generated C++ files (or files in other languages).
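As a rough illustration of that rule, the sketch below lists the files in a build directory that would be candidates for deletion: everything except headers. The set of suffixes treated as headers is a guess. So is the keep-list entry for `compile_commands.json`, which assumes the compilation database lives in the build directory. The function only returns paths, so it can be used as a dry run.

```python
from pathlib import Path

# Assumption: these suffixes approximate "generated headers" in the build tree.
HEADER_SUFFIXES = {".h", ".hh", ".hpp", ".hxx", ".inc"}
# Assumption: the compilation database is stored in the build directory.
KEEP_NAMES = {"compile_commands.json"}


def deletable_artifacts(build_dir):
    """Return files under build_dir that are not needed for type-checking."""
    return sorted(
        path
        for path in Path(build_dir).rglob("*")
        if path.is_file()
        and path.suffix not in HEADER_SUFFIXES
        and path.name not in KEEP_NAMES
    )
```

Note that this would also list ninja's own state files, so actually deleting everything it returns trades disk space against the next build being incremental.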