sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Investigate richer output from auto-indexing jobs #60920

Open keynmol opened 8 months ago

keynmol commented 8 months ago

This is a placeholder issue. The idea is not fully developed, as we are not prioritising it.

Currently the auto-indexing backend machinery and the SCIP indexers communicate via a very limited channel: the error (exit) code of the indexer command invocation. The backend machinery uses that code alone to mark the job as successful or failed.

But the SCIP indexer is capable of much more: because it introspects the build of the project being indexed, it can make early decisions about the indexing status. This can help reduce failure rates by marking non-indexable projects immediately.

If the indexer supports partial indexing (like the TypeScript one, which is resilient to missing dependencies), this should also be indicated clearly in the state of the job. Currently such a job just appears successful, even if dependencies could not be resolved.
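
To make the distinction concrete, here is a minimal sketch of how a job state could be derived if the indexer reported more than its exit code. All names here (`JobState`, `IndexerResult`, `classify`) are hypothetical and do not correspond to anything in the existing backend; the set of states is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class JobState(Enum):
    """Hypothetical job states; today only the first and last can be told apart."""
    COMPLETED = "completed"
    PARTIAL = "partial"              # index produced, some dependencies unresolved
    NOT_INDEXABLE = "not_indexable"  # indexer decided early that indexing cannot work
    FAILED = "failed"


@dataclass
class IndexerResult:
    """Assumed shape of what an indexer could hand back to the backend."""
    exit_code: int
    indexable: bool = True
    unresolved_dependencies: List[str] = field(default_factory=list)


def classify(result: IndexerResult) -> JobState:
    # An early "cannot index this project" decision wins over everything else.
    if not result.indexable:
        return JobState.NOT_INDEXABLE
    # A non-zero exit code is still a plain failure.
    if result.exit_code != 0:
        return JobState.FAILED
    # Exit code 0 with unresolved dependencies is only a partial success.
    if result.unresolved_dependencies:
        return JobState.PARTIAL
    return JobState.COMPLETED
```

With only the exit code available, the `PARTIAL` and `NOT_INDEXABLE` cases collapse into `COMPLETED` and `FAILED` respectively, which is the loss of information described above.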

We should investigate the following:

  1. What sort of information the indexer can return that would be helpful for both the backend scheduler and the users
  2. How that information can be communicated between the backend and the executors
  3. What data model we will require to represent the necessary information in the database and the UI
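
As a starting point for points 2 and 3, one option would be a small JSON report that the indexer emits and the backend parses after the command exits. The sketch below is an assumption, not an existing format: the field names (`status`, `reason`, `unresolved_dependencies`), the allowed status values, and the helper `parse_report` are all hypothetical. It falls back to the exit code when no valid report is present, so indexers that do not emit a report would keep working as they do today.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical status values; the real set is part of what needs investigating.
ALLOWED_STATUSES = {"success", "partial", "not_indexable", "failed"}


@dataclass
class IndexerReport:
    status: str
    reason: Optional[str] = None
    unresolved_dependencies: List[str] = field(default_factory=list)


def parse_report(raw: Optional[str], exit_code: int) -> IndexerReport:
    """Parse an indexer's JSON report, falling back to the exit code alone."""
    fallback = IndexerReport(status="success" if exit_code == 0 else "failed")
    if raw is None:
        return fallback
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    status = data.get("status")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        return fallback
    deps = data.get("unresolved_dependencies")
    if not isinstance(deps, list):
        deps = []
    return IndexerReport(
        status=status,
        reason=data.get("reason"),
        unresolved_dependencies=[str(d) for d in deps],
    )
```

For example, `parse_report('{"status": "partial", "unresolved_dependencies": ["foo"]}', 0)` yields a report with status `partial`, while `parse_report(None, 1)` degrades to `failed`, matching current behaviour.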

keynmol commented 8 months ago

Richer output can also help with automatically blocking repositories (https://github.com/sourcegraph/sourcegraph/issues/60916), based on information from the build.

For example, in Gradle builds, if we detect the presence of the Android plugin, we can ignore the project entirely, because the Gradle Android plugin doesn't bootstrap the SDK: the SDK has to be placed in a particular location for compilation to work. This needs manual intervention, and we can block the repository until then.
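
A rough illustration of the kind of check this implies is below. It is a text-based heuristic only, and the function name `uses_android_plugin` is hypothetical; an actual indexer would more likely rely on its introspection of the Gradle build than on scanning the build script. The two plugin IDs are the ones used by Android application and library modules.

```python
# Plugin IDs applied by Android application and library modules.
ANDROID_PLUGIN_IDS = ("com.android.application", "com.android.library")


def uses_android_plugin(build_script: str) -> bool:
    """Heuristically detect an Android plugin ID quoted in a Gradle build script.

    Matches both Groovy style (id 'com.android.application') and
    Kotlin DSL style (id("com.android.application")). It does not
    understand comments, version catalogs, or plugins applied indirectly.
    """
    for plugin_id in ANDROID_PLUGIN_IDS:
        for quote in ("'", '"'):
            if f"{quote}{plugin_id}{quote}" in build_script:
                return True
    return False
```

A positive result could then be reported back as a "not indexable" outcome with a reason, rather than letting the job run and fail during compilation.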