sourcegraph / sourcegraph-public-snapshot

incremental-indexing: fetch git diffs directly instead of fetching all of the commit objects #37063

Open ggilmore opened 2 years ago

ggilmore commented 2 years ago

Zoekt's incremental-indexing implementation works by only indexing the files that have changed since the most recently indexed commit.

It currently does this by (pseudocode):

# copy all of the commit objects for both commits from gitserver and store them on the local Zoekt instance
git fetch $OLD_COMMIT $NEW_COMMIT

# analyze the diff output locally to determine which files have changed
git diff $OLD_COMMIT $NEW_COMMIT

There is a big opportunity to save time in this process by eliminating the git fetch step. Since gitserver already stores all of the necessary commit information, it seems duplicative to have to copy all of the commits over the network in order to perform a local analysis on the Zoekt instance.

If gitserver were capable of providing git diff output directly via an API call, Zoekt could use that output to reconstruct the changed files. Since the git diff output is a (much smaller) subset of all the commit information necessary to construct it, transmitting it directly could lead to huge time savings.
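
To make the idea concrete, here is a minimal sketch of the Zoekt side, assuming gitserver returned the raw output of `git diff --name-status --no-renames -z $OLD_COMMIT $NEW_COMMIT`. The issue does not specify what the API would look like, so the response format and the names `ChangedFiles` and `parseNameStatus` are hypothetical and not taken from the Zoekt or gitserver code.

```go
package main

import (
	"fmt"
	"strings"
)

// ChangedFiles is a hypothetical summary of what an incremental index
// would need to touch between two commits.
type ChangedFiles struct {
	Modified []string // added or modified paths: (re)index these
	Deleted  []string // deleted paths: drop these from the index
}

// parseNameStatus parses the output of
//
//	git diff --name-status --no-renames -z OLD NEW
//
// With -z the output is a NUL-separated sequence of status, path pairs,
// and --no-renames reports a rename as a delete plus an add.
func parseNameStatus(out string) (ChangedFiles, error) {
	var c ChangedFiles
	out = strings.TrimSuffix(out, "\x00")
	if out == "" {
		return c, nil // no differences between the two commits
	}
	fields := strings.Split(out, "\x00")
	if len(fields)%2 != 0 {
		return c, fmt.Errorf("odd number of fields (%d) in name-status output", len(fields))
	}
	for i := 0; i < len(fields); i += 2 {
		status, path := fields[i], fields[i+1]
		switch status {
		case "A", "M", "T": // added, modified, type changed
			c.Modified = append(c.Modified, path)
		case "D":
			c.Deleted = append(c.Deleted, path)
		default:
			return c, fmt.Errorf("unsupported status %q for path %q", status, path)
		}
	}
	return c, nil
}
```

Calling `parseNameStatus("M\x00a.go\x00D\x00b.go\x00")` yields `a.go` as a path to re-index and `b.go` as a path to drop. The contents of the modified files would still have to come from somewhere (for example, a follow-up request for only those paths), which this sketch leaves out.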

sourcegraph-bot-2 commented 2 years ago

Heads up @sourcegraph/search-core - the "team/search-core" label was applied to this issue.

keegancsmith commented 2 years ago

What's the status here?

ggilmore commented 2 years ago

The current status is that this hasn't been implemented yet - incremental builds still fetch all of the build objects.

James did some good work on this in https://github.com/sourcegraph/zoekt/pull/403, but we haven't revisited this since the red-accounts work. This work isn't prioritized at the moment.