google / zoekt

Fast trigram based code search

multi-branch indexing is broken #55

Closed hanwen closed 6 years ago

hanwen commented 6 years ago

for the gerrit repo:

git diff --name-only origin/stable-2.14 origin/stable-2.15|wc
2311 2311 178806
hanwen@hanwen:~/vc/gerrit$ git ls-tree -r origin/stable-2.14|wc
3995 15980 510099
hanwen@hanwen:~/vc/gerrit$ git ls-tree -r origin/stable-2.15|wc
4401 17604 563086

$ zoekt-git-index -branches stable-2.14,stable-2.15 -prefix=refs/remotes/origin .

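this should generate a shard of ~6300 files, i.e. roughly the 3995 files in stable-2.14 plus the 2311 paths that differ in stable-2.15.

what we get instead is a shard with 9603 files, which is more than even the 8396 files (3995 + 4401) of both branches counted separately, so the number is clearly wrong.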
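
The expected count can be sketched as follows. This is a minimal illustration, not zoekt's code: it assumes the indexer stores one document per distinct (path, blob) pair and tags it with every branch containing it. The function name `expected_documents` and the toy trees are hypothetical; only the figures in the last lines come from the `wc` output above.

```python
def expected_documents(branches):
    """Count distinct (path, blob id) pairs across all branches.

    `branches` maps a branch name to a {path: blob_id} dict, i.e. the
    information `git ls-tree -r` reports for that branch.
    """
    docs = set()
    for tree in branches.values():
        docs.update(tree.items())
    return len(docs)


# Hypothetical toy trees: one unchanged file, one modified, one deleted, one added.
stable_2_14 = {"a.java": "blob1", "b.java": "blob2", "old.java": "blob3"}
stable_2_15 = {"a.java": "blob1", "b.java": "blob2-changed", "new.java": "blob4"}

count = expected_documents({"stable-2.14": stable_2_14, "stable-2.15": stable_2_15})
# a.java is shared (1), b.java has two versions (2), old.java (1), new.java (1)
print(count)

# Figures from the issue: files in stable-2.14 plus paths that differ.
upper_bound = 3995 + 2311
# Sum of both branches with no sharing at all.
separate_total = 3995 + 4401
print(upper_bound, separate_total, 9603 > separate_total)
```

Under that assumption, the reported 9603 files exceeds even the no-sharing total, which points at documents being emitted more than once.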

hanwen commented 6 years ago

query "CheckAccess"

gerrit.googlesource.com/gerrit:gerrit-server/src/main/java/com/google/gerrit/server/project/Module.java: [ stable-2.15, ]

52: post(PROJECT_KIND, "check.access").to(CheckAccess.class);

...

gerrit.googlesource.com/gerrit:gerrit-server/src/main/java/com/google/gerrit/server/project/Module.java: [ stable-2.15, ] Duplicate result

somehow, we get the same (file, branch) combo twice.
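
A check for this symptom could look like the sketch below. It is only an illustration: the `(file, branches)` tuple shape and the function name `duplicate_file_branch_pairs` are assumptions made for the example, not zoekt's result format or API, and `Other.java` is a made-up second file.

```python
from collections import Counter


def duplicate_file_branch_pairs(results):
    """Return every (file, branch) pair that appears in more than one result.

    `results` is a list of (file_name, [branch, ...]) tuples, a hypothetical
    flattened form of the search output shown above.
    """
    counts = Counter()
    for file_name, branches in results:
        for branch in branches:
            counts[(file_name, branch)] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


module = "gerrit-server/src/main/java/com/google/gerrit/server/project/Module.java"
results = [
    (module, ["stable-2.15"]),
    ("Other.java", ["stable-2.14", "stable-2.15"]),  # hypothetical file
    (module, ["stable-2.15"]),  # same (file, branch) again, as in the query above
]

dups = duplicate_file_branch_pairs(results)
print(dups)
```

With a correct multi-branch index, this list would be expected to be empty for any query.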

hanwen commented 6 years ago

https://gerrit-review.googlesource.com/c/zoekt/+/168891