google / zoekt

Fast trigram based code search
1.7k stars 113 forks source link

Inconsistent results for a single query #70

Closed dgryski closed 5 years ago

dgryski commented 5 years ago

Querying my corpus, I get the following:

$ for i in $(seq 10); do zoekt 'TODO\(dgryski\):' |wc -l; done
      42
      40
      35
      24
      16
      32
      46
      23
      34
       8

I would expect that results would be 100% of matching files.

dgryski commented 5 years ago

Querying from the web interface, I get

Found 86 results in 54 files, showing top 50 files 

Using rg as a source of truth,

$ rg 'TODO\(dgryski\):'  |wc -l
      84
dgryski commented 5 years ago

The 0-valued zoekt.SearchOptions passed from the command-line search causes the query to be cancelled as soon as one shard returns results.

We can fix by checking if TotalMaxMatchCount > 0 before using it to cancel the query.

diff --git a/shards/shards.go b/shards/shards.go
index 17652f7..c855087 100644
--- a/shards/shards.go
+++ b/shards/shards.go
@@ -189,7 +189,7 @@ func (ss *shardedSearcher) Search(ctx context.Context, q query.Q, opts *zoekt.Se
                        }
                }

-               if cancel != nil && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
+               if cancel != nil && opts.TotalMaxMatchCount > 0 && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
                        cancel()
                        cancel = nil
                }
hanwen commented 5 years ago

thanks for the analysis. Yes, that looks wrong. I'll bake a proper fix (it needs a test.)

hanwen commented 5 years ago

separately, there is the question what the CLI interface should be. Currently, it will order the results like the Web UI, but maybe it's better if it returns exhaustive results. What do you think?

dgryski commented 5 years ago

Given that the command line has all the other text processing tools to construct pipelines, I think exhaustive results make sense as the default or easily accessible with an option.