google / zoekt

Fast trigram based code search

LineMatches sometimes contain newlines #88

Closed: ijt closed this issue 5 years ago

ijt commented 5 years ago

The doc comment for the LineMatch type says

// LineMatch holds the matches within a single line in a file.

but it does not currently behave as documented: a single LineMatch can span more than one line.

Here is a test that runs a query containing a newline. It expects the resulting FileMatch structure to contain two LineMatches, one per line, but instead it gets back a single LineMatch containing two lines.

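// TestQueryNewlines checks that a match spanning a newline is reported as
// one LineMatch per line rather than as a single multi-line LineMatch.
func TestQueryNewlines(t *testing.T) {
    b := testIndexBuilder(t, nil,
        Document{Name: "filename", Content: []byte("line1\nline2\nbla")})

    // The pattern starts on line 2 and ends on line 3.
    sres := searchForTest(t, b, &query.Substring{Pattern: "ine2\nbla"})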

    matches := sres.Files
    want := []FileMatch{{
        FileName: "filename",
        LineMatches: []LineMatch{
            {
                LineFragments: []LineFragmentMatch{{
                    Offset:      7,
                    LineOffset:  1,
                    MatchLength: 4,
                }},
                Line:       []byte("line2"),
                LineStart:  6,
                LineEnd:    11,
                LineNumber: 2,
            },
            {
                LineFragments: []LineFragmentMatch{{
                    Offset:      13,
                    LineOffset:  0,
                    MatchLength: 3,
                }},
                Line:       []byte("bla"),
                LineStart:  13,
                LineEnd:    16,
                LineNumber: 3,
            },
        }}}

    if !reflect.DeepEqual(matches, want) {
        t.Errorf("got %v, want %v", matches, want)
    }
}

Here is the output of the test:

[ ~/src/github.com/ijt/zoekt ] go test ./...
--- FAIL: TestQueryNewlines (0.00s)
    index_test.go:214: got [{0  filename  [] [{[108 105 110 101 50 10 98 108 97] 6 15 2 false 0 [{1 7 8}]}] [] []    }], want [{0  filename  [] [{[108 105 110 101 50] 6 11 2 false 0 [{1 7 4}]} {[98 108 97] 13 16 3 false 0 [{0 13 3}]}] [] []    }]
FAIL
FAIL    github.com/google/zoekt 0.067s
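
The `%v` output above prints each `Line` as a list of decimal byte values, which makes the difference hard to see. A minimal sketch for decoding them follows; the helper name `render` is made up for this illustration and is not part of zoekt.

```go
package main

import "fmt"

// render is a hypothetical helper: it shows a Line byte slice as a quoted
// string so that embedded newlines become visible as \n.
func render(line []byte) string {
	return fmt.Sprintf("%q", line)
}

func main() {
	// Byte values copied from the "got" part of the test output.
	got := []byte{108, 105, 110, 101, 50, 10, 98, 108, 97}
	// Byte values copied from the "want" part of the test output.
	want1 := []byte{108, 105, 110, 101, 50}
	want2 := []byte{98, 108, 97}

	fmt.Println("got: ", render(got))                  // got:  "line2\nbla"
	fmt.Println("want:", render(want1), render(want2)) // want: "line2" "bla"
}
```

Byte 10 is the newline, so the single returned LineMatch covers both `line2` and `bla`.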

Fixing this would make it much easier for Sourcegraph to support multiline searches.

I'm happy to contribute a fix.
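
To make the expected behavior concrete, here is a rough, self-contained sketch of splitting one match range into per-line pieces. It is only an illustration under assumptions: `lineFragment` and `splitMatch` are hypothetical names, the struct flattens the fields used in the test into one type, and offsets are assumed to be zero-based byte offsets. It is not zoekt's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineFragment is a hypothetical stand-in that flattens the LineMatch and
// LineFragmentMatch fields used in the test; it is not zoekt's type.
type lineFragment struct {
	Line        string
	LineStart   int // offset of the first byte of the line
	LineEnd     int // offset just past the line's last byte, newline excluded
	LineNumber  int // 1-based
	Offset      int // offset of the fragment within the file
	LineOffset  int // offset of the fragment within its line
	MatchLength int
}

// splitMatch splits the match content[start:end] into one fragment per line.
// Newline bytes inside the match are not attributed to any fragment.
func splitMatch(content []byte, start, end int) []lineFragment {
	var out []lineFragment
	lineStart := bytes.LastIndexByte(content[:start], '\n') + 1
	lineNumber := bytes.Count(content[:start], []byte{'\n'}) + 1
	for pos := start; pos < end; {
		// Find where the current line ends (end of content if no newline).
		lineEnd := len(content)
		if i := bytes.IndexByte(content[lineStart:], '\n'); i >= 0 {
			lineEnd = lineStart + i
		}
		// Clip the match to the current line.
		fragEnd := end
		if fragEnd > lineEnd {
			fragEnd = lineEnd
		}
		if fragEnd > pos {
			out = append(out, lineFragment{
				Line:        string(content[lineStart:lineEnd]),
				LineStart:   lineStart,
				LineEnd:     lineEnd,
				LineNumber:  lineNumber,
				Offset:      pos,
				LineOffset:  pos - lineStart,
				MatchLength: fragEnd - pos,
			})
		}
		// Continue at the start of the next line.
		pos = lineEnd + 1
		lineStart = lineEnd + 1
		lineNumber++
	}
	return out
}

func main() {
	content := []byte("line1\nline2\nbla")
	pattern := []byte("ine2\nbla")
	start := bytes.Index(content, pattern)
	for _, f := range splitMatch(content, start, start+len(pattern)) {
		fmt.Printf("%+v\n", f)
	}
}
```

Under the zero-based assumption, this sketch places `bla` at `LineStart` 12 and `LineEnd` 15, which agrees with the `LineEnd` of 15 in the actual output, whereas the expectation in the test lists 13 and 16. The first fragment matches the test's expectation for `line2`.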

ijt commented 5 years ago

I have a fix for this: https://github.com/google/zoekt/compare/master...ijt:newlines-one?expand=1. I know changes are meant to go through Gerrit, so I'll submit it there next.