zeux / qgrep

Fast regular expression grep for source code with incremental index updates
MIT License

multiline search #16

Open stelf opened 3 years ago

stelf commented 3 years ago

Perhaps this is obvious to others, but it would be really nice to have an example of multi-line search. I'm not sure whether this is a feature request or a docs improvement request.

Otherwise everything works great on Win10 20H2 with Windows Terminal and PS 7.2.

update

What I understand from https://zeux.io/2019/04/20/qgrep-internals/ is that qgrep works on a line-by-line basis, but the article then states:

The search is done on a line by line basis, however instead of feeding each line to the regular expression engine at a time, the regular expression is ran on the entire file at a time

This suggests there could be an option to apply an s or sm modifier to the regexp. re2 supports these modifiers, but when I tried passing (?sm) to qgrep it produced an error.

So this issue is really a feature request.
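For readers unfamiliar with the s and m modifiers mentioned above, here is a small sketch of what they change. It uses Python's built-in re module as a stand-in, not re2 or qgrep, and the sample text and pattern are made up for illustration. The flag semantics shown (s lets "." match a newline, m lets "^" and "$" match at line boundaries) are the same idea in both engines, but nothing here says how qgrep would expose them.

```python
import re

# Hypothetical sample: a call wrapped across two lines.
text = "result = query(\n    'SELECT 1')\n"

pattern = r"query\(.*SELECT"

# Without the s flag, "." stops at the newline, so the match cannot cross lines.
no_flag = re.search(pattern, text)

# With (?s), "." also matches "\n", so the same pattern spans both lines.
with_s = re.search("(?s)" + pattern, text)

# With (?m), "^" matches at the start of every line, not only at the start of the text.
with_m = re.findall(r"(?m)^\s+'SELECT", text)

print(no_flag)             # None
print(with_s is not None)  # True
print(with_m)              # ["    'SELECT"]
```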

zeux commented 3 years ago

The core problem with multiline search is that qgrep splits (long) files between different chunks. This is fairly critical for maintaining good search performance on large files: without it, chunks would vary widely in size because of occasional large files, which would significantly decrease the efficiency of parallel search.

Of course, it's easy to ask re2 to do a multiline search, but this will occasionally miss matches that cross a chunk boundary.
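The chunk-boundary problem can be reproduced with a toy model. The sketch below is not qgrep's code: it uses Python's re module instead of re2, cuts the text into fixed-size chunks (the chunk size of 30 and the helper name search_in_chunks are arbitrary choices for the demo), and searches each chunk on its own, the way independent parallel workers would.

```python
import re

def search_in_chunks(pattern, text, chunk_size):
    """Search each fixed-size chunk independently and return match offsets."""
    regex = re.compile(pattern, re.DOTALL)
    hits = []
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        # Offsets are translated back to positions in the full text.
        hits += [start + m.start() for m in regex.finditer(chunk)]
    return hits

# Two identical calls, each wrapped across two lines.
text = "a = 1\ncall(\n  arg)\nb = 2\ncall(\n  arg)\n"
pattern = r"call\(\s*arg\)"

# Searching the whole text finds both calls.
whole = [m.start() for m in re.finditer(pattern, text, re.DOTALL)]

# With a cut at offset 30, the second call straddles two chunks and is missed.
chunked = search_in_chunks(pattern, text, chunk_size=30)

print(whole)    # [6, 25]
print(chunked)  # [6]
```

Neither chunk contains the full second call, so no per-chunk search can report it, regardless of which regex flags are enabled.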

stelf commented 3 years ago

this makes sense, indeed.

Perhaps then there could be an option that makes chunks end on a line boundary. It could still miss some matches, but it would at least return reasonable results that can afterwards be double-checked with ripgrep or ag. Developers often split lines to stay within 80 characters or just to keep code readable: long function calls, SQL concatenations, and many more examples.

The point is that qgrep does an incredible job and is my tool of choice, but its results still have to be double-checked now and then when a match is expected to spill over a line boundary.

For the record: I'm currently using it against 90k source files of various origins and languages, but I still have to double-check certain results with rg.
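To make the trade-off in this suggestion concrete, here is a toy sketch of line-aligned chunking. It is an illustration only and says nothing about how qgrep actually builds its chunks; the helper split_on_line_boundaries and the sizes used are hypothetical. A single line is never cut in two, but a match that spans several lines can still fall across a cut.

```python
import re

def split_on_line_boundaries(text, max_size):
    """Cut text into chunks of at most max_size chars, ending at a newline when possible."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_size, len(text))
        if end < len(text):
            # Back up to just after the last newline inside the window.
            newline = text.rfind("\n", start, end)
            if newline != -1:
                end = newline + 1
            # A line longer than max_size falls through to a hard cut.
        chunks.append(text[start:end])
        start = end
    return chunks

def count_matches(pattern, chunks):
    """Count matches when every chunk is searched on its own."""
    regex = re.compile(pattern, re.DOTALL)
    return sum(len(regex.findall(chunk)) for chunk in chunks)

text = "a = 1\ncall(\n  arg)\nb = 2\ncall(\n  arg)\n"
multi_line = r"call\(\s*arg\)"
single_line = r"b = 2"

large = split_on_line_boundaries(text, 30)  # cut lands between statements
small = split_on_line_boundaries(text, 12)  # cuts land inside both calls

print(count_matches(multi_line, large))   # 2
print(count_matches(multi_line, small))   # 0
print(count_matches(single_line, small))  # 1
```

Whether a multi-line match survives depends on where the cuts happen to land, which is why results would still need a second check with another tool.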