yandex / pire

Perl Incompatible Regular Expressions library
http://github.com/dprokoptsev/pire/wiki

pattern without ^ and $ matches nothing unless Surrounded #20

Open starius opened 9 years ago

starius commented 9 years ago
  1. There are two functions called Matches.
    • from pire/run.h:
bool Matches(const Scanner& scanner, const char* begin, const char* end)
{
        return Runner(scanner).Run(begin, end);
}
bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len)
{
        return Pire::Runner(scanner)
                .Begin()        // '^'
                .Run(ptr, len)  // the text 
                .End();         // '$'
                // implicitly cast to bool
}

Which one is correct?

If Begin() and End() are to be called, then patterns without ^ and $ match nothing:

Graph for pattern 'abc'

When Begin() is called, it feeds the scanner a special begin character, which moves the scanner to dead state 1.

Compare this graph with the one produced for the same pattern after it has been surrounded and optimised:

Graph for pattern 'abc' surrounded and optimised

Does this mean that all patterns must begin with ^ and end with $? Are the Begin() and End() calls required? This should be clarified and documented.

  2. pigrep

The pigrep program behaves like the latter Matches: it calls Begin() and End(). It also surrounds its patterns. I removed the surrounding (by the way, an option to disable it would be useful; grep has this as -x, --line-regexp) and got the following results:

$ echo -n 'abc' | pigrep 'abc'
$ echo -n 'abc' | pigrep '^abc'
$ echo -n 'abc' | pigrep 'abc$'
$ echo -n 'abc' | pigrep '^abc$'
abc
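
To make the behaviour above easier to reason about, here is a minimal toy model in plain C++. It does not use Pire at all: kBeginMark, kEndMark, Compile, Feed, MatchesExact, MatchesSurrounded and CheckPigrepTable are hypothetical names invented for this sketch, and the patterns are treated as plain literals in which only ^ and $ are special. The assumption is that Begin() and End() feed one extra symbol each, and that surrounding lets any symbols (marks included) precede and follow the pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the scanner's special begin/end characters.
// The values are arbitrary; they only need to lie outside the unsigned char range.
constexpr int kBeginMark = 256;
constexpr int kEndMark = 257;

// Translate a literal pattern into symbols: '^' -> begin mark, '$' -> end mark.
std::vector<int> Compile(const std::string& pattern) {
    std::vector<int> symbols;
    for (unsigned char c : pattern) {
        if (c == '^') symbols.push_back(kBeginMark);
        else if (c == '$') symbols.push_back(kEndMark);
        else symbols.push_back(c);
    }
    return symbols;
}

// What the runner is assumed to feed: Begin(), then the text, then End().
std::vector<int> Feed(const std::string& text) {
    std::vector<int> input{kBeginMark};
    for (unsigned char c : text) input.push_back(c);
    input.push_back(kEndMark);
    return input;
}

// Unsurrounded pattern: a chain of states where any unexpected symbol
// (including a begin/end mark the pattern does not mention) leads to a dead state.
bool MatchesExact(const std::string& pattern, const std::string& text) {
    const std::vector<int> p = Compile(pattern);
    std::size_t state = 0;
    for (int symbol : Feed(text)) {
        if (state < p.size() && p[state] == symbol) ++state;
        else return false;  // dead state: no way back
    }
    return state == p.size();
}

// Surrounded pattern: the pattern may occur anywhere in the fed symbols.
bool MatchesSurrounded(const std::string& pattern, const std::string& text) {
    const std::vector<int> p = Compile(pattern);
    const std::vector<int> in = Feed(text);
    return std::search(in.begin(), in.end(), p.begin(), p.end()) != in.end();
}

// The four unsurrounded cases from the pigrep experiment, in the toy model.
void CheckPigrepTable() {
    assert(!MatchesExact("abc", "abc"));    // dead on the begin mark
    assert(!MatchesExact("^abc", "abc"));   // dead on the end mark
    assert(!MatchesExact("abc$", "abc"));   // dead on the begin mark
    assert(MatchesExact("^abc$", "abc"));   // both marks are consumed
    assert(MatchesSurrounded("abc", "abc"));
}
```

In this model an unanchored pattern can only match when it is surrounded, which is consistent with the pigrep output above. Whether Pire is meant to work this way is exactly the open question.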

Summary of problems here: