ocaml-omake / omake

The new home of OMake - docs, downloads, mailing list etc. see:
http://projects.camlcity.org/projects/omake.html

Regular expression - bracket expression, corner case #114

Open cspiel opened 6 years ago

cspiel commented 6 years ago

When trying to capture text enclosed in (non-nested) square brackets, as for example in

mooplot.cc:28:26: warning: pass by value and use std::move [modernize-pass-by-value]

with a leftmost-longest regular-expression engine, I came up with

warning:[^[]*\[\([^]]*\)\]

relying on Section 9.3.5 #1 of the POSIX Standard, which states

The right-square-bracket (']') shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial circumflex ('^'), if any).

Thus I was surprised that osh(1) barfs:

Malformed regular expression 'warning:[^[]*\[\([^]]*\)\]': Lm_lexer: regex: mismatched parenthesis

Replacing the opening or closing square bracket inside the bracket expressions with its octal or hexadecimal equivalent works fine, so only the corner case of literal square brackets is not covered. (I'm aware that OMake/osh do not claim to be POSIX-compliant.)

Full demo program:

###  match.osh

.LANGUAGE: program

print_newline() =
        print($'''
''')

show_sample_text(a_sample) =
        println($'Sample Text')
        println($'-----------')
        println($"'$(a_sample)'")
        print_newline()

match_string_against_patterns(a_sample, some_patterns) =
        println($'Lex-search')
        println($'----------')
        foreach(p => ..., some_patterns)
            println($"pattern '$p'")
            have_found_pattern = false
            channel = open-in-string(a_sample)
            lex-search(channel)
            case p
                println($"matched '$1'")
                have_found_pattern = true
                export
            if not(have_found_pattern)
                println($'pattern did NOT match!')
                println($'pattern did NOT match!')
            close(channel)
            print_newline()

##  Excerpt from the ascii(7) manual page.
##
##      Oct   Dec   Hex   Char
##      ──────────────────────────
##      133   91    5B    [
##      134   92    5C    \  '\\'
##      135   93    5D    ]
patterns[] =
    $'warning:[^\133]*\[\([^\135]*\)\]'    # octal notation
    $'warning:[^\x5b]*\[\([^\x5d]*\)\]'    # hexadecimal notation (lowercase)
    $'warning:[^\x5B]*\[\([^\x5D]*\)\]'    # hexadecimal notation (uppercase)
    $'warning:[^[]*\[\([^]]*\)\]'          # character notation

sample = $'mooplot.cc:28:26: warning: pass by value and use std::move [modernize-pass-by-value]'

show_sample_text(sample)
match_string_against_patterns(sample, patterns)

Simply run osh match.osh to reproduce the error.
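For comparison, here is a minimal sketch of the same match run through a C library's POSIX regcomp(3) in basic-regular-expression mode, where the rule quoted from Section 9.3.5 applies. The helper name extract_bracketed and its return codes are made up for this illustration and are not part of OMake or libmojave. The pattern is the one from the report, except that the final \] is written as a plain ], since a right bracket outside a bracket expression is an ordinary character in a BRE.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of OMake): copy the text between the first
 * '[' ... ']' pair that follows "warning:" into out.
 * Returns 0 on a match, 1 if the line does not match, and -1 if the pattern
 * fails to compile or out is too small. */
int extract_bracketed(const char *line, char *out, size_t out_len)
{
    /* cflags == 0 selects POSIX basic regular expressions: \( \) delimit a
     * group, and a ']' placed first in a bracket expression (after an
     * optional '^') is a literal, so [^]] means "any character except ']'". */
    static const char pattern[] = "warning:[^[]*\\[\\([^]]*\\)]";
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: the \( \) group */
    int rc;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, line, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return 1;

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len + 1 > out_len)
        return -1;
    memcpy(out, line + m[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Called with the sample line from match.osh and a sufficiently large buffer, the helper should leave modernize-pass-by-value in out, which is the capture the character-notation pattern is expected to produce in osh as well.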

ANogin commented 6 years ago

To be honest, I no longer have any clue how/why we ended up with our own custom regex engine in omake (or more precisely, libmojave). Perhaps there were no good alternatives back in the day. But it seems wrong or at least completely unnecessary today - I would recommend replacing it with a more standard library.