timtadh / lexmachine

Lex machinary for go.

Add support for "^" and "$" on parser #37

Closed rolyagca closed 4 years ago

rolyagca commented 4 years ago

I'm trying to use a regular expression like "^-?[0-9]+$", but I think "$" is not supported:

lexer.Add([]byte(`^-?[0-9]+$`), token("NUM"))
lexer.Add([]byte(`^-?[0-9]+.[0-9]+$`), token("DECIMAL"))

Results (debug=true):

...
panic: Regex parse error in production 'group' : at index 0 line 0 column 1 '^-?[0-9]+$' : expected '(' at 0 got '^' of '^-?[0-9]+$'
Regex parse error in production 'charClass' : at index 0 line 0 column 1 '^-?[0-9]+$' : expected '[' at 0 got '^' of '^-?[0-9]+$'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '^-?[0-9]+$' : unexpected operator, ^
Regex parse error in production 'char' : at index 0 line 0 column 1 '^-?[0-9]+$' : Expected a CHAR or charRange at 0, ^-?[0-9]+$
Regex parse error in production 'atomic' : at index 0 line 0 column 1 '^-?[0-9]+$' : Expected group or char
Regex parse error in production 'Parse' : at index 0 line 0 column 1 '^-?[0-9]+$' : unconsumed input

goroutine 1 [running]:
exit status 2

timtadh commented 4 years ago

Neither ^ nor $ is supported. Adding support is not entirely trivial, and the semantics may be surprising because this is not a normal regex engine and does not work like grep (e.g., it is not line oriented). You can match the end of a line by matching the end-of-line characters: \n, \r\n, or \r. Start of line is a little trickier.

In general, you do not need to match these positions when parsing most formal languages. Even for languages where you need to know where a line starts, matching one of the end-of-line characters and emitting a NEWLINE token is usually enough.
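To illustrate the NEWLINE-token idea, here is a minimal sketch of how a consumer of the token stream could recover "start of line" without a ^ anchor. It assumes the lexer emits a NEWLINE token for each line ending. The Token type and the lineStarts helper are hypothetical and simplified for illustration; they are not lexmachine's own types.

```go
package main

import "fmt"

// Token is a hypothetical, simplified token type used only for this sketch.
// It is not lexmachine's Token struct.
type Token struct {
	Kind   string // for example "NUM", "DECIMAL", or "NEWLINE"
	Lexeme string
}

// lineStarts returns the tokens that begin a line. It relies only on the
// position of NEWLINE tokens in the stream, so no "^" anchor is needed.
func lineStarts(tokens []Token) []Token {
	var out []Token
	atStart := true // the first token of the input begins a line
	for _, t := range tokens {
		if t.Kind == "NEWLINE" {
			atStart = true // whatever follows a NEWLINE starts a new line
			continue
		}
		if atStart {
			out = append(out, t)
			atStart = false
		}
	}
	return out
}

func main() {
	tokens := []Token{
		{"NUM", "12"}, {"NUM", "7"}, {"NEWLINE", "\n"},
		{"DECIMAL", "3.5"}, {"NEWLINE", "\n"},
	}
	for _, t := range lineStarts(tokens) {
		fmt.Println(t.Kind, t.Lexeme)
	}
}
```

In a real grammar this bookkeeping would more likely live in the parser, which simply treats NEWLINE as a statement or line separator.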

You likely do not want the patterns as you have written them, and would rather use:

lexer.Add([]byte(`-?[0-9]+`), token("NUM"))
lexer.Add([]byte(`-?[0-9]+\.[0-9]+`), token("DECIMAL"))
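
One detail worth checking in a DECIMAL pattern is the dot: in most regex dialects an unescaped . matches any character, so a literal decimal point is usually written \. instead. The sketch below uses Go's standard regexp package purely as a stand-in to show the difference. It is not lexmachine's engine, and fullMatch is a hypothetical helper that adds anchors only because the standard package otherwise searches for substrings.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether pattern matches all of s. The anchors are added
// here because Go's regexp package looks for a match anywhere in the input.
// This is a stand-in for trying out a pattern, not lexmachine's engine.
func fullMatch(pattern, s string) bool {
	re := regexp.MustCompile(`^(?:` + pattern + `)$`)
	return re.MatchString(s)
}

func main() {
	unescaped := `-?[0-9]+.[0-9]+`  // "." matches any single character
	escaped := `-?[0-9]+\.[0-9]+`   // "\." matches only a literal dot

	// "3x14" and "314" are accepted by the unescaped pattern but rejected
	// by the escaped one; "3.14" is accepted by both.
	for _, s := range []string{"3.14", "3x14", "314"} {
		fmt.Printf("%q unescaped=%v escaped=%v\n",
			s, fullMatch(unescaped, s), fullMatch(escaped, s))
	}
}
```

Note that the unescaped form also accepts a plain integer such as 314, which would make it overlap with a NUM pattern.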