Open data-man opened 1 year ago
Something like this would be possible, but at the moment, every token can be separated by whitespace. For example, if you had a rule like `ident = property(ID_Start) property(ID_Continue)*`, identifiers would include things like `abc` but also `a b c d`. The best way to make custom identifiers right now is via user-defined tokens, which involves writing a bit of code in a C function and passing it to the generated parser to use during tokenization.
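To make the idea concrete, here is a minimal sketch of the matching logic such a C function might contain. It is not Owl's API: the names `is_id_start`, `is_id_continue`, and `match_identifier` are hypothetical, the property checks are ASCII-only stand-ins for `ID_Start` and `ID_Continue`, and the step of passing the function to the generated parser is left out.

```c
#include <ctype.h>
#include <stddef.h>

/* ASCII-only stand-in for the Unicode ID_Start property (assumption:
   a real implementation would consult Unicode data tables). */
static int is_id_start(unsigned char c) {
    return isalpha(c);
}

/* ASCII-only stand-in for the Unicode ID_Continue property. */
static int is_id_continue(unsigned char c) {
    return isalnum(c) || c == '_';
}

/* Hypothetical helper: returns the byte length of the identifier at the
   start of `text`, or 0 if `text` does not begin with an identifier. */
size_t match_identifier(const char *text) {
    size_t length;
    if (!is_id_start((unsigned char)text[0]))
        return 0;
    length = 1;
    /* Consume continuation characters; the terminating NUL stops the loop. */
    while (is_id_continue((unsigned char)text[length]))
        length++;
    return length;
}
```

Because the match is made on raw characters, whitespace ends the token: `match_identifier("a b c d")` stops after the first `a` instead of treating the whole string as one identifier.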
Owl is awesome, thank you!
My proposals:
- `range(cp1, cp2)` or `range[cp1, cp2]` - `cp1` and `cp2` are codepoints here (hex or decimal)
- `block(name)` - Unicode's block name (`Basic_Latin`, `Latin-1_Supplement`, etc.)
- `property(name)` - Unicode's property name (`White_Space`, `Hyphen`, `Ps`, `Mn`, etc.)
- `script(name)` - Unicode's script name (`Common`, `Latin`, etc.)

What do you think?
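As a rough illustration of what the proposed `range(cp1, cp2)` would need to do, the sketch below parses a codepoint written in hex or decimal and tests whether a codepoint lies in an inclusive range. The function names are hypothetical, and the `0x` prefix for hex is an assumption; the proposal does not fix a notation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of a single hex digit, or -1 if `c` is not a hex digit. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return 10 + (c - 'a');
    if (c >= 'A' && c <= 'F') return 10 + (c - 'A');
    return -1;
}

/* Hypothetical helper: parses "65" (decimal) or "0x41" (hex, assumed
   notation) into *out. Returns false on malformed input or on values
   above U+10FFFF, the largest Unicode codepoint. */
bool parse_codepoint(const char *text, uint32_t *out) {
    uint32_t base = 10;
    uint32_t value = 0;
    if (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        base = 16;
        text += 2;
    }
    if (*text == '\0')
        return false; /* no digits at all */
    for (; *text != '\0'; text++) {
        int digit = digit_value(*text);
        if (digit < 0 || (uint32_t)digit >= base)
            return false;
        value = value * base + (uint32_t)digit;
        if (value > 0x10FFFF)
            return false;
    }
    *out = value;
    return true;
}

/* Inclusive range test, matching the reading of range(cp1, cp2) in which
   both endpoints belong to the range (an assumption). */
bool range_contains(uint32_t cp1, uint32_t cp2, uint32_t cp) {
    return cp1 <= cp && cp <= cp2;
}
```

For example, after parsing `0x41` and `0x5A` as the endpoints, `range_contains` accepts `0x4D` (`M`) and rejects `0x61` (`a`).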