tpluscode closed this issue 10 years ago.
Okay, I'm answering my own question here.
Given that U+10000 and U+EFFFF are represented in UTF-16 as the surrogate pairs D800 DC00 and DB7F DFFF respectively, I figure it's possible to match the high and low surrogates as separate ranges in sequence.
new CharRangeTerminal('\xD800', '\xDB7F') & new CharRangeTerminal('\xDC00', '\xDFFF')
Any reason why this should be a bad idea?
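A quick way to double-check the surrogate arithmetic is a small C# console program. This is a sketch assuming C# 9+ top-level statements; `ToSurrogates` is a hypothetical helper written for illustration and is not part of Eto.Parse or .NET. It uses the standard UTF-16 formula, compares it with the built-in `char.ConvertFromUtf32`, and then checks that every high/low combination from the two ranges decodes back into U+10000..U+EFFFF.

```csharp
using System;

// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates using the standard formula.
static (char High, char Low) ToSurrogates(int codePoint)
{
    if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint));
    int offset = codePoint - 0x10000;             // 20-bit offset
    char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
    char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return (high, low);
}

var first = ToSurrogates(0x10000);
var last = ToSurrogates(0xEFFFF);
Console.WriteLine($"{(int)first.High:X4} {(int)first.Low:X4}"); // D800 DC00
Console.WriteLine($"{(int)last.High:X4} {(int)last.Low:X4}");   // DB7F DFFF

// The built-in .NET conversion produces the same pair.
string builtIn = char.ConvertFromUtf32(0xEFFFF);
bool sameAsBuiltIn = builtIn.Length == 2 && builtIn[0] == last.High && builtIn[1] == last.Low;
Console.WriteLine(sameAsBuiltIn); // True

// Every one of the 0x380 * 0x400 pairs (high D800..DB7F, low DC00..DFFF)
// decodes into U+10000..U+EFFFF. That block also has 0xE0000 code points,
// and decoding is one-to-one, so the two ranges in sequence match exactly it.
bool pairsStayInRange = true;
for (int h = 0xD800; h <= 0xDB7F && pairsStayInRange; h++)
{
    for (int l = 0xDC00; l <= 0xDFFF; l++)
    {
        int cp = char.ConvertToUtf32((char)h, (char)l);
        if (cp < 0x10000 || cp > 0xEFFFF) { pairsStayInRange = false; break; }
    }
}
Console.WriteLine(pairsStayInRange); // True
```

The trick works cleanly here because the target range starts at a low surrogate of DC00 and ends at DFFF, so every high/low combination in between is a valid member of the range.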
Eto.Parse doesn't directly support this, no. Your approach isn't a bad one, though it would be more efficient (and easier to use) to create a new parser class that knows how to read Unicode characters above U+FFFF (surrogate pairs) as single code points.
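The core of such a parser would be reading a surrogate pair as one code point before comparing it against the range. The sketch below shows only that matching logic as a standalone C# function (C# 9+ top-level statements); `MatchCodePointRange` is a hypothetical name, and the sketch does not use or reproduce Eto.Parse's actual parser base class or its extension API. It returns the number of UTF-16 code units consumed (1 or 2), or 0 when there is no match.

```csharp
using System;

// Hypothetical helper (not part of Eto.Parse): read one code point at `index`
// and report how many UTF-16 code units it used if it lies in [min, max].
static int MatchCodePointRange(string input, int index, int min, int max)
{
    if (index < 0 || index >= input.Length) return 0;

    int codePoint;
    int length;
    if (char.IsHighSurrogate(input[index]) &&
        index + 1 < input.Length &&
        char.IsLowSurrogate(input[index + 1]))
    {
        // A valid surrogate pair encodes one supplementary code point.
        codePoint = char.ConvertToUtf32(input[index], input[index + 1]);
        length = 2;
    }
    else if (char.IsSurrogate(input[index]))
    {
        // Unpaired surrogate: not a valid code point, so no match.
        return 0;
    }
    else
    {
        // Ordinary BMP character: one code unit.
        codePoint = input[index];
        length = 1;
    }

    return codePoint >= min && codePoint <= max ? length : 0;
}

string text = "a" + char.ConvertFromUtf32(0x10000) + char.ConvertFromUtf32(0xF0000);
Console.WriteLine(MatchCodePointRange(text, 0, 0x10000, 0xEFFFF)); // 0: 'a' is below the range
Console.WriteLine(MatchCodePointRange(text, 1, 0x10000, 0xEFFFF)); // 2: U+10000 as one pair
Console.WriteLine(MatchCodePointRange(text, 2, 0x10000, 0xEFFFF)); // 0: lone low surrogate
Console.WriteLine(MatchCodePointRange(text, 3, 0x10000, 0xEFFFF)); // 0: U+F0000 is above the range
```

Comparing whole code points also handles ranges whose endpoints do not line up with DC00 and DFFF, where a single high-range/low-range sequence would match too much or too little.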
I opened pull request #11. Please follow my progress there.
To fully replicate the property path grammar I partially described in issue #9, I need a parser that matches characters with high code points (above U+FFFF). Originally the rule contained a range
Unfortunately, .NET doesn't allow `char` constants above 65535 (0xFFFF), because a `char` is a single 16-bit UTF-16 code unit. Does Eto.Parse support matching such characters?
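To illustrate the limitation, here is a small C# sketch (C# 9+ top-level statements, standard `System.Char` members only). A code point above U+FFFF cannot be written as a `char` literal, but it can appear in a string literal through the `\U` escape, where the compiler stores it as a surrogate pair.

```csharp
using System;

// System.Char is one 16-bit UTF-16 code unit, so its largest value is 0xFFFF.
Console.WriteLine((int)char.MaxValue); // 65535

// A code point above U+FFFF cannot be a char literal, but a string literal
// can hold it via the \U escape; it is stored as two code units.
string s = "\U00010000";
Console.WriteLine(s.Length);                          // 2
Console.WriteLine(char.IsSurrogatePair(s[0], s[1]));  // True
Console.WriteLine($"{(int)s[0]:X4} {(int)s[1]:X4}");  // D800 DC00

// Reading the string as a code point recovers the original value.
Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X")); // 10000
```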