tpluscode closed this 10 years ago
Excellent start! Should the GetValue implementation return an int or a string? Or could that be configurable?
I think it should return an int. A pair of chars would be inconvenient (Tuple<,> ?) and most string representations aren't printable anyway I guess.
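To illustrate the int-versus-string question, here is a minimal sketch of how a matched surrogate pair maps to a single int code point and back to a string on demand. The class and method names are hypothetical and not part of Eto.Parse; only the `char.ConvertToUtf32` and `char.ConvertFromUtf32` calls are standard .NET.

```csharp
using System;

// Hypothetical helper, not part of Eto.Parse: shows why an int is a
// convenient value type for a matched surrogate pair.
public static class SurrogateValueSketch
{
    // Combines a high/low surrogate pair into one Unicode code point.
    public static int ToCodePoint(char high, char low)
    {
        // Throws ArgumentOutOfRangeException if the two chars are not
        // a valid high surrogate followed by a valid low surrogate.
        return char.ConvertToUtf32(high, low);
    }

    // Converts the code point back to a (one- or two-char) string
    // for callers that do want text.
    public static string ToText(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

For example, `SurrogateValueSketch.ToCodePoint('\uD83D', '\uDE00')` gives `0x1F600`, and `ToText(0x1F600)` gives the two-char string back, so nothing is lost by returning the int.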
Hm, I noticed that CharTerminal has its dedicated RepeatCharTerminal. I'm obviously closely imitating the CharTerminal classes and thus I'm wondering whether I should also add a specialized repeat parser...
The RepeatCharTerminal exists for better performance compared with using many alternate CharTerminals inside a RepeatParser. I don't think that it will be necessary for the surrogate pair characters, as it isn't as if an entire document will consist of such characters.
Ideally, one would be able to test for surrogate characters using the existing RepeatCharTerminal instead, though I'm not sure of the performance impact there.
Using existing char parsers for surrogate pairs may not be a good idea, because they are quite different. Each one is actually two chars, each from a specific range, and they must come in the correct order.
Or maybe I misunderstood you?
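A minimal sketch of the "two chars, specific ranges, correct order" rule, using only the standard `char.IsHighSurrogate` and `char.IsLowSurrogate` checks. The `SurrogatePairSketch` class and `IsPairAt` method are hypothetical names for illustration, not anything from Eto.Parse.

```csharp
using System;

// Hypothetical illustration, not part of Eto.Parse.
public static class SurrogatePairSketch
{
    // True only when text[index] is a high surrogate (U+D800..U+DBFF)
    // immediately followed by a low surrogate (U+DC00..U+DFFF).
    public static bool IsPairAt(string text, int index)
    {
        // A pair needs two chars, so the last position can never start one.
        if (index < 0 || index + 1 >= text.Length)
            return false;

        return char.IsHighSurrogate(text[index])
            && char.IsLowSurrogate(text[index + 1]);
    }
}
```

A single-char test cannot express this, which is the difference being pointed out: swapping the two halves, or cutting the input after the first half, must both fail.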
I agree that we shouldn't change the existing char parsers. The RepeatCharTerminal uses RepeatCharItem with a test function and doesn't use CharTerminal at all. I was suggesting that only the RepeatCharTerminal could be enhanced to deal with surrogate pairs. I don't know at this point whether that would be a good idea or not, so if you think it's best to leave it or create a RepeatSurrogateCharTerminal, I'm good either way.
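As a rough sketch of what a test-function-driven repeat over surrogate pairs might look like: the loop below consumes consecutive pairs while a caller-supplied predicate accepts each code point. This is only an assumption about the general shape; `RepeatSurrogateSketch`, `MatchLength`, and the `Func<int, bool>` signature are hypothetical and do not reflect how RepeatCharItem is actually implemented.

```csharp
using System;

// Hypothetical illustration, not the Eto.Parse implementation.
public static class RepeatSurrogateSketch
{
    // Returns the number of UTF-16 chars, starting at `start`, that form
    // consecutive surrogate pairs whose code point passes `test`.
    // Assumes `start` is non-negative.
    public static int MatchLength(string text, int start, Func<int, bool> test)
    {
        int pos = start;
        while (pos + 1 < text.Length
            && char.IsHighSurrogate(text[pos])
            && char.IsLowSurrogate(text[pos + 1])
            && test(char.ConvertToUtf32(text[pos], text[pos + 1])))
        {
            pos += 2; // each accepted pair consumes two chars
        }
        return pos - start;
    }
}
```

Called as `MatchLength(input, 0, cp => cp >= 0x1F600 && cp <= 0x1F64F)`, it stops at the first char that is not part of an accepted pair, so the returned length is always even.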
I see. It would be possible. However, this is probably esoteric enough that no one will ever notice that the generic repeat parser isn't as performant.
Yeah, I believe you are right. (;
I guess I'm pretty much done. Please review the code.
Ah yes, I know. I wanted to override the GetValue methods and the Description property.
Wow! Looks awesome! I'm still in the process of setting up the CI builds, but it seems that two of the new tests aren't passing? Getting "Unable to cast object of type 'System.Int32' to type 'System.String'." This might be the CI being weird, though.
Oops, probably not. I hastily overrode the GetValue method, which probably broke the casting in the tests.
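The failure mode is easy to reproduce in isolation: a boxed int cannot be cast to string. The sketch below assumes the value arrives as `object`; `CastSketch` and its members are hypothetical stand-ins, not the actual tests or the actual GetValue signature.

```csharp
using System;

// Hypothetical illustration of the reported cast failure.
public static class CastSketch
{
    // Stand-in for a value-returning method that now yields an int
    // (boxed as System.Int32) where callers used to get a string.
    public static object GetBoxedValue() => 0x1F600;

    public static string Describe()
    {
        object value = GetBoxedValue();
        try
        {
            // Throws InvalidCastException: a boxed Int32 is not a String.
            return (string)value;
        }
        catch (InvalidCastException)
        {
            // Unbox as int, then convert explicitly.
            return char.ConvertFromUtf32((int)value);
        }
    }
}
```

Tests that still cast the result to `string` would need either an explicit conversion like the one in the `catch` branch or an assertion against the int code point.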
Awesome, thanks for the contribution! Are you okay with contributing under MIT, and assigning copyright?
Sure, no problem there
Hi. I see that this pull request hasn't been released. Any chance for a NuGet update?
This is a work in progress. When done it will implement matching UTF-16 characters, which currently don't have a specific parser.
To-Do: