polux / parsers

Parser Combinators for Dart

Unicode parsers #11

Open dikmax opened 10 years ago

dikmax commented 10 years ago

The current versions of the alphanum, upper, and lower parsers accept only Latin characters, while Haskell's parsers accept all Unicode characters. I know that Dart doesn't have methods to test a character against a Unicode class, but we could implement this ourselves: we can use http://www.unicode.org/Public/6.3.0/ucd/UnicodeData.txt to generate predicates that work with all Unicode characters.

Does this make sense? Or should it not be part of the parsers library?
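
A minimal sketch of what generating such predicates could look like, assuming the lines of UnicodeData.txt have already been loaded into memory. The names `CodePointRange`, `rangesFor`, and `inRanges` are hypothetical and not part of the parsers library. The sketch relies only on the file's documented layout: semicolon-separated fields with the code point in hex first, the name second, and the general category (`Lu`, `Ll`, `Nd`, ...) third, with large blocks written as a `<..., First>` / `<..., Last>` pair of lines.

```dart
/// An inclusive range of code points (hypothetical helper type).
class CodePointRange {
  final int start;
  final int end;
  const CodePointRange(this.start, this.end);
}

/// Collects the code point ranges whose general category is in [categories].
/// Assumes [lines] are sorted by code point, as they are in UnicodeData.txt.
List<CodePointRange> rangesFor(Iterable<String> lines, Set<String> categories) {
  final ranges = <CodePointRange>[];
  int? pendingFirst; // start of an open "<..., First>" block, if it matches
  for (final line in lines) {
    final fields = line.split(';');
    if (fields.length < 3) continue; // skip blank or malformed lines
    final codePoint = int.parse(fields[0], radix: 16);
    final name = fields[1];
    final matches = categories.contains(fields[2]);
    if (name.endsWith(', First>')) {
      pendingFirst = matches ? codePoint : null;
      continue;
    }
    var start = codePoint;
    if (name.endsWith(', Last>')) {
      final first = pendingFirst;
      pendingFirst = null;
      if (first == null) continue; // block was not in a wanted category
      start = first;
    } else if (!matches) {
      continue;
    }
    // Merge with the previous range when the two are adjacent.
    if (ranges.isNotEmpty && ranges.last.end + 1 == start) {
      ranges[ranges.length - 1] = CodePointRange(ranges.last.start, codePoint);
    } else {
      ranges.add(CodePointRange(start, codePoint));
    }
  }
  return ranges;
}

/// Binary search: true if [codePoint] falls inside one of the sorted [ranges].
bool inRanges(List<CodePointRange> ranges, int codePoint) {
  var low = 0;
  var high = ranges.length - 1;
  while (low <= high) {
    final mid = (low + high) >> 1;
    final range = ranges[mid];
    if (codePoint < range.start) {
      high = mid - 1;
    } else if (codePoint > range.end) {
      low = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}
```

As a first approximation, an uppercase predicate could be built from `rangesFor(lines, {'Lu'})` and a lowercase one from `{'Ll'}`. A real generator would more likely emit the ranges as a constant table in a Dart source file, so that nothing has to be parsed at run time.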

polux commented 10 years ago

It definitely makes sense, but IIRC the reason Dart doesn't have methods to test a character against a Unicode class is that the generated JavaScript would be huge. The same concern applies to parsers. Maybe it could live in a separate library (in the same package, but still a separate library), so that if you don't import it, it doesn't blow up the generated JavaScript. Or maybe such a predicate should even be in another package, so that it benefits everyone, and this separate library in the parsers package would import the predicate from that other package.

polux commented 10 years ago

Ah, just found http://pub.dartlang.org/packages/unicode_helper. So we could definitely use that.

polux commented 10 years ago

Alternatively, I (or you) could create a package parsers_unicode. I'm fine with both solutions, as long as they allow generating reasonably small JS files when one doesn't care about Unicode.

dikmax commented 10 years ago

I vote for a separate library in the same package. I'll try to implement this later.

polux commented 10 years ago

Cool, thanks for your interest and help!