tfgordon closed this issue 10 months ago
Note that PetitParser has never supported any encoding other than the standard UTF-16 code units of a Dart String. I recommend that you convert your input to a Dart String before parsing, for example using the built-in Latin1Codec.
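As a minimal sketch of that suggestion, assuming the input arrives as raw Latin-1 bytes (the thread does not say in what form it arrives), the `latin1` codec from `dart:convert` turns those bytes into an ordinary Dart String before it reaches the parser. The helper name `decodeLatin1` is made up for this illustration.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes raw Latin-1 (ISO-8859-1) bytes into a
/// Dart String, which stores its content as UTF-16 code units.
String decodeLatin1(List<int> bytes) => latin1.decode(bytes);

void main() {
  // The bytes of "Mädchen" in Latin-1; 0xE4 is 'ä'.
  final bytes = [0x4D, 0xE4, 0x64, 0x63, 0x68, 0x65, 0x6E];
  final text = decodeLatin1(bytes);

  print(text);
  // Each Latin-1 byte maps to exactly one UTF-16 code unit.
  print(text.codeUnits.length == bytes.length);
}
```

The resulting String can then be handed to the parser like any other Dart String.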
I am not aware of a change in how characters are read in a long time. Could you provide a short reproducible test-case that passes with PetitParser 4.0.2, but fails with a newer version?
I agree that the built-in predicates such as letter() are simplistic. It would be great to have built-in support for Unicode character properties. Happy to discuss a possible implementation.
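To make the idea of Unicode character properties concrete, here is a rough sketch in plain Dart. It is not PetitParser's implementation or a proposal for its API; it only uses Dart's `RegExp` in Unicode mode, where `\p{L}` matches any code point in the general category "Letter". The helper name `isUnicodeLetter` is hypothetical.

```dart
/// Matches exactly one code point whose Unicode general category is "Letter".
final RegExp _unicodeLetter = RegExp(r'^\p{L}$', unicode: true);

/// Hypothetical helper: true if [character] is a single Unicode letter.
bool isUnicodeLetter(String character) => _unicodeLetter.hasMatch(character);

void main() {
  // Letters from several scripts, followed by a symbol and a digit.
  for (final c in ['a', 'ä', 'ÿ', 'Ω', '×', '1']) {
    print('$c: ${isUnicodeLetter(c)}');
  }
}
```

How such a check would best be wired into a parser predicate is left open here.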
Thanks for your quick response. I now think the problem is not with PetitParser, but rather was caused by a change in the way I store files, which I made so that I could deploy the app as a web app. I am now using the Hive NoSQL database. Printing the output from the database before I try to parse it with PetitParser shows that (some?) non-ASCII characters are being corrupted. The characters returned are not ones handled by the grammar, so I get a parse error. I will see if this problem can be fixed and hope that this will solve the parsing problem as well.
I'd like to be able to help you with extending the letter() implementation, but I'm afraid that's over my head.
I found the problem. Hive encodes strings using UTF-8. I just needed to convert them into UTF-16 and everything works as it should.
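The thread does not show how the conversion was done. One plausible sketch, assuming the corruption came from each UTF-8 byte being read as a separate character, is to turn the garbled string back into bytes and decode those bytes as UTF-8 with `dart:convert`. The helper name `repairUtf8` is hypothetical.

```dart
import 'dart:convert';

/// Hypothetical helper: repairs a string in which every UTF-8 byte was
/// read as its own character. Assumes all code units are below 256;
/// latin1.encode throws an ArgumentError otherwise.
String repairUtf8(String garbled) => utf8.decode(latin1.encode(garbled));

void main() {
  // 'ä' is the two bytes 0xC3 0xA4 in UTF-8; read one byte per
  // character, it shows up as the two characters "Ã¤".
  final garbled = String.fromCharCodes([0xC3, 0xA4]);

  print(repairUtf8(garbled) == 'ä');
}
```

If the data is available as bytes in the first place, calling `utf8.decode` on those bytes directly avoids the round trip.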
I need to parse Latin1 characters which are not ASCII. My parser was working with version 4.0.2, but I now need to use the newer version of petitparser because of dependencies with the pdf Flutter package, which I also need.
Here's a simplified code snippet which fails:
// letter() extended with Latin 1 characters for coverage of most Western European languages
final Parser extChar = letter() | char('ä');
I've also tried using pattern, like this:
final Parser extChar = letter() | pattern("À-ÿ");
This also fails.
Would it be easiest to extend letter() to cover all Latin 1 alphabetic characters?
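For reference, a sketch of what "all Latin 1 alphabetic characters" could mean as a plain Dart check on a single UTF-16 code unit. Note that the range À-ÿ (U+00C0 to U+00FF) also contains the symbols × (U+00D7) and ÷ (U+00F7), which are not letters. The helper name `isLatin1Letter` is hypothetical, and the sketch deliberately ignores the few Latin-1 letters below À (ª, µ, º).

```dart
/// Hypothetical helper: true if [codeUnit] is an ASCII letter or one of
/// the accented letters in the Latin-1 range À-ÿ, excluding × and ÷.
bool isLatin1Letter(int codeUnit) {
  final isAsciiLetter = (codeUnit >= 0x41 && codeUnit <= 0x5A) ||
      (codeUnit >= 0x61 && codeUnit <= 0x7A);
  final isAccented = codeUnit >= 0xC0 &&
      codeUnit <= 0xFF &&
      codeUnit != 0xD7 && // × multiplication sign
      codeUnit != 0xF7; // ÷ division sign
  return isAsciiLetter || isAccented;
}

void main() {
  for (final c in ['a', 'ä', 'À', 'ÿ', '×', '÷', '1']) {
    print('$c: ${isLatin1Letter(c.codeUnitAt(0))}');
  }
}
```

This only behaves as intended when the input is already a correctly decoded Dart String, so that 'ä' really is the single code unit 0xE4.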