Closed iain-waugh closed 1 year ago
@iain-waugh, thanks for the report, but I don't fully understand this situation. Could you make the smallest possible PEG grammar and input text, so that I can easily reproduce it? Thanks!
Using this grammar:
# PEG for any thing like: 'x'
sample_peg <- Spacing? character_literal+ EndOfFile
Spacing <- Space*
Space <- ' ' / '\t' / EndOfLine
EndOfLine <- '\r\n' / '\n' / '\r'
EndOfFile <- !.
~_ <- Spacing
%whitespace <- _
character_literal <- < "'" . "'" >
If you try to parse the attached file, you get this: literals.txt:1:10: syntax error, expecting <character_literal>.
literals.txt
The literals.txt file is just: 'A' 'b' '�' 'D'
It can be tricky to create this file with some text editors; a hex dump of it shows character 0xA0 for the non-breaking space.
27 41 27 20 27 62 27 20 27 A0 27 20 27 44 27 0D
It looks like it's a problem because it's saved as an ANSI file. When I re-create it as UTF-8, it works. literals2.txt
27 41 27 20 27 62 27 20 27 C2 A0 27 20 27 44 27
VHDL standard libraries are in ANSI format and use characters with codes 160 and above, which lie outside 7-bit ASCII. Is this something you can fix in cpp-peglib? The workaround is to require files containing these characters to be saved as UTF-8.
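For anyone who cannot re-save the source files, the conversion could also be done in memory before the text reaches the parser. This is only a minimal sketch, assuming the "ANSI" input is really ISO-8859-1 (the character set VHDL uses); `latin1_to_utf8` is a hypothetical helper, not part of cpp-peglib.

```cpp
#include <string>

// Hypothetical helper (not part of cpp-peglib).
// Assumption: the input bytes are ISO-8859-1, so every byte value equals
// its Unicode code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string &in) {
  std::string out;
  out.reserve(in.size() * 2); // worst case: every byte becomes two bytes
  for (unsigned char c : in) {
    if (c < 0x80) {
      // 7-bit ASCII is encoded identically in UTF-8.
      out.push_back(static_cast<char>(c));
    } else {
      // U+0080..U+00FF become a two-byte sequence: 110000xx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```

With this, the byte 0xA0 from the first hex dump becomes the pair C2 A0 seen in the UTF-8 version of the file, and the converted string can then be handed to the parser in place of the raw file contents.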
@iain-waugh I now understand what you mean. Unfortunately, cpp-peglib accepts only UTF-8 text. Since I only need to care about UTF-8 text, I am not planning to support other character encodings such as ANSI or Shift-JIS. https://github.com/yhirose/cpp-peglib#unicode-support
Encoding related functions are below. In order to accept ANSI text, those should be modified. https://github.com/yhirose/cpp-peglib/blob/master/peglib.h#L78-L207
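To illustrate why a lone 0xA0 byte trips up a UTF-8-only parser, here is a rough sketch of how a UTF-8 decoder classifies a lead byte. `utf8_lead_length` is a hypothetical function written for this explanation; it is not the code at the link above and may differ from it in detail.

```cpp
#include <cstddef>

// Hypothetical illustration (not cpp-peglib's actual code).
// Returns how many bytes a UTF-8 sequence starting with `b` should have,
// or 0 if `b` cannot start a sequence. Overlong forms are not rejected here.
std::size_t utf8_lead_length(unsigned char b) {
  if (b < 0x80) return 1;           // 0xxxxxxx: plain ASCII
  if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx: two-byte sequence
  if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx: three-byte sequence
  if ((b & 0xF8) == 0xF0) return 4; // 11110xxx: four-byte sequence
  return 0;                         // 10xxxxxx continuation byte, or invalid
}
```

The byte 0xA0 is 10100000 in binary, which is a continuation byte, so it cannot begin a character on its own. In the UTF-8 file the same character is stored as C2 A0, where 0xC2 is a valid two-byte lead.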
Sorry that I can't give you much help...
I have found that cpp-peglib does not seem to parse characters beyond the 7-bit ASCII range (character code 160 and above), such as those specified in the VHDL standard library, which can be found here.
This happens when I specify the characters manually (as per the standard):
other_special_character <- backslash backslash / [!$%^{}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷]
or if I have defined character_literal as:
character_literal <- < "'" < . > "'" >
With the last example, peglint gives me this error:
test_16.3_package_standard.vhd:69:6: syntax error, expecting <character_literal>.
Is this something that is supported?
My PEG is here.