Open slanden opened 1 year ago
The following statement in the PML user manual is not correct:
"Attribute assignments are separated by a space."
Instead, it should state:
"Attribute assignments are separated by whitespace (a sequence of one or more spaces, tabs, and new lines)."
Thanks for reporting this bug in the docs.
I've fixed this in my local branch, and the fix will be included in the next PML version 4.0.0 (planned to be published this or next month).
Hence, the following code is ok:
[tag (
attribute1 = something
attribute2 = 400
)]
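To illustrate the corrected rule, here is a minimal Python sketch that treats any run of spaces, tabs, or new lines as a separator between assignments. It is not the PML parser: the helper `split_assignments` is hypothetical, and it assumes simple names and unquoted values that contain no whitespace.

```python
import re

def split_assignments(source: str) -> dict[str, str]:
    """Collect 'name = value' pairs separated by any run of whitespace.

    Assumption for illustration only: names match \\w+ and values are
    unquoted tokens without whitespace. The real parser accepts more.
    """
    pairs = re.findall(r"(\w+)\s*=\s*(\S+)", source)
    return dict(pairs)

one_line = "attribute1 = something attribute2 = 400"
multi_line = """
attribute1 = something
attribute2 = 400
"""

# Both layouts yield the same two attributes.
assert split_assignments(one_line) == split_assignments(multi_line)
```

Under these assumptions, the single-line and multi-line layouts produce identical attribute maps.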
That's the same as: [tag (\nattribute1 = something\nattribute2 = 400\n)]
It's not the same. \nattribute1 would be an invalid attribute name, because \ is not allowed in attribute names (and escaping is not supported in names).
The parser generates the following error:
Expecting a valid name. A name cannot start with '\'.
Moreover, something\nattribute2 would be parsed as a single attribute value (containing a backslash, followed by the letter n), because (1) the value is unquoted, and (2) escape sequences are not supported in unquoted values.
And because of (2), the following assignment:
path = C:\temp\test.txt
... is equivalent to:
path = "C:\\temp\\test.txt"
... which means that the value C:\temp\test.txt is assigned to attribute path.
The above example is also shown in the PML User Manual, at the end of chapter Text Processing / Escape Characters / Attributes.
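The difference between quoted and unquoted values can be sketched in Python as follows. This is only an illustration of the rule described above, not PML's implementation: the helper `attribute_value` is hypothetical, the set of recognised escapes (`\n`, `\t`, `\\`, `\"`) is an assumption, and unknown escapes are simply kept as-is here, whereas a real parser might report an error.

```python
def attribute_value(token: str) -> str:
    """Return the value denoted by a raw attribute token (hypothetical helper).

    Quoted tokens have escape sequences resolved; unquoted tokens are
    taken literally, backslashes included.
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Assumed escape table, for illustration only.
        escapes = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}
        out = []
        chars = iter(token[1:-1])
        for ch in chars:
            if ch == "\\":
                nxt = next(chars, "")
                # Unknown escapes are kept unchanged in this sketch.
                out.append(escapes.get(nxt, "\\" + nxt))
            else:
                out.append(ch)
        return "".join(out)
    # Unquoted: no escape processing at all.
    return token

# Raw string literals keep the backslashes exactly as typed.
unquoted = attribute_value(r"C:\temp\test.txt")
quoted = attribute_value(r'"C:\\temp\\test.txt"')
assert unquoted == quoted == r"C:\temp\test.txt"
```

Both forms end up as the same path, which matches the equivalence stated above.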
I hope this clears it up.
If it's not the same, how are you differentiating between them in your parser?
If I pass a chunk of PDML source as raw bytes and come across a new line, whether it was in the source implicitly by pressing "enter" or explicitly by typing '\n', in both cases the byte value is the same. You're obviously differentiating somehow if you have the error handling in place for it; I just don't see how.
For example, this byte string
b"
\n
"
gives the following bytes: [10, 10, 10, 10]
The bytes for pressing <Enter> or typing "\n" are different.
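When you press <Enter>, a single "New Line" character (U+000A, Line Feed) is inserted; on Windows, editors typically insert a Carriage Return (U+000D) before it.
On the other hand, when you type "\n", you create the following two Unicode characters: a backslash (U+005C) followed by the letter n (U+006E).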
The character escape mechanism of the parser converts these two characters into a single "New Line" character (but only if escaping is supported in the given context).
The actual bytes stored in the file depend on the encoding used. PDML and PML both use UTF-8 encoding.
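A quick way to see the difference is to encode both inputs as UTF-8 and compare the bytes. The Python snippet below is only an illustration (the helper name `utf8_bytes` is made up). Note that a language's own string literals resolve "\n" at compile time, which is why test strings written in source code can hide the distinction.

```python
def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 byte values of text as a list of integers."""
    return list(text.encode("utf-8"))

# A real New Line character (U+000A), as produced by <Enter> on Unix-like systems.
real_newline = "\n"
# The two characters backslash (U+005C) and n (U+006E), as typed into a file.
typed_escape = "\\n"

assert utf8_bytes(real_newline) == [10]
assert utf8_bytes(typed_escape) == [92, 110]
assert len(real_newline) == 1 and len(typed_escape) == 2
```

Text read from a file or stream arrives like `typed_escape`: two separate characters that only become a New Line if the parser applies its escape mechanism.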
Well, you taught me something new. I'm writing a parser, and in all my tests I was simulating input strings as raw byte strings, so when I typed '\n' it automatically became a real newline. But if I were to read in the text from an IO stream, a '\' is automatically placed before any literally typed '\'+'n', to become, as you said, '\' + '\' + 'n'.
It's all cleared up now, thanks!
I think there's another error in PDML Extensions User Manual > Syntax Extensions > Attributes > Lenient Parsing
Under the "Warning", I think the '#' should be '@':
[foo [# a1 = "v1"]]
the '#' should be '@'
Good catch. It's now fixed. Thanks for reporting this.
In https://pml-lang.dev/docs/user_manual, the Anatomy > Attributes section says
The Text Processing > Lenient Parsing section says
So far, so good.
However, the Text Processing > Whitespace Handling > Attributes section, which is not a part of the Lenient Parsing section, shows an example of how you can separate attributes with new lines.
That's the same as:
That looks like the mentioned Lenient Parsing, but I'm not 100% certain. More importantly though, the Text Processing > Escape Characters > Attributes section then says
And the example:
So, given the whitespace rules explained in earlier sections (or maybe just the Lenient Parsing section), this would be interpreted as
producing three attributes: path, emp, and est.txt, because \t is whitespace. Is that example only valid in non-lenient parsing, and would it need to be quoted in lenient parsing?