pml-lang / pml-companion

Java source code of the 'PML Companion (PMLC)'
https://www.pml-lang.dev
GNU General Public License v2.0

New Nodes to Handle Edge Cases: word-joiner and blank/empty #13

Open tajmone opened 3 years ago

tajmone commented 3 years ago

I've noticed that in the PML User Manual, section Anatomy of a PML Document » Attributes, the line code for the escape character is forced to contain a trailing space (\ ), as seen in the source file 05_anatomy.pml:

must be terminated by a backslash ([c \\ ]),

The problem here is that using [c \\] instead of [c \\ ] won't work, because it would be parsed as [c + \ + \], i.e. the second backslash is interpreted as escaping the closing bracket.

To avoid similar problems (which are typical edge cases found in all lightweight syntaxes), I suggest adding some extra special characters:

[empty (or [blank): an empty string
[wj: a word joiner

(obviously, no closing bracket is required for either)

The above example from the PML User Manual could then be fixed via:

must be terminated by a backslash ([c \\[empty]),

Both of these are useful hacks for handling edge cases where the PML parser could be faced with ambiguities like the one in the above example. They would be the equivalents of Asciidoctor's predefined character-replacement attributes {empty}/{blank} and {wj}, which are extremely useful for handling all sorts of edge cases in AsciiDoc sources.

In Asciidoctor, {empty} and {blank} are identical; one is just an alias of the other. I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

The [wj is also very useful in situations where you need to prevent the browser from wrapping a table column during auto-adjustment (e.g. because one column contains words separated by boundaries like spaces, hyphens, brackets, etc.), or to prevent a line from being wrapped between a word and its footnote marker: someword[1] may be broken into someword + \n + [1], whereas someword[wj[1] stays together on one line.

Sometimes they can also simply improve source readability.

These would be consistent with the current [nl and [sp substitutions available in PML.
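To make the proposal concrete, here is a minimal Python sketch of what the two substitutions could expand to. It is not PMLC code: the marker names `[empty` and `[wj` are the ones proposed in this issue, the mapping of `[wj` to U+2060 (WORD JOINER) mirrors what Asciidoctor's {wj} produces, and the function name `expand_markers` is purely hypothetical. A real parser would handle these as nodes during tokenization rather than by plain string replacement.

```python
# Hypothetical sketch: expand the proposed [empty and [wj markers.
# Assumption: [wj maps to U+2060 WORD JOINER, like Asciidoctor's {wj}.
WORD_JOINER = "\u2060"

SUBSTITUTIONS = {
    "[empty": "",        # expands to nothing; only separates adjacent tokens
    "[wj": WORD_JOINER,  # zero-width character that forbids a line break
}


def expand_markers(text: str) -> str:
    """Replace each proposed marker with its expansion.

    Naive on purpose: it ignores escaping and does not check what
    follows the marker name, which a real parser would have to do.
    """
    for marker, replacement in SUBSTITUTIONS.items():
        text = text.replace(marker, replacement)
    return text


if __name__ == "__main__":
    joined = expand_markers("someword[wj[1]")
    # The footnote marker is now glued to the word by U+2060.
    print(repr(joined))
```

With this sketch, `someword[wj[1]` becomes `someword`, followed by the invisible word joiner, followed by `[1]`, so a renderer that honors U+2060 will not break the line between the word and its footnote marker.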


pml-lang commented 3 years ago

the line code for the escape character is forced to contain a trailing space

Well spotted!

The reason is that the current parser uses a regex that does not consider this edge case. The new pXML parser, which reads the input character by character and uses no regexes, will parse [c \\] correctly as a node c with content \.
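The character-by-character approach can be illustrated with a small Python sketch. This is not the pXML parser itself, whose implementation is not shown in this thread; `parse_inline_code` is a hypothetical helper that only handles a single `[c ...]` node and assumes a backslash always escapes the character that follows it. Because each escape consumes exactly two characters, the second backslash in `[c \\]` is already used up by the time the scanner reaches the closing bracket.

```python
def parse_inline_code(src: str) -> str:
    """Return the content of a single `[c ...]` node (illustrative only).

    Assumption: a backslash escapes exactly the next character.
    """
    prefix = "[c "
    if not src.startswith(prefix):
        raise ValueError("expected input to start with '[c '")

    content = []
    i = len(prefix)
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Escape: take the next character literally and skip both.
            if i + 1 >= len(src):
                raise ValueError("dangling escape character")
            content.append(src[i + 1])
            i += 2
        elif ch == "]":
            # Unescaped closing bracket ends the node.
            return "".join(content)
        else:
            content.append(ch)
            i += 1
    raise ValueError("missing closing bracket")


if __name__ == "__main__":
    print(repr(parse_inline_code(r"[c \\]")))    # content is one backslash
    print(repr(parse_inline_code(r"[c a\]b]")))  # escaped bracket is kept
```

Under these assumptions the trailing space in `[c \\ ]` is no longer needed: the scanner pairs the two backslashes first and only then sees the bracket as the end of the node.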

However, it's a very good idea to add 'word_joiner' and 'empty' nodes. They can help to explicitly eliminate ambiguities like this, and they are useful in other cases as well, as you mentioned. Will be done. Easy to implement.

I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

I agree.

pml-lang commented 3 years ago

the line code for the escape character is forced to contain a trailing space (\ ) — in the source file

This bug has been fixed in version 2.0.0.