Whitespace not collapsed when line break precedes node end

coemgenuslcp commented 3 years ago

In version 1.5.0, some inline nodes don't collapse trailing whitespace when written like this:

Here is a sentence (and "here is a quote"[sup
    [xref node_id="my_footnote_28" text="28"]
]), and here is the rest of the sentence.

resulting in an extra space before the end of the node:

... is a quote&quot;<sup class="pml-superscript"><a href="#my_footnote_28" class="pml-xref">28</a> </sup>), and here ...

pml-lang commented 3 years ago

The following rule is applied in PML:

A text segment composed of only whitespace (i.e. spaces, tabs, and new lines) is replaced by a single space character.

Consider this code:

foo1 [i foo2]
  [i foo3]

The newline + 2 spaces after [i foo2] are replaced with one space character. HTML code generated (classes omitted for better readability):

<p>foo1 <i>foo2</i> <i>foo3</i></p>
                   ^

If we don't want a space we need to write:

foo1 [i foo2][i foo3]

HTML code generated:

<p>foo1 <i>foo2</i><i>foo3</i></p>
                   ^

I hope this explains the "extra space before the end of the node" in your example.
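The rule and the two outputs above can be reproduced with a small Python sketch. It is only an illustration, not PML's actual implementation: the `Node` class, the `collapse_whitespace` and `render` helpers, and the list-of-children representation are hypothetical. It uses the broader formulation stated later in this thread (every run of whitespace becomes one space), which also covers whitespace-only segments such as the newline + 2 spaces.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical in-memory model: a node has a name and a list of children,
# each child being either raw source text or another Node.
@dataclass
class Node:
    name: str
    children: List[Union[str, "Node"]] = field(default_factory=list)

# Spaces, tabs, carriage returns and new lines are treated as whitespace.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse_whitespace(node: Node) -> Node:
    """Return a copy where every whitespace run in text children is one space."""
    children: List[Union[str, Node]] = []
    for child in node.children:
        if isinstance(child, str):
            children.append(WHITESPACE_RUN.sub(" ", child))
        else:
            children.append(collapse_whitespace(child))
    return Node(node.name, children)

def render(node: Node) -> str:
    """Render minimal HTML (no classes, no escaping), for illustration only."""
    inner = "".join(c if isinstance(c, str) else render(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"

# foo1 [i foo2]
#   [i foo3]
with_newline = Node("p", ["foo1 ", Node("i", ["foo2"]), "\n  ", Node("i", ["foo3"])])
# foo1 [i foo2][i foo3]
without_gap = Node("p", ["foo1 ", Node("i", ["foo2"]), Node("i", ["foo3"])])

print(render(collapse_whitespace(with_newline)))  # <p>foo1 <i>foo2</i> <i>foo3</i></p>
print(render(collapse_whitespace(without_gap)))   # <p>foo1 <i>foo2</i><i>foo3</i></p>
```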

Note 1: The rule currently differs (though it shouldn't) for whitespace appearing right after a node name: in that case the whitespace is removed entirely. For example, this code (with 4 spaces after i):

[i    foo]

... generates this HTML:

<p><i>foo</i></p>

... but should generate the following code for consistency:

<p><i> foo</i></p>
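The difference between the current and the consistent behavior can be sketched as follows. This is a hypothetical helper, not PML code, and it assumes that the first whitespace character after a node name only separates the name from the content and is always consumed (so [i foo] still yields <i>foo</i>).

```python
import re

WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")
SEPARATORS = (" ", "\t", "\r", "\n")

def content_after_name(raw: str, consistent: bool) -> str:
    """Process the text that follows a node name, e.g. '    foo' in '[i    foo]'."""
    # Assumption: the first whitespace character is only a separator
    # between the node name and its content, and is always consumed.
    rest = raw[1:] if raw[:1] in SEPARATORS else raw
    if not consistent:
        # Behavior described in Note 1: remaining leading whitespace is dropped.
        rest = rest.lstrip("".join(SEPARATORS))
    # Everywhere else the general rule applies: one space per whitespace run.
    return WHITESPACE_RUN.sub(" ", rest)

def render_i(raw: str, consistent: bool) -> str:
    return f"<p><i>{content_after_name(raw, consistent)}</i></p>"

print(render_i("    foo", consistent=False))  # <p><i>foo</i></p>
print(render_i("    foo", consistent=True))   # <p><i> foo</i></p>
print(render_i(" foo", consistent=True))      # <p><i>foo</i></p>
```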

Note 2: The [sp] node can be used to explicitly insert several spaces.

coemgenuslcp commented 3 years ago

Ah, but you are speaking of a space between two inner elements inside a node, whereas my example is of a space between the last inner element and the end of a node. Suppose you wrote the node with the explicit [p] tag:

[p
  foo1 [i foo2]
    [i foo3]
]

Would the common user naturally expect the HTML to look like

<p>foo1 <i>foo2</i> <i>foo3</i></p>

or like

<p>foo1 <i>foo2</i> <i>foo3</i> </p>
                               ^

? After all, the [] brackets, like {} in other languages, lend themselves to being perceived as blocks in the code itself (whether the node renders as a block or inline), so a user might expect the inner contents of, say,

   The telescope is, [i
     beyond all dispute
   ], mine.

to map to this shape of structure in the document object model:

   |-- paragraph
       |-- text("The telescope is, ")
       |-- italic
       |   |-- text("beyond all dispute")
       |-- text(", mine.")

not to this:

   |-- paragraph
       |-- text("The telescope is, ")
       |-- italic
       |   |-- text("beyond all dispute ")
       |                               ^
       |-- text(", mine.")

Hence, I think all "edge space" at the extremes of a node's content should be trimmed unless given via [sp] or some other way. Then, Christian, you might be relieved of having to document the exception about ignored space at the beginning of a node, because the same would apply at the end: the whole inner content of any node would be trimmed of leading and trailing collapsible (breaking, not non-breaking) whitespace, unless the node's nature dictates otherwise, as with the hypothetical [pre].
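The proposed trimming rule could look roughly like this Python sketch. It models the proposal, not PML's behavior in any version; the `Node` class, the `normalize` helper and the `PRESERVE_WHITESPACE` set are hypothetical, and [xref ...] is simplified to a plain `a` node.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    name: str
    children: List[Union[str, "Node"]] = field(default_factory=list)

WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")
# Hypothetical: nodes whose content keeps its whitespace verbatim.
PRESERVE_WHITESPACE = {"pre"}

def normalize(node: Node) -> Node:
    """Collapse whitespace runs, then trim both edges of the node's content."""
    if node.name in PRESERVE_WHITESPACE:
        return node
    children = [WHITESPACE_RUN.sub(" ", c) if isinstance(c, str) else normalize(c)
                for c in node.children]
    # Proposed rule: strip collapsible whitespace at the start and end of the content.
    # Only " " is stripped here, so a non-breaking space (U+00A0) survives.
    if children and isinstance(children[0], str):
        children[0] = children[0].lstrip(" ")
    if children and isinstance(children[-1], str):
        children[-1] = children[-1].rstrip(" ")
    # Drop text children that became empty after trimming.
    return Node(node.name, [c for c in children if c != ""])

def render(node: Node) -> str:
    inner = "".join(c if isinstance(c, str) else render(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"

# The telescope is, [i
#   beyond all dispute
# ], mine.
telescope = Node("p", ["The telescope is, ", Node("i", ["\n  beyond all dispute\n"]), ", mine."])
# The original report, with [xref ...] simplified to an <a> node.
footnote = Node("sup", ["\n    ", Node("a", ["28"]), "\n"])

print(render(normalize(telescope)))  # <p>The telescope is, <i>beyond all dispute</i>, mine.</p>
print(render(normalize(footnote)))   # <sup><a>28</a></sup>
```

Trimming only the first and last text children keeps the spaces between inner elements intact, which is the distinction the proposal draws.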

Otherwise, how could one remove that space between the last inner element and the outer node's ] when it isn't wanted? You might then have to resort to some whitespace-control feature, similar to the hyphens or tildes in Jinja, Helm, or other templating languages, but in my opinion that would only add complexity.

I am interested to hear counterarguments, however, in favor of supporting such cases as whatever  or whatever  whenever those are desired instead of, say, whatever  or whatever .

pxml-lang commented 3 years ago

Ideally, the rule for whitespace handling should be (1) simple, (2) intuitive, and (3) applied consistently. The current rule is: "A set of one or more consecutive whitespace characters (i.e. spaces, tabs, and new lines) is replaced by a single space." I think this rule works well in idiomatic PML code, where inline nodes are written inline and new lines appear only where a space is acceptable, e.g.:

This is a long sentence written on
two lines.

As soon as one adds vertical whitespace (new lines) in unconventional ways, the result might be unintuitive and surprise users who are not aware of the rule. I tried different rules, but I couldn't find one that satisfies all three conditions (simple, intuitive, applied consistently) and works well in every case, including the ones you mentioned.

Suggestions for a better rule are very welcome.

> you are speaking of a space between two inner elements inside a node, whereas my example is of a space between the last inner element and the end of a node.

Yes, but the rule is the same in both cases.

> a user might expect the inner contents of, say,
>
>    The telescope is, [i
>      beyond all dispute
>    ], mine.

In this case, the user could write it more idiomatically, like this:

The telescope is, [i beyond all dispute], mine.

or maybe this:

The telescope is,
  [i beyond all dispute],
mine.
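Applying only the current rule (no trimming) to both idiomatic versions gives the intended result. The Python sketch below splits the source around the [i ...] node by hand instead of parsing it; `collapse` and `render` are hypothetical helpers.

```python
import re

WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse(text: str) -> str:
    """Apply the current rule: each whitespace run becomes one space."""
    return WHITESPACE_RUN.sub(" ", text)

def render(before: str, italic: str, after: str) -> str:
    return f"{collapse(before)}<i>{collapse(italic)}</i>{collapse(after)}"

# Hand-split source text around the [i ...] node (no real parser involved).
# The separator space after "[i" is already excluded from the italic text.
one_line = ("The telescope is, ", "beyond all dispute", ", mine.")
three_lines = ("The telescope is,\n  ", "beyond all dispute", ",\nmine.")

for version in (one_line, three_lines):
    print(render(*version))
# Both print: The telescope is, <i>beyond all dispute</i>, mine.
```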

However, if the user has a really good reason to use the first version, then he/she could use a comment to avoid the extra space, e.g.:

The telescope is, [i
  beyond all dispute[-
-]], mine.

Of course, this is a really ugly hack. Maybe we could add a dedicated ignore node (its content is simply ignored) to cover rare corner cases, but I'm not sure that this is a good idea.
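Why the comment hack removes the trailing space can be sketched in Python. The sketch assumes comments are discarded before whitespace is collapsed, so the newline inside [- ... -] never reaches the collapse step; `italic_content` is a hypothetical helper, and the regex does not handle nested comments.

```python
import re

# Assumptions: comments use the [- ... -] syntax shown above, are not nested
# (this non-greedy regex would mishandle nesting), and are discarded before
# whitespace is collapsed.
COMMENT = re.compile(r"\[-.*?-\]", re.DOTALL)
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def italic_content(tail: str) -> str:
    """Return what the italic node would contain, up to its closing ']'."""
    cleaned = WHITESPACE_RUN.sub(" ", COMMENT.sub("", tail))
    return cleaned[: cleaned.index("]")]

plain = "beyond all dispute\n], mine."
hacked = "beyond all dispute[-\n-]], mine."

print(repr(italic_content(plain)))   # 'beyond all dispute '
print(repr(italic_content(hacked)))  # 'beyond all dispute'
```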

Whitespace handling can be surprisingly challenging. Interesting link: How whitespace is handled by HTML, CSS, and in the DOM

pml-lang commented 3 years ago

In version 2.0.0, a chapter titled Whitespace Handling was added to the documentation to describe this behavior.

pml-lang / pml-companion, issue #51