prettier / plugin-xml


Whitespace formatting isn't valid and idempotent with `ignore` sensitivity #768

Closed gebsh closed 7 months ago

gebsh commented 8 months ago

In XML, only `\t`, `\n`, `\r`, and ` ` (space) are considered [whitespace](https://www.w3.org/TR/xml/#NT-S) and are affected by the [`xml:space` attribute](https://www.w3.org/TR/xml/#sec-white-space). However, when formatting an XML document with the `xmlWhitespaceSensitivity` option set to `ignore`, `@prettier/plugin-xml` uses [`String.prototype.trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) to remove whitespace characters, which results in the removal of text that should be preserved.

https://github.com/prettier/plugin-xml/blob/68b3430186d6b9bfda86f683b97694492825bb3d/src/printer.js#L281-L288
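
To illustrate the difference, here is a minimal sketch of a trim that touches only the four XML whitespace characters. `trimXmlWhitespace` is a hypothetical helper written for this example; it is not taken from the plugin and is not necessarily how a fix would be implemented.

```javascript
// XML 1.0 whitespace (production S): space, tab, carriage return, line feed.
const XML_LEADING_WS = /^[ \t\r\n]+/;
const XML_TRAILING_WS = /[ \t\r\n]+$/;

// Hypothetical helper: strips only characters that XML itself treats as
// whitespace, leaving characters such as U+00A0 untouched.
function trimXmlWhitespace(text) {
  return text.replace(XML_LEADING_WS, "").replace(XML_TRAILING_WS, "");
}

// "foo" followed by four no-break spaces, surrounded by XML whitespace.
const content = "  foo\u00A0\u00A0\u00A0\u00A0\n";

console.log(content.trim().length);              // 3: the no-break spaces are gone
console.log(trimXmlWhitespace(content).length);  // 7: "foo" plus four U+00A0
```

With this approach, content made up solely of no-break spaces is left as-is instead of collapsing to an empty string.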

For example, this document has a <text> element with 4 trailing U+00A0 No-Break Space characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo    </text>
  <text>bar</text>
</paragraph>

Formatting it removes these 4 trailing characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo</text>
  <text>bar</text>
</paragraph>

Because of this behavior, formatting is not idempotent for documents containing elements whose content consists only of no-break spaces: the output differs depending on how many formatting runs are performed. Given this input:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
</paragraph>

This is the output after formatting the input once:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text></text>
</paragraph>

And this is the output after formatting it twice:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text />
</paragraph>

Here's a list of affected characters:

And an XML document that has each of these characters repeated 4 times in separate <text> elements:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>



</text>
  <text>



</text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text></text>
</paragraph>
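
The set of affected characters can also be derived programmatically. The sketch below is an illustration only, not code from the plugin: it assumes that "affected" means a character that `String.prototype.trim()` strips, that is legal in an XML 1.0 document, and that is not one of the four XML whitespace characters. The names `isXmlChar` and `affectedCodePoints` are made up for this example, and the exact result depends on the Unicode version of the JavaScript engine.

```javascript
// XML 1.0 whitespace (production S): space, tab, carriage return, line feed.
const XML_WHITESPACE = new Set([0x20, 0x09, 0x0d, 0x0a]);

// XML 1.0 Char production, restricted to the Basic Multilingual Plane.
// U+000B and U+000C are trimmed by JavaScript but are not legal XML
// characters, so they are excluded here.
function isXmlChar(cp) {
  return (
    cp === 0x09 ||
    cp === 0x0a ||
    cp === 0x0d ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd)
  );
}

// Collects every BMP code point that trim() removes but XML does not
// consider whitespace.
function affectedCodePoints() {
  const result = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    const trimmedAway = String.fromCharCode(cp).trim() === "";
    if (trimmedAway && isXmlChar(cp) && !XML_WHITESPACE.has(cp)) {
      result.push(cp);
    }
  }
  return result;
}

for (const cp of affectedCodePoints()) {
  console.log("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
}
```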