kristian / minify-xml

Fast XML minifier / compressor / uglifier with a command-line

spaces are removed inside tags by default even when xml:space is set to "preserve" #13

Closed tiholic closed 3 years ago

tiholic commented 3 years ago

Using minifyXML(xml, {removeWhitespaceBetweenTags: 'strict', collapseEmptyElements: false}) removes the whitespace inside elements, even where xml:space="preserve" is set.

input:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:t xml:space="preserve">    </w:t>
  <w:t xml:space="preserve">

        </w:t>
  <w:t xml:space="preserve">  </w:t>
  <w:t xml:space="preserve"> </w:t>
  <w:t xml:space="preserve"></w:t>
</w:document>

output:

'<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t xml:space="preserve"></w:t><w:t xml:space="preserve"></w:t><w:t xml:space="preserve"></w:t><w:t xml:space="preserve"></w:t><w:t xml:space="preserve"></w:t></w:document>'

expected:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t xml:space="preserve">    </w:t><w:t xml:space="preserve">

        </w:t><w:t xml:space="preserve">  </w:t><w:t xml:space="preserve"> </w:t><w:t xml:space="preserve"></w:t></w:document>

kristian commented 3 years ago

Currently, minify-xml does not consider the xml:space="preserve" attribute when removing whitespace between tags. I think it should be respected even if removeWhitespace is set to true, so I consider this a good first issue! However, according to the XML namespace spec. [1], the xml:space attribute is not mandatory and only signals an intent. minify-xml therefore still acts according to spec., and I would like to treat this issue as an enhancement rather than a bug.

Let me see what I can do, maybe by the end of the week. Feel free to submit a PR before then.

Thanks for filing this issue!

[1] https://www.w3.org/TR/xml-names/

kristian commented 3 years ago

@tiholic it turns out that, due to the regexp-based nature of minify-xml, this again cannot be done in a fully spec.-compliant way. I have added basic support for it with f4f3626 in version 3.3.1. I decided not to tie it to the removeWhitespaceBetweenTags flag, but to trimWhitespaceFromTexts and collapseWhitespaceInTexts, as these were already non-spec.-compliant and need to be enabled explicitly. The limitation shows up when tags are nested inside an element with an xml:space declaration. For example:

<tag xml:space="preserve">  hello  <br/>  world   </tag>

With options:

{
    "removeWhitespaceBetweenTags": false,
    "considerPreserveWhitespace": true,
    "trimWhitespaceFromTexts": true,
    "collapseWhitespaceInTags": true
}

This should leave the input unchanged, so:

<tag xml:space="preserve">  hello  <br/>  world   </tag>

However, because regular expressions have no knowledge of the DOM, it actually results in:

<tag xml:space="preserve">  hello  <br/>world</tag>
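
A minimal sketch of why this happens, assuming a much-simplified pattern rather than minify-xml's actual regular expressions (the names trimTextsNaively, TAG_THEN_TEXT and PRESERVE are hypothetical). A pattern that pairs each text with the tag directly in front of it can see xml:space on the opening tag, but loses that context as soon as another tag such as <br/> sits in between:

```javascript
// Hypothetical, simplified stand-in for a regexp-based trimmer.
// This is NOT the pattern minify-xml uses; it only illustrates the limitation.

// A tag, followed by the text that runs up to the next tag.
const TAG_THEN_TEXT = /(<[^<>]+>)([^<>]+)(?=<)/g;
// xml:space="preserve" inside a single tag.
const PRESERVE = /\sxml:space\s*=\s*(["'])preserve\1/;

function trimTextsNaively(xml) {
  return xml.replace(TAG_THEN_TEXT, (match, tag, text) => {
    // Only the tag directly in front of the text is visible here.
    // Any xml:space declared on an ancestor element is unknown.
    if (PRESERVE.test(tag)) return match;
    return tag + text.trim();
  });
}

const input = '<tag xml:space="preserve">  hello  <br/>  world   </tag>';
console.log(trimTextsNaively(input));
// prints: <tag xml:space="preserve">  hello  <br/>world</tag>
```

The text after <br/> is trimmed because the only tag the pattern sees in front of it is <br/>, which carries no xml:space attribute.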

This limitation cannot be overcome with the current regular-expression-based engine of minify-xml. I am planning a 4.0 release of minify-xml that will switch from regular expressions to a parser-based approach. With that, more spec.-compliant minification will become possible.
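
To make the difference concrete, here is a rough sketch of what tracking element context could look like. It is an assumption for illustration only, not the planned 4.0 implementation: trimTextsWithContext, TOKEN and SPACE_ATTR are hypothetical names, and the tokenizer ignores CDATA sections as well as comments or attribute values that contain angle brackets.

```javascript
// Hypothetical sketch: split the input into tags and texts, and keep a stack
// of xml:space states so that text nested anywhere below a preserving element
// is left untouched. Not minify-xml code.

const TOKEN = /<[^<>]+>|[^<>]+/g;
const SPACE_ATTR = /\sxml:space\s*=\s*(["'])(preserve|default)\1/;

function trimTextsWithContext(xml) {
  const preserveStack = [false]; // document level: whitespace is not preserved
  let out = "";
  for (const token of xml.match(TOKEN) ?? []) {
    if (!token.startsWith("<")) {
      // Text: trim only if the innermost open element does not preserve.
      out += preserveStack[preserveStack.length - 1] ? token : token.trim();
      continue;
    }
    out += token;
    if (token.startsWith("</")) {
      // Closing tag: leave the scope of the current element.
      if (preserveStack.length > 1) preserveStack.pop();
    } else if (!/^<[?!]/.test(token) && !token.endsWith("/>")) {
      // Opening tag: inherit the parent's state unless xml:space overrides it.
      const attr = SPACE_ATTR.exec(token);
      const inherited = preserveStack[preserveStack.length - 1];
      preserveStack.push(attr ? attr[2] === "preserve" : inherited);
    }
  }
  return out;
}

const input = '<tag xml:space="preserve">  hello  <br/>  world   </tag>';
console.log(trimTextsWithContext(input) === input); // prints: true
```

Because the state lives on a stack rather than in a single match, the text after <br/> still sees the xml:space="preserve" declared on the enclosing element.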

However, I hope that basic support for xml:space and pre tags satisfies your use case. With the options listed above (removeWhitespaceBetweenTags false, and trimWhitespaceFromTexts and collapseWhitespaceInTags true), the minified output should turn out as you expected.

Hope this helps!