minify-xml

minify-xml is a lightweight and fast XML minifier for Node.js with a command-line interface.

Existing XML minifiers, such as pretty-data, often do a pretty (pun intended) bad job of minifying XML, usually removing only comments and whitespace between tags. minify-xml, on the other hand, also minifies the tags themselves, e.g. by collapsing the whitespace between attributes, and applies further minifications such as removing unused namespace declarations. minify-xml is based on regular expressions and thus executes blazingly fast.
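To make the idea of regular-expression-based tag minification concrete, here is a minimal sketch that collapses whitespace inside start tags. The helper collapseTagWhitespace and both patterns are assumptions made for illustration, not the expressions minify-xml actually uses. Unlike the library, the sketch does not skip comments or CDATA sections and leaves end tags untouched.

```javascript
// Hypothetical helper for illustration only, not part of minify-xml.
// Collapses whitespace around attributes in start tags and empty-element tags.
function collapseTagWhitespace(xml) {
    // Attribute values are matched as quoted strings, so a ">" inside a value
    // does not end the tag early. End tags, comments and processing
    // instructions do not match because the name may not start with / ! or ?
    const tag = /<([^\s<>\/!?]+)((?:\s+[^\s=<>\/]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>/g;
    const attribute = /\s+([^\s=<>\/]+)\s*=\s*("[^"]*"|'[^']*')/g;
    return xml.replace(tag, (match, name, attributes, slash) =>
        // Rebuild the tag with exactly one space before each attribute
        "<" + name + attributes.replace(attribute, " $1=$2") + slash + ">");
}

console.log(collapseTagWhitespace('<AnotherTag  attributeA  =  "..."  attributeB  =  "..."  >'));
// prints: <AnotherTag attributeA="..." attributeB="...">
```

Matching the attribute list as part of the tag pattern keeps the replacement confined to markup, so text between tags that happens to contain = or quotes is never rewritten.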

Online

To minify XML online in your browser, visit:

Minify-X.ML (https://minify-x.ml/)

Installation

npm install minify-xml -g

Usage

import minifyXML from "minify-xml";

const xml = `<Tag xmlns:used = "used_ns" xmlns:unused = "unused_ns">
    <!--
        With the default options all comments will be removed, whitespace in
        tags, like spaces between attributes, will be collapsed / removed and
        elements without any content will be collapsed to empty tag elements
    -->
    <AnotherTag  attributeA  =  "..."  attributeB  =  "..."  >  </AnotherTag  >

    <!--
        Also any unused namespaces declarations will be removed by default,
        used namespaces however will be shortened to a minimum length possible
    -->
    <used:NamespaceTag  used:attribute  =  "..."  >
        any valid element content is left unaffected (strangely enough = " ... "
        and even > are valid characters in XML, only &lt; must always be encoded)
    </used:NamespaceTag  >

    <![CDATA[<FakeTag attr = "content in CDATA tags is not minified"></FakeTag>]]>
</Tag>`;

console.log(minifyXML(xml));

This outputs the minified XML:

<Tag xmlns:u="used_ns"><AnotherTag attributeA="..." attributeB="..."/><u:NamespaceTag u:attribute="...">
        any valid element content is left unaffected (strangely enough = " ... "
        and even > are valid characters in XML, only &lt; must always be encoded)
    </u:NamespaceTag><![CDATA[<FakeTag attr = "content in CDATA tags is not minified"></FakeTag>]]></Tag>
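In the output above, the unused xmlns:unused declaration is gone. The following sketch shows one way such declarations could be detected with regular expressions. The function findUnusedNamespacePrefixes is hypothetical and much cruder than a real minifier: it scans the whole string, so prefixes that only appear inside comments, CDATA or text may be misjudged.

```javascript
// Hypothetical helper for illustration only, not the library's implementation.
function findUnusedNamespacePrefixes(xml) {
    // Collect every prefix declared as xmlns:prefix = "..."
    const declared = new Set();
    for (const [, prefix] of xml.matchAll(/\sxmlns:([^\s=:]+)\s*=/g)) {
        declared.add(prefix);
    }
    return [...declared].filter(prefix => {
        // Escape characters that have a special meaning in regular expressions
        const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
        // A prefix counts as used if it occurs in a tag name (<prefix:Name or
        // </prefix:Name) or in an attribute name (prefix:name = "...")
        const used = new RegExp(`<\\/?${escaped}:|\\s${escaped}:[^\\s=]+\\s*=`);
        return !used.test(xml);
    });
}

const sample = `<Tag xmlns:used = "used_ns" xmlns:unused = "unused_ns">
    <used:NamespaceTag used:attribute = "..."/>
</Tag>`;
console.log(findUnusedNamespacePrefixes(sample)); // prints: [ 'unused' ]
```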

Alternatively, a Node.js Transform stream is provided to minify XML streams, which is especially helpful for very large files (> 2 GiB, which is the maximum Buffer size in Node.js on 64-bit machines):

import fs from "fs";
import { minifyStream as minifyXMLStream } from "minify-xml";

fs.createReadStream("sitemap.xml", "utf8")
    .pipe(minifyXMLStream())
    .pipe(process.stdout);

Node.js 15 introduced the stream/promises module, whose pipeline function returns a promise instead of taking a callback. minify-xml offers a matching minifyPipeline function, so you can combine the advantages of the streaming API (namely no file size limit) with the convenience of a modern promise-based API:

import fs from "fs";
import { minifyPipeline as minifyXMLPipeline } from "minify-xml";

await minifyXMLPipeline(fs.createReadStream("catalogue.xml", "utf8"), process.stdout, { end: false });

Options

You may pass in the following options when calling minify:

import { minify as minifyXML, minifyStream as minifyXMLStream } from "minify-xml";
minifyXML(`<tag/>`, { ... });
minifyXMLStream({ ... });

For stream processing, the following additional options can be supplied:

Stream Limitations

Note that the default streamMaxMatchLength was deliberately chosen high, at a multiple of the Node.js default stream buffer size (16 KiB for readable streams, 64 KiB for file system streams). The stream option is specifically meant for very large files / read streams, and a larger streamMaxMatchLength yields a more accurate minification, because some very large tags have to be read into the buffer all at once to be minified.
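Why a maximum match length matters can be illustrated with a small sketch of a chunked replacer. This is only one possible approach under stated assumptions, not how minify-xml is implemented internally: createChunkedReplacer, push and flush are hypothetical names, and the maxMatchLength parameter merely plays the role that streamMaxMatchLength plays in the library. The replacer holds back the last maxMatchLength characters of its buffer, so a match that spans a chunk boundary can still be completed once the next chunk arrives.

```javascript
// Hypothetical sketch, not the library's implementation.
function createChunkedReplacer(pattern, replacer, maxMatchLength) {
    // Always search globally, regardless of the flags of the given pattern
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    const regex = new RegExp(pattern.source, flags);
    let pending = "";

    function drain(final) {
        // Matches starting before safeEnd cannot change when more data arrives,
        // as long as no match is longer than maxMatchLength
        const safeEnd = final ? pending.length : Math.max(0, pending.length - maxMatchLength);
        let output = "", position = 0, match;
        regex.lastIndex = 0;
        while ((match = regex.exec(pending)) !== null && match.index < safeEnd) {
            output += pending.slice(position, match.index) + replacer(match[0]);
            position = match.index + match[0].length;
            if (match[0].length === 0) regex.lastIndex++; // avoid looping on empty matches
        }
        // Emit the unmatched text up to the safe boundary and keep the rest
        const cut = Math.max(position, safeEnd);
        output += pending.slice(position, cut);
        pending = pending.slice(cut);
        return output;
    }

    return {
        push(chunk) { pending += chunk; return drain(false); },
        flush() { return drain(true); }
    };
}

// The whitespace between <a> and <b/> is split across two chunks
const replacer = createChunkedReplacer(/>\s+</, () => "><", 16);
const result = replacer.push("<a>   ") + replacer.push("   <b/></a>") + replacer.flush();
console.log(result); // prints: <a><b/></a>
```

Replacing each chunk on its own would leave the whitespace in place, because neither chunk contains the complete match. A match longer than maxMatchLength can still be missed, which is the accuracy trade-off described above.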

The maximum buffer size in Node.js is 1 GiB on 32-bit machines and 2 GiB on 64-bit machines (see this issue). minify-xml can handle strings up to that size, and for such inputs the minify function should be preferred over minifyStream. Larger files / streams require the streaming API, which comes with certain limitations, because no prior knowledge about the document can be obtained for the minification. This is mainly because the stream is assumed to be readable only once; an option to obtain the required information, e.g. by first parsing a file and then minifying it, might be added in the future. For now, the options removeUnusedNamespaces, removeUnusedDefaultNamespace, shortenNamespaces and ignoreCData cannot be used with the streaming API, and calling the minifyStream function with any of these options enabled results in an error.
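If the same options object is shared between minify and minifyStream, the four unsupported options have to be left out before streaming. The helper below is a hypothetical sketch of that step. Only the option names come from the paragraph above; toStreamOptions itself is not part of minify-xml, and it assumes that omitted options fall back to stream-compatible defaults, as the plain minifyXMLStream() call in the usage example suggests.

```javascript
// Option names as listed above; the helper itself is hypothetical.
const STREAM_UNSUPPORTED_OPTIONS = [
    "removeUnusedNamespaces",
    "removeUnusedDefaultNamespace",
    "shortenNamespaces",
    "ignoreCData"
];

// Returns a copy of the options without the entries that streaming rejects
function toStreamOptions(options = {}) {
    return Object.fromEntries(
        Object.entries(options).filter(([name]) => !STREAM_UNSUPPORTED_OPTIONS.includes(name)));
}

console.log(toStreamOptions({ shortenNamespaces: true, streamMaxMatchLength: 512 * 1024 }));
// prints: { streamMaxMatchLength: 524288 }
```

The result could then be passed to minifyXMLStream in place of the shared options object.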

Furthermore, multiple buffers of the configured size are created for each enabled minification option (some minifications even require several buffers / replacements). Enabling more options therefore allocates more memory, depending on the streamMaxMatchLength option, provided the file / read stream is larger than the configured buffer size. As the input is pumped through all minifications as a stream, roughly 1.5 * n * buffer size is allocated. For instance, with the default buffer size of 256 KiB and all default streaming options enabled, 11 buffers / replacements are needed, so 11 * 256 KiB = 2.75 MiB is allocated if the input stream is 256 KiB or larger.
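The arithmetic above can be reproduced with a few lines. The function estimateStreamMemory is a hypothetical helper written for this example. It simply multiplies the number of buffers by the buffer size and does not inspect what minify-xml actually allocates.

```javascript
const KiB = 1024;
const MiB = 1024 * KiB;

// Hypothetical helper: total bytes for a given number of buffers,
// using the 256 KiB default buffer size mentioned above
function estimateStreamMemory(bufferCount, bufferSize = 256 * KiB) {
    return bufferCount * bufferSize;
}

console.log(estimateStreamMemory(11) / MiB + " MiB"); // prints: 2.75 MiB
```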

CLI

You can run minify-xml from the command line to minify XML files:

minify-xml sitemap.xml
minify-xml blog.atom --in-place
minify-xml view.xml --output view.min.xml
minify-xml db.xml --stream > out.xml

Any of the options above can be passed as command-line flags, for example:

minify-xml index.html --collapse-whitespace-in-texts --ignore-cdata false

Author

XML minifier by Kristian Kraljić. Original package and CLI by Mathias Bynens.

Bugs

Please file any issues on GitHub.

License

This library is dual-licensed under the MIT and Apache 2.0 licenses.