metanorma / mnconvert

Metanorma converter

Performance through parallelization #92

Open ronaldtse opened 2 years ago

ronaldtse commented 2 years ago

mnconvert currently uses only a single thread. As a result, it is slow when generating PDFs from large XML files, e.g. https://github.com/metanorma/iso-10303-2.

We should parallelize mnconvert so that it takes advantage of the multiple cores on modern computers.

Intelligent2013 commented 2 years ago

mnconvert uses the Xalan XSLT processor, which doesn't support parallelization. Michael Kay (the developer of Saxon) noted in https://www.saxonica.com/papers/xmlprague-2015mhk.pdf that some commercial XSLT processors may support multi-threading:

In the commercial domain, there are high-end XSLT processors from IBM and
Intel, marketed as hardware-assisted XSLT accelerators, which may well make use
of parallel processing internally, but if so, no details are available in the public
domain. Altova's marketing literature for RaptorXML intriguingly claims "the engine
takes advantage of today's ubiquitous multi-CPU computers to deliver lightning
fast processing of XML and XBRL data"; but it is hard to find any technical details
on how it does so.

Saxon EE (Enterprise Edition) also supports multi-threaded <xsl:for-each> and <xsl:apply-templates>.

With Xalan, the only way to speed things up is XSLT profiling and optimization. I've optimized the mn->sts XSLT in https://github.com/metanorma/mnconvert/commit/cb25a0a2cffb6cba959fa5b1b913e892d2c7a151. The iso-10303-2 Metanorma XML now converts to STS XML in 107 sec, versus 479 sec before (roughly 4.5 times faster).
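
For anyone who wants to measure a stylesheet change in the same before/after way, here is a minimal timing sketch using the standard JAXP API (javax.xml.transform). It is not mnconvert's code: the class name XsltTiming, its methods, and the tiny sample stylesheet are hypothetical and only illustrate the idea. It compiles the stylesheet once, runs the transformation a few times, and reports the best wall-clock time. It runs with whichever XSLT processor JAXP finds (Xalan if it is on the classpath, otherwise the JDK's built-in one).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration only; not part of mnconvert.
final class XsltTiming {

    // Tiny sample stylesheet (an assumption): prints the text of /doc/title.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='/doc/title'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once so compilation cost is not counted in every run.
    static Templates compile(String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet did not compile", e);
        }
    }

    // Run one transformation and return the result as a string.
    static String transform(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // Best wall-clock time over several runs, in milliseconds.
    static long bestMillis(Templates templates, String xml, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            transform(templates, xml);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        Templates templates = compile(SAMPLE_XSLT);
        String xml = "<doc><title>Hello</title></doc>";
        System.out.println("result: " + transform(templates, xml));
        System.out.println("best of 5 runs: " + bestMillis(templates, xml, 5) + " ms");
    }
}
```

Running this against the old and the new version of a stylesheet with the same input gives a rough comparison. It only shows the total time, so finding the slow templates still needs a real profiler or manual bisection of the stylesheet.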