riboseinc / isodoc-docx

Extension of isodoc to deal with docx
BSD 2-Clause "Simplified" License

Create gem #1

Open opoudjis opened 6 years ago

opoudjis commented 6 years ago

@enlarsen This issue is for you. @ronaldtse Erik is not a member of the organisation, so it's not clear to me whether he's receiving this notification.

Currently the Metanorma suite of gems converts Asciidoctor input into an intermediate Metanorma XML, capturing the semantic content of standards documents, and converts that Metanorma XML into HTML and Word HTML (Doc). The base gem doing the latter conversion is isodoc, and each of the metanorma-* gems customises the isodoc mapping to match the requirements of specific standards. They also make extensive use of HTML CSS stylesheets and Word HTML CSS stylesheets.

https://github.com/riboseinc/metanorma-iso/blob/master/docs/customisation.adoc is as much of a walkthrough of the metanorma gem structure as there is. For the isodoc gem in particular, the current outputs, HTML and Word HTML, are so closely related that they have been formulated as two classes inheriting from a common Convert class.

The task is to generate DOCX output from Metanorma XML. If successful, this will be a third class in the isodoc gem, alongside the HtmlConvert and WordConvert classes, and it will be customised downstream by the metanorma-* gems.
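To make the intended shape concrete, here is a minimal sketch of how a third converter could sit beside the existing two. It is an illustration only, not the actual isodoc code: the `IsodocSketch` module, the `DocxConvert` name, the `paragraph` and `wrap` methods, and the array-of-strings input are all assumptions made for the example, and the WordprocessingML output omits namespaces and packaging.

```ruby
# Hypothetical sketch only: these names and methods are illustrative
# and are NOT the real isodoc API.
module IsodocSketch
  # Common base class: walks the (heavily simplified) document content
  # and delegates element rendering to the subclass.
  class Convert
    def convert(paragraphs)
      body = paragraphs.map { |text| paragraph(text) }.join
      wrap(body)
    end

    # Subclasses decide how a single paragraph is rendered.
    def paragraph(_text)
      raise NotImplementedError, "#{self.class} must implement paragraph"
    end

    # Subclasses decide how the rendered body is wrapped.
    def wrap(body)
      body
    end
  end

  # Plain HTML output.
  class HtmlConvert < Convert
    def paragraph(text)
      "<p>#{text}</p>"
    end

    def wrap(body)
      "<html><body>#{body}</body></html>"
    end
  end

  # Word HTML output: still HTML, but with Word-specific class names.
  class WordConvert < Convert
    def paragraph(text)
      %(<p class="MsoNormal">#{text}</p>)
    end

    def wrap(body)
      "<html><body>#{body}</body></html>"
    end
  end

  # Assumed third class: emits simplified WordprocessingML instead of HTML.
  class DocxConvert < Convert
    def paragraph(text)
      "<w:p><w:r><w:t>#{text}</w:t></w:r></w:p>"
    end

    def wrap(body)
      "<w:document><w:body>#{body}</w:body></w:document>"
    end
  end
end

puts IsodocSketch::DocxConvert.new.convert(["Scope", "Normative references"])
```

The point of the sketch is that the HTML and Word HTML classes differ only in small details, whereas a DOCX class shares the traversal but none of the markup, which is where the maintainability concern comes from.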

I am the maintainer of the Metanorma gem stack. I have resisted doing this work myself, for reasons outlined in https://github.com/riboseinc/html2doc/wiki/Why-not-docx%3F. That document also outlines the formatting requirements imposed by the Metanorma stack, and they are prodigious, as the sample documents Ronald will have sent you illustrate. And, as documented there, those documents also appear to defeat the DOCX MHT approach, which only appears to understand HTML 5 in its online version, and does not understand the Word HTML customisations critical to our stack, such as mathematical formatting or headers and footers.

If you know how to introduce that functionality into MHT, that would be a far preferable outcome to coding DOCX from scratch, since it would permit us to stick with an HTML core, and would be far more maintainable. I'm also concerned that if DOCX does not allow us to use the equivalent of external stylesheets, and all styling has to be coded, maintaining the daughter gems is going to be very fragile.

In the first instance, become familiar with the Metanorma XML format (https://github.com/riboseinc/metanorma-model-standoc; @ronaldtse can give you a writeup we have done of what's in it), and the structure of the isodoc gem. Based on that, evaluate how feasible a DOCX (or DOCX HTML) equivalent to isodoc would be, and what the timeframe would be for creating it. Maintainability will also be a concern to evaluate.

opoudjis commented 6 years ago

See https://github.com/trade-informatics/caracal

My reservation with any SDK approach is that, if it doesn't cover the full spec (and NO ONE wants to cover the 7500 pages of the full spec), then extending the SDK to cover missing features is going to be a lot of work (they reverse-engineer OOXML, just as I reverse-engineer Word HTML). I note they don't discuss footnotes in their README, for example, which suggests they haven't implemented them yet.