metanorma / metanorma-iso

Metanorma processor for ISO standards

Annotation rendering in HTML #123

Open opoudjis opened 6 years ago

opoudjis commented 6 years ago

The asciidoctor-iso gem converts Asciidoctor into Word (via Microsoft HTML) and generic HTML. The Word documents it generates include Word comments.

As with all Microsoft HTML, the markup that indicates comments is fairly ugly:

        <span style="MsoCommentReference" target="1" class="commentLink" from="foreword" to="foreword">
          <span lang="EN-GB" style="font-size:9.0pt" xml:lang="EN-GB">
            <a style="mso-comment-reference:SMC_1;mso-comment-date:20170101T0000"><p id="foreword">ISO (the International Organization for Standardization)
is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.</p></a>
            <span style="mso-special-character:comment" target="1"></span>
          </span>
        </span>

....

      <div style="mso-element:comment-list"><div style="mso-element:comment" id="1"><span style="mso-comment-author:&quot;ISO&quot;"></span><p class="MsoCommentText" id="_c54b9549-369f-4f85-b5b2-9db3fd3d4c07"><span style="MsoCommentReference"><span lang="EN-GB" style="font-size:9.0pt" xml:lang="EN-GB"><span style="mso-special-character:comment"></span></span></span>A Foreword shall appear in each document. The generic text is shown here. It does not contain requirements, recommendations or permissions.</p>
<p class="MsoCommentText" id="_f1a8b9da-ca75-458b-96fa-d4af7328975e">For further information on the Foreword, see <b>ISO/IEC Directives, Part 2, 2016, Clause 12.</b></p></div></div>
    </div>

You can see a live example of this in spec/examples/rice.doc of this gem (the HTML in there is readable). The critical bits are: span style="MsoCommentReference", a style="mso-comment-reference:...", and the contents of div style="mso-element:comment-list". The code that generates these successfully is https://github.com/riboseinc/isodoc/blob/master/lib/isodoc/notes.rb
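To make those critical bits easier to inspect, here is a minimal Ruby sketch that splits a Microsoft HTML style attribute, such as the one on the a element above, into its properties. The helper name parse_mso_style is hypothetical and is not part of isodoc; it assumes only the semicolon-separated name:value form visible in the example.

```ruby
# Hypothetical helper, not part of isodoc: split a Microsoft HTML style
# attribute into a Hash of property name => value.
def parse_mso_style(style)
  style.split(";").each_with_object({}) do |declaration, props|
    # Split on the first colon only, so values may themselves contain colons.
    name, value = declaration.split(":", 2)
    next if name.nil? || value.nil? # skip fragments that are not name:value

    props[name.strip] = value.strip
  end
end

props = parse_mso_style("mso-comment-reference:SMC_1;mso-comment-date:20170101T0000")
puts props["mso-comment-reference"] # SMC_1
puts props["mso-comment-date"]      # 20170101T0000
```

A bare value such as "MsoCommentReference" has no colon, so the sketch returns an empty Hash for it rather than guessing at a property name.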

Word understands these comments, and even copes with the highlighting extending across a vast range of divs and spans (thanks to poor HTML validation :-)). I don't have unit tests for it yet, but I can supply a couple of docs showing it.

A browser rendering this HTML, of course, has no idea what to make of the markup: it ignores the comment reference and highlighting, and leaves the comment dumped at the bottom of the document.

We would like the HTML output of the gem to render something that looks like annotations. It should highlight from the beginning to the end of the commented span; if it can't do the highlighting, it must at least indicate the start of the commented span; and it must show the commenter, date, and comment text somehow, whether by click or mouseover.

The solution can use CSS or JavaScript, but should work in a standalone HTML document, preferably offline.
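One way to meet these requirements without any script is plain HTML: wrap the commented text in a mark element whose title attribute carries the commenter, date, and comment text, so that a browser shows them on mouseover even offline. The Ruby below is a minimal sketch under that assumption; render_annotation, escape_html, the Hash keys, and the class name annotation are all hypothetical names, not isodoc API.

```ruby
# Characters that must be escaped inside an HTML attribute value.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical helper: wrap already-rendered inline HTML in a <mark> whose
# title attribute shows reviewer, date, and comment text on mouseover.
def render_annotation(inner_html, comment)
  tooltip = "#{comment[:reviewer]} (#{comment[:date]}): #{comment[:text]}"
  %(<mark class="annotation" title="#{escape_html(tooltip)}">#{inner_html}</mark>)
end

comment = {
  reviewer: "ISO",
  date: "20170101T0000",
  text: "A Foreword shall appear in each document."
}
puts render_annotation("ISO (the International Organization for Standardization)", comment)
```

Note that mark is an inline element, so it cannot legally wrap a whole p or div. A comment that covers several blocks would need one mark per block's content, or a class on the blocks themselves styled with CSS.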

The realistic option is to convert the ISO XML input for review comments into HTML that does annotations. It will do what https://github.com/riboseinc/isodoc/blob/master/lib/isodoc/notes.rb does, but it will output HTML that is usable in a browser instead of HTML that only Word understands. The solution will be embedded into the isodoc gem.
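The from and to attributes in the Word HTML above suggest that each review comment names the first and last element it covers. Assuming the ISO XML carries the same pair of identifiers (an assumption, since the XML schema is not shown here), the conversion could resolve them against the document's block ids in order. The Ruby sketch below uses a hypothetical helper, annotated_ids, to illustrate the rule stated earlier: highlight the whole span when possible, otherwise mark only its start.

```ruby
# Hypothetical helper: given the document's block ids in document order and a
# comment's from/to ids, return the ids that should be highlighted.
# Falls back to the start id alone when the end cannot be resolved.
def annotated_ids(block_ids, from, to)
  start_index = block_ids.index(from)
  return [] if start_index.nil? # unknown start: nothing to mark

  end_index = block_ids.index(to)
  # Unresolvable or reversed range: indicate the start of the span only.
  return [from] if end_index.nil? || end_index < start_index

  block_ids[start_index..end_index]
end

ids = %w[foreword introduction scope normative-references]
p annotated_ids(ids, "foreword", "foreword")    # ["foreword"]
p annotated_ids(ids, "foreword", "scope")       # ["foreword", "introduction", "scope"]
p annotated_ids(ids, "introduction", "missing") # ["introduction"]
```

The ids in the usage lines other than foreword are made up for illustration.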

The ideal option would be a standard, clean way of indicating annotations in the HTML generated by isodoc, which could then be converted into both Word HTML and normal HTML output: something compliant with https://www.w3.org/TR/annotation-html/ and https://www.w3.org/TR/selectors-states/ . That's the assumption behind https://github.com/riboseinc/html2doc/issues/6 (which expects clean HTML input and converts it into Word HTML). That option would be ideal because it would be a logically clean starting point, but the W3C recommendations look pretty impractical (<q> links? Embedded microformats?).

ronaldtse commented 6 years ago

Thanks @opoudjis -- just to clarify, the goal is to convert an IsoDoc document into HTML that supports annotations (i.e. IsoDoc2HTML). Right?

opoudjis commented 6 years ago

Yes, and the code will be incorporated into the HTML.rb module of ISODoc. I don't believe it warrants a separate gem.

ronaldtse commented 6 years ago

@opoudjis with the latest stylesheet structure, how does it change the description of this work? Thanks!

opoudjis commented 6 years ago

It's not the stylesheet that would change it so much as having the HTML generator inherited rather than transforming existing Word HTML: you can generate the HTML annotations straight from the ISO XML. Will change the description.

opoudjis commented 6 years ago

Wait on https://github.com/riboseinc/isodoc/issues/49

opoudjis commented 6 years ago

https://github.com/riboseinc/isodoc/issues/49 done.