metanorma / metanorma-gb

Metanorma processor for GB: write Chinese Standards using GbDoc
BSD 2-Clause "Simplified" License

Fix sector document issues #26

Closed ronaldtse closed 6 years ago

ronaldtse commented 6 years ago

I've updated https://github.com/riboseinc/gmt-0009-2012 to show the original text of GM/T 0009-2012; the point is to recreate it in the GM standard format.

Currently there are minor issues like:

ronaldtse commented 6 years ago

Once I fixed the "[en] language missing" error for the T&D sections, most of these problems went away. Should we ensure that these parts don't go missing even when the T&D content is broken?

Remaining issues:

opoudjis commented 6 years ago

So the document was breaking when the English language was missing in T&D; that's surprising.

*(本稿完成日期:2018年1月) ("Date of completion of this draft: January 2018"): boilerplate found in the original document that I'm now stripping out. I will put it back in if it is to be anchored to created-date in bibdata (or, more to the point, the last updated-date).

opoudjis commented 6 years ago

Error in your markup, and it's not being caught in validation properly (because RNC text can be empty): What you have marked up as the title-intro is the title-main. The title-main is mandatory, the title-intro is optional. A one-phrase title is supposed to use title-main, not title-intro.

https://github.com/riboseinc/asciidoctor-iso/issues/106 to ensure empty strings such as title-main are not generated.
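The rule described above (title-main mandatory and non-empty, title-intro optional) can be sketched as a small check. This is only an illustration in Python, not the validation used by asciidoctor-iso: the `<title>` wrapper element, the function name `title_errors`, and the error wording are assumptions; only the element names `title-main` and `title-intro` come from this thread.

```python
import xml.etree.ElementTree as ET


def title_errors(title_xml):
    """Return a list of problems found in a (hypothetical) <title> fragment."""
    title = ET.fromstring(title_xml)
    errors = []

    # title-main must be present and must contain text.
    main = title.find("title-main")
    if main is None or not (main.text or "").strip():
        errors.append("title-main is mandatory and must not be empty")

    # title-intro may be absent, but an empty one should not be generated.
    intro = title.find("title-intro")
    if intro is not None and not (intro.text or "").strip():
        errors.append("title-intro is optional; omit it rather than leave it empty")

    return errors


# A one-phrase title wrongly marked up as title-intro, with an empty title-main.
wrong = "<title><title-intro>Usage specification</title-intro><title-main/></title>"
right = "<title><title-main>Usage specification</title-main></title>"
print(title_errors(wrong))
print(title_errors(right))
```

A schema that allows empty text cannot express the "must not be empty" half of the rule, which is why a check of this kind has to look at the element content as well as its presence.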

opoudjis commented 6 years ago

the line * 为 表示"验证通过",为 表示"验证不通过"。 is rendered as 为 真 表示"#x9A8C;#x8BC1;#x901A;#x8FC7; ",为 假 表示"#x9A8C;#x8BC1;#x4E0D;#x901A;#x8FC7; "。

There are backticks (`) that have made it into the wild in your document unescaped, and Html2Word is assuming them to be AsciiMath delimiters; it's therefore attempting to render the text as OOXML maths, and getting it wrong. https://github.com/riboseinc/isodoc/issues/31 to generate correct delimiters.
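To illustrate why stray backticks are a problem, here is a minimal Python sketch of delimiter-based math extraction. It is not the Html2Word code: the function `math_spans` and the `{{` / `}}` delimiter pair are hypothetical, chosen only to show how an uncommon delimiter avoids pairing up backticks that occur in ordinary prose.

```python
import re


def math_spans(text, open_delim="`", close_delim="`"):
    """Return the substrings that a naive scanner would treat as math."""
    pattern = re.escape(open_delim) + r"(.+?)" + re.escape(close_delim)
    return re.findall(pattern, text, flags=re.DOTALL)


# Two unescaped backticks in prose, plus one real formula in {{ }}.
text = "The flag ` is set, then ` cleared; the sum is {{a + b}}."

# With backtick delimiters, the prose between the stray backticks is
# mistaken for a formula.
print(math_spans(text))

# With a delimiter pair unlikely to occur in prose, only the formula is found.
print(math_spans(text, "{{", "}}"))
```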

opoudjis commented 6 years ago

6.2: problem character is Unicode ellipsis
6.3: problem character is Unicode en-dash (did you mean minus?)

Will attempt decoding Unicode entities before passing them into AsciiMath processors.
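A minimal sketch of that decoding step, in Python and using only the standard library. It assumes the entities arrive in the standard `&#x…;` form; the function name `decode_entities` is hypothetical and this is not the code used in the gems.

```python
import html


def decode_entities(text):
    """Turn numeric character references into the characters they stand for."""
    return html.unescape(text)


# An ellipsis (U+2026) and an en-dash (U+2013) written as hex entities,
# as they might appear inside a formula before it reaches a math processor.
raw = "x_1 &#x2026; x_n &#x2013; 1"
decoded = decode_entities(raw)
print(decoded)
```

Decoding first means the math processor sees the actual characters rather than the literal text of the entity, which is what showed up inside the rendered equations.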

opoudjis commented 6 years ago

Decoding the Unicode entities addresses 6.2, 6.3. 6.4 with the box is still open.

opoudjis commented 6 years ago

The box is somehow the MathML translation to Word (using Word's own stylesheet) being mangled. Cutting and pasting the MathML into Word does the same mangling, whereas the MathML rendered online is fine. If you go to Linear in the equation in Word, and retype it, Word inserts a space between the Sum and the thing being summed, and it works out. But you have to retype it: just deleting the space blows the equation up.

The dotted square is Word complaining that it's missing a parameter; I've compared what Word expects as OOXML and the OOXML it's got, and what's missing is m:sSupPr, which is a wrapper for superscript properties (because somehow the 2 is an m:sup). I don't see how it's a parameter, and how editing the AsciiMath or MathML would force that m:sSupPr to be supplied.

I've spent an hour on this, and I don't see a clean way forward. For non-trivial AsciiMath and MathML, you may have to edit the Word document: the translation from MathML to Word is imperfect, and given that you can copy and paste MathML into Word, this has to be a known issue that we aren't going to be the ones to solve.

opoudjis commented 6 years ago

I've fixed as much of this as I can:

frontpage: the "draft" message about "2018" shouldn't be shown

Removed. If you want it put back in for known created date, let me know.

Title should not have "SM2密码算法使用规范-" ending dash

Changed markup: moved title-intro to title-main. Will change XML generation so that it does not generate empty elements and validation complains about the missing element.

the line * 为 表示"验证通过",为 表示"验证不通过"。 is rendered as 为 真 表示"#x9A8C;#x8BC1;#x901A;#x8FC7; ",为 假 表示"#x9A8C;#x8BC1;#x4E0D;#x901A;#x8FC7; "。

Fixed by changing the AsciiMath delimiter.

7.2, 7.3, 7.4: Math equations show a Unicode character number within. In 7.4, a space in between is rendered as a box.

Fixed for 7.2 and 7.3 by decoding Unicode escapes in AsciiMath. Can't fix 7.4: there is something broken about how Word translates sums into OOXML, and we will have to warn users about it.

ronaldtse commented 6 years ago

Thank you @opoudjis for the markup clarification and fixes -- I agree that the OOXML Math issue should just be considered unfixable for now. I wonder if we should file this bug in the html2doc repo so one day someone could work on it.

opoudjis commented 6 years ago

I couldn't leave well enough alone :-( . I've found a fix: when a Sum (munderover) is followed by an Exponential (msup) in MathML, the exponential needs to be wrapped in mrow for Word not to complain. I don't know why, but I'll put the fix in.
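The workaround described above can be sketched as a small tree rewrite. This is an illustration in Python with the standard library, not the fix as committed: the function names are hypothetical, and it assumes "followed by" means the msup is the immediate next sibling of the munderover.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def wrap_msup_after_sum(root):
    """Wrap every msup that directly follows a munderover in an mrow."""
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(root.iter()):
        children = list(parent)
        for prev, cur in zip(children, children[1:]):
            if local_name(prev.tag) == "munderover" and local_name(cur.tag) == "msup":
                index = list(parent).index(cur)
                namespace = cur.tag[: -len("msup")]  # "" or "{uri}"
                mrow = ET.Element(namespace + "mrow")
                # Keep any trailing text attached to the same position.
                mrow.tail, cur.tail = cur.tail, None
                parent.remove(cur)
                mrow.append(cur)
                parent.insert(index, mrow)
    return root


mathml = (
    "<math><munderover><mo>&#x2211;</mo><mi>i</mi><mi>n</mi></munderover>"
    "<msup><mi>x</mi><mn>2</mn></msup></math>"
)
fixed = wrap_msup_after_sum(ET.fromstring(mathml))
print(ET.tostring(fixed, encoding="unicode"))
```

The rewrite leaves the formula's meaning unchanged, since an mrow around a single element is only a grouping container.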

ronaldtse commented 6 years ago

Wow, it feels like AsciiMath to MathML should be a separate gem! :wink: