Closed Intelligent2013 closed 5 months ago
Metanorma Adoc: NISO-STS-Standard-1-0.adoc.zip
The issues:
[x] include::sections/08-ansi/niso.adoc[]
- `/` should be converted to `_`.
[x] :date: release Approved: October 6, 2017
- this format isn't supported by Metanorma
Source XML:
<release-date date-type="approved" iso-8601-date="2017-10-06">Approved: October 6, 2017</release-date>
~I.e. `release-date` contains the presentation data instead of `2017-10-06` (https://www.niso-sts.org/TagLibrary/niso-sts-TL-1-2-html/element/release-date.html). I don't want to parse this data with XSLT. It needs to be adapted manually in the adoc after conversion.~
[x] STS: Standards Tag Suite
from the XML
<title-wrap>
<main-title-wrap>
<main>STS: Standards Tag Suite</main>
</main-title-wrap>
</title-wrap>
[x] the sequence of the resulting adoc
include::sections/04-application.adoc[]
include::sections/02-normrefs.adoc[]
include::sections/06-references.adoc[]
~based on Metanorma requirements, and needs to be adapted manually after conversion.~
- [x] from `07-definitions.adoc`:
```adoc
[[sec_7]]
== Definitions
ANSIAmerican National Standards Institute. ANSI is a private nonprofit organization that oversees the development of voluntary consensus standards in the U.S.
ASCIIASCII is a character-encoding scheme based on the English alphabet. It defines the encoding for 128 characters including A-Z, a-z, 0-9, some symbols, and some control characters.
```
from XML:
<sec id="sec_7" sec-type="definitions">
<label>7</label>
<title>Definitions</title>
<term-sec>
<term-display>
<term>ANSI</term>
<def>
<p>American National Standards Institute. ANSI is a private nonprofit organization that oversees the development of voluntary consensus standards in the U.S.</p>
</def>
</term-display>
</term-sec>
<term-sec>
<term-display>
<term>ASCII</term>
<def>
<p>ASCII is a character-encoding scheme based on the English alphabet. It defines the encoding for 128 characters including A-Z, a-z, 0-9, some symbols, and some control characters.</p>
<p>It is sometimes used to refer to “plain text,” i.e., text without special characters or equations.</p>
</def>
</term-display>
</term-sec>
mnconvert doesn't know about this practice of using `term-sec/term-display/term|def`. I'll add the conversion rules.
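To illustrate the kind of rule that is missing, here is a minimal Python sketch (not mnconvert's actual XSLT) that walks `term-sec/term-display` and emits each `term` with its `def/p` paragraphs as an AsciiDoc description list. The target markup (`term:: definition` with `+` continuations) and the function name `term_display_to_adoc` are assumptions for illustration only, and the sample text is abbreviated from the XML above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <sec> fragment from the issue (paragraph text shortened).
SAMPLE = """<sec id="sec_7" sec-type="definitions">
  <label>7</label>
  <title>Definitions</title>
  <term-sec>
    <term-display>
      <term>ANSI</term>
      <def>
        <p>American National Standards Institute.</p>
      </def>
    </term-display>
  </term-sec>
  <term-sec>
    <term-display>
      <term>ASCII</term>
      <def>
        <p>ASCII is a character-encoding scheme.</p>
        <p>It is sometimes used to refer to plain text.</p>
      </def>
    </term-display>
  </term-sec>
</sec>"""


def _text(element):
    """Flatten an element's text content and normalise whitespace."""
    return " ".join("".join(element.itertext()).split())


def term_display_to_adoc(sec_xml):
    """Hypothetical helper: render term-display entries as an AsciiDoc description list."""
    root = ET.fromstring(sec_xml)
    entries = []
    for display in root.iter("term-display"):
        term_el = display.find("term")
        if term_el is None:
            continue  # nothing to label the entry with
        paragraphs = [_text(p) for p in display.findall("def/p")]
        entry = f"{_text(term_el)}::"
        if paragraphs:
            entry += f" {paragraphs[0]}"
        # Further paragraphs are attached with the AsciiDoc list continuation marker.
        for extra in paragraphs[1:]:
            entry += f"\n+\n{extra}"
        entries.append(entry)
    return "\n\n".join(entries)


if __name__ == "__main__":
    print(term_display_to_adoc(SAMPLE))
```

The point of the sketch is only that `term` and `def` have to be emitted as separate pieces; concatenating their text is what produces `ANSIAmerican National Standards Institute...` in the current output.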
Metanorma XML: NISO-STS-Standard-1-0.mn.zip
The issues:
`bibdata`:
<?xml version="1.0" encoding="UTF-8"?><iso-standard xmlns="https://www.metanorma.org/ns/iso" type="presentation">
<preface>
<foreword type="foreword" displayorder="1">
Notes:
<front>
<std-doc-meta>
...
<std-ident>
...
<std-id-group std-relationship-type="std-as-published">
<std-id std-id-type="undated">ANSI/NISO Z39.102</std-id>
<std-id std-id-type="dated">ANSI/NISO Z39.102-2017</std-id>
</std-id-group>
<isbn publication-format="HTML">978-1-937522-77-3</isbn>
<isbn publication-format="PDF">978-1-937522-78-0</isbn>
<issn specific-use="National Information standards series">1041-5653</issn>
</std-ident>
<std-org-group>
<std-org std-org-role="developer">
<std-org-name>National Information Standards Organization</std-org-name>
<std-org-abbrev>NISO</std-org-abbrev>
</std-org>
<std-org>
<std-org-loc>
<addr-line>NISO</addr-line>
<addr-line>3600 Clipper Mill Road</addr-line>
<addr-line>Suite 302</addr-line>
<addr-line>Baltimore, MD 21211-1948</addr-line>
<ext-link ext-link-type="uri" xlink:href="https://www.niso.org">www.niso.org</ext-link>
</std-org-loc>
</std-org>
</std-org-group>
<content-language>en</content-language>
<std-ref>ANSI/NISO Z39.102-2017</std-ref>
...
<accrediting-organization accredit-acronym="ANSI">American National Standards Institute</accrediting-organization>
<authorization authorize-acronym="ANS">An American National Standard</authorization>
<permissions>
<copyright-statement>Copyright © 2017 by the National Information Standards Organization</copyright-statement>
<license>
<license-p>All rights reserved under International and Pan-American Copyright Conventions. For noncommercial purposes only, this publication may be reproduced or transmitted in any form or by any means without prior permission in writing from the publisher, provided it is reproduced accurately, the source of the material is identified, and the NISO copyright status is acknowledged. All inquiries regarding translations into other languages or commercial reproduction or distribution should be addressed to: NISO, 3600 Clipper Mill Road, Suite 302, Baltimore, MD 21211-1948.</license-p>
</license>
</permissions>
<abstract>
<title>Abstract:</title>
<p>The Standards Tag Suite (STS) provides a common XML format that developers, publishers, and distributors of standards, including national standards bodies, regional and international standards bodies, and standards development organizations can use to publish and exchange full-text content and metadata of standards. STS is based on ANSI/NISO Z39.96 (JATS). Structures are provided to encode both the normative and non-normative content of: standards, adoptions of standards, and standards-like documents that are produced by standards organizations.</p>
</abstract>
<meta-note content-type="cover-address">
<p>Published by the National Information Standards Organization</p>
<p>Baltimore, Maryland, U.S.A.</p>
</meta-note>
<meta-note content-type="title-page">
<title>About NISO Standards</title>
<p>NISO standards are developed by Working Groups of the National Information Standards Organization under the oversight of a Topic Committee. The development process is a strenuous one that includes a rigorous peer review of proposed standards open to each NISO Voting Member and any other interested party. Final approval of the standard involves verification by the American National Standards Institute that its requirements for due process, consensus, and other approval criteria have been met by NISO. Once verified and approved, NISO Standards also become American National Standards.</p>
<p>These standards may be revised or withdrawn at any time. For current information on the status of this standard contact the NISO office or visit the NISO website at: <ext-link ext-link-type="uri" xlink:href="https://www.niso.org">https://www.niso.org</ext-link>
</p>
</meta-note>
</std-doc-meta>
@ronaldtse what is the main goal of converting the 'NISO STS XML tagged version of the standard' into Metanorma Adoc and XML? Do we need round-trip conversion?
@Intelligent2013 we want to be able to handle the elements produced in the sample in Metanorma. We don't need a round trip.
The attribute `:docnumber: Z39.102` causes the error:
bundle exec metanorma -t iso -x presentation test.adoc
Fatal Error: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 1.
|- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 1.
| `- Expected " ", but got "Z" at line 1 char 1.
|- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 1.
| `- Expected at least 1 of \\d at line 1 char 1.
| `- Failed to match \\d at line 1 char 1.
`- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 1.
`- Extra input after last repetition at line 1 char 1.
`- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 1.
`- Expected " + ", but got "Z39" at line 1 char 1.
C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/pubid-core-1.12.5/lib/pubid/core/identifier/base.rb:166:in `rescue in parse': Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1. (Pubid::Core::Errors::ParseError)
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 1.
|- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 1.
| `- Expected " ", but got "Z" at line 1 char 1.
|- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 1.
| `- Expected at least 1 of \\\\d at line 1 char 1.
| `- Failed to match \\\\d at line 1 char 1.
`- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 1.
`- Extra input after last repetition at line 1 char 1.
`- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 1.
`- Expected " + ", but got "Z39" at line 1 char 1.
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/pubid-core-1.12.5/lib/pubid/core/identifier/base.rb:161:in `parse'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:60:in `orig_id_parse'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:47:in `iso_id_params'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:37:in `iso_id'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:13:in `metadata_id'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-standoc-2.8.7/lib/metanorma/standoc/front.rb:168:in `metadata'
from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-standoc-2.8.7/lib/metanorma/standoc/base.rb:117:in `block in front'
...
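Reading the trace: every alternative fails at line 1 char 1, and the standard-document branch reports `Expected at least 1 of \d`, which suggests that the ISO identifier grammar wants the document number to begin with a digit, while `Z39.102` begins with `Z`. The Python check below is a deliberately simplified stand-in for that `number:DIGITS` rule, not the pubid grammar itself; the function name and the purely numeric sample value are hypothetical.

```python
import re

# Simplified stand-in for the `number:DIGITS` rule seen in the trace:
# the number is expected to start with at least one digit.
LEADING_DIGIT = re.compile(r"\d")


def starts_with_digit(docnumber):
    """Return True if the docnumber begins with a digit (hypothetical pre-check)."""
    return LEADING_DIGIT.match(docnumber) is not None


if __name__ == "__main__":
    for candidate in ("Z39.102", "17301"):  # "17301" is a made-up numeric example
        print(candidate, starts_with_digit(candidate))
```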
[x] wrong `named-content` conversion from
<list-item>
<p>
<named-content content-type="organization">American Library Association (ALA)</named-content>
</p>
<p>
<named-content content-type="committee-member-name">Jill Emery</named-content>
</p>
</list-item>
to
{{American-Library-Association--ALA-,American Library Association (ALA)}}
[x] `00-01-std-doc-meta.adoc` contains redundant data (document attributes):
:title-main-en: STS: Standards Tag Suite
= Z39.102
National Information Standards Organization:publisher: NISO
:pub-address: NISO + \
3600 Clipper Mill Road + \
Suite 302 + \
Baltimore, MD 21211-1948 + \
www.niso.org
enANSI/NISO Z39.102-20172017-10-06:semantic-metadata-accrediting-organization: American National Standards Institute
:authorizer: An American National Standard, ANS
:semantic-metadata-copyright-statement: Copyright © 2017 by the National Information Standards Organization
:semantic-metadata-license: All rights reserved ...
[.preface,type=cover-address] == {blank}
Published by the National Information Standards Organization
- [x] wrong bibitem markup in `06-references.adoc`:
[[ref_6]] [%bibitem] === Access License and Indicators, 5 January 2015. https://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf[https://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf] docid:: id::: NISO RP-22-2015 type:: standard
- [x] missing `[bibliography]` in `06-references.adoc`
`named-content`, for instance:
<named-content content-type="element-name"><contrib></named-content>
should be converted to semantic spans `span:category[text]` (https://www.metanorma.org/author/topics/inline_markup/text_formatting/#semantic-spans), i.e.:
span:element-name[<contrib>]
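The mapping described here (`content-type` becomes the span category, the element text becomes the span body) can be sketched in a few lines of Python. This is an illustration, not the mnconvert implementation: the function name is hypothetical, it uses a regular expression rather than a real XML parser, it assumes the element text is entity-escaped in the source XML (e.g. `&lt;contrib&gt;`), and it does not handle nested markup or a literal `]` inside the text.

```python
import html
import re

# Matches a flat <named-content content-type="...">text</named-content> element.
NAMED_CONTENT = re.compile(
    r'<named-content\s+content-type="([^"]+)"\s*>(.*?)</named-content>',
    re.DOTALL,
)


def named_content_to_span(fragment):
    """Hypothetical helper: rewrite named-content elements as span:category[text]."""
    def replace(match):
        category = match.group(1)
        # Decode XML entities so &lt;contrib&gt; is emitted as <contrib>.
        text = html.unescape(match.group(2)).strip()
        return f"span:{category}[{text}]"

    return NAMED_CONTENT.sub(replace, fragment)


if __name__ == "__main__":
    print(named_content_to_span(
        '<named-content content-type="element-name">&lt;contrib&gt;</named-content>'
    ))
    print(named_content_to_span(
        '<named-content content-type="organization">'
        'American Library Association (ALA)</named-content>'
    ))
```

Under the same rule, the `organization` example reported earlier would come out as `span:organization[American Library Association (ALA)]` rather than the `{{...}}` form.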
Add the official NISO STS XML https://www.niso-sts.org/downloadables/samples/NISO-STS-Standard-1-0.XML (from the page https://www.niso-sts.org/Samples.html) into ~JUnit tests~ Makefile.