metanorma / metanorma-ietf

Metanorma processor for IETF documents
BSD 2-Clause "Simplified" License

Tool to convert RFC XML v2 and v3 to Metanorma Asciidoc #16

Closed ribose-jeffreylau closed 3 years ago

ribose-jeffreylau commented 5 years ago

In CalConnect, there's interest in a tool that supports converting from RFC XML v2 to Metanorma Asciidoc. Having such a tool would help transition existing documents already in RFC XML to Metanorma.

opoudjis commented 5 years ago

Comparable existing tools:

rfc2md would be an appropriate starting point for this project. It is, in retrospect, somewhat naive about RFC XML v2 structure, and it is in XSLT, which means only a mother could love it. But the existing Word-to-Asciidoctor and Word-to-Markdown tools are also imperfect, and that does not get in the way of them being useful.

This comment is in no way me volunteering to work on this: I have more than enough to do on the Metanorma stack. But customising rfc2md to align with Metanorma-IETF markup is a task someone else can pick up.

opoudjis commented 4 years ago

This would now need to become v3 XML.

ronaldtse commented 3 years ago

@Intelligent2013 do you have time for this? Thanks!

Intelligent2013 commented 3 years ago

@ronaldtse Yes, I have time.

My understanding of the task:

Source RFC format:

Destination Metanorma Asciidoc IETF format: https://datatracker.ietf.org/doc/draft-ribose-asciirfc/?include_text=1

Roadmap:

  1. Prepare conversion rules for converting data from RFC XML v2 to Metanorma Asciidoc (docx file).
  2. Develop a converter based on the conversion rules.
  3. Update the conversion rules for RFC XML v3.
  4. Update the converter for RFC XML v3.

Starting point for converter: https://github.com/riboseinc/rfc2md

Examples for testing:
Source RFC XML examples: https://github.com/metanorma/mn-samples-ietf/tree/gh-pages/documents
Expected result: https://github.com/metanorma/mn-samples-ietf/tree/master/sources

Any suggestions or comments?

opoudjis commented 3 years ago

Oh wooooooow...

@Intelligent2013 https://tools.ietf.org/html/rfc7991 is not the only reference you should be using. RFC XML v3 has been laughably unstable over the past couple of years, although you might argue it has started to settle down now. (It was developed a couple of years before anyone implemented it, and the implementation brought about a couple more years of tinkering.) But you will need to also consult:

ronaldtse commented 3 years ago

Other than the specs the key point is to find a stash of actual RFC XML v2/v3 documents. I think datatracker.ietf.org may have something like this?

Intelligent2013 commented 3 years ago

Actual RFC XML v3 documents can be found via https://tools.ietf.org/rfc/, but only from RFC8650 (November 2019) onward. The page https://datatracker.ietf.org/doc/rfc8650/ links to the XML at https://www.rfc-editor.org/rfc/rfc8650.xml. In any case, all v3 XMLs can be found at https://www.rfc-editor.org/rfc/ (also from 8650 onward); there are about 240 XML files there. For v2 I didn't find actual XMLs.

ronaldtse commented 3 years ago

Let's use the rfc2mn repository to publish this software. Thanks! https://github.com/metanorma/rfc2mn

Intelligent2013 commented 3 years ago

@ronaldtse How should rfc2mn convert RFC XML v2:

ronaldtse commented 3 years ago

How should rfc2mn convert RFC XML v2:

  • RFC XML v2 -> Metanorma AsciiRFC v2, or
  • RFC XML v2 -> Metanorma AsciiRFC v3 ?

Both versions should be converted to Metanorma AsciiRFC v3, so that people can create new versions of the published RFCs.
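Since both v2 and v3 input end up as AsciiRFC v3, the converter still has to know which input vocabulary it is reading. A minimal Python sketch, assuming the `version="3"` root attribute defined in RFC 7991 as the primary signal and, as a fallback heuristic of our own (not from this thread), the presence of elements that only exist in v3:

```python
import xml.etree.ElementTree as ET

# Elements that exist only in the v3 vocabulary (RFC 7991); v2 used
# <list>, <texttable> and <artwork> instead. Heuristic, not exhaustive.
V3_ONLY_TAGS = {"ul", "ol", "dl", "table", "sourcecode", "blockquote", "aside"}


def rfcxml_version(xml_text):
    """Heuristically classify an RFC XML document as version 2 or 3.

    Assumes entity references (e.g. &RFC2119;) are already resolved,
    since ElementTree does not load external DTD entities.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "rfc":
        raise ValueError(f"expected an <rfc> root element, got <{root.tag}>")
    if root.get("version") == "3":
        return 3
    # No explicit version: fall back to looking for v3-only elements.
    if any(el.tag in V3_ONLY_TAGS for el in root.iter()):
        return 3
    return 2
```

The result only selects which set of element mappings to apply; the output format is AsciiRFC v3 either way.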

Intelligent2013 commented 3 years ago

I have a few questions about mapping rules between some RFC XML elements/attributes and Metanorma AsciiRFC markup:

  1. In Metanorma AsciiRFC markup, is the order of attributes (key=value) important? For example, can we put numbered,removeInRFC,toc:
    [numbered=false,removeInRFC=false,toc=exclude]
    == Status of This Memo

    instead of removeInRFC,toc,numbered?

    [removeInRFC=false,toc=exclude,numbered=false]
    == Status of This Memo

How should these elements/attributes be marked up?

  1. the attribute iref/@primary (https://tools.ietf.org/html/rfc7749#section-2.20.2)? I can't find any xml example.

  2. the element boilerplate (https://tools.ietf.org/html/rfc7991#section-2.11). Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

    <boilerplate>
      <section anchor="status-of-memo" numbered="false" removeInRFC="false" toc="exclude" pn="section-boilerplate.1">
        <name slugifiedName="name-status-of-this-memo">Status of This Memo</name>
        <t pn="section-boilerplate.1-1">
            This is an Internet Standards Track document.
        </t>...
  3. the element displayreference (https://tools.ietf.org/html/rfc7991#section-2.19). Example (https://www.rfc-editor.org/rfc/rfc8651.xml):

    <back>
    <displayreference target="I-D.ietf-manet-dlep-da-credit-extension" to="DLEP-DIFFSERV"/>
    <displayreference target="I-D.ietf-manet-dlep-credit-flow-control" to="DLEP-CREDIT"/>
    <references pn="section-6">
      <name slugifiedName="name-references">References</name>...
            ...
        <reference anchor="I-D.ietf-manet-dlep-da-credit-extension" quoteTitle="true" target="https://tools.ietf.org/html/draft-ietf-manet-dlep-da-credit-extension-07" derivedAnchor="DLEP-DIFFSERV">
          <front>
            <title>DLEP DiffServ Aware Credit Window Extension</title>...
  4. the element referencegroup (https://tools.ietf.org/html/rfc7991#section-2.41). Example (https://www.rfc-editor.org/rfc/rfc8722.xml):

  <back>
    <references pn="section-6">
      <name slugifiedName="name-informative-references">Informative References</name>
      <referencegroup anchor="BCP9" target="https://www.rfc-editor.org/info/bcp9" derivedAnchor="BCP9">
        <reference anchor="RFC2026" target="https://www.rfc-editor.org/info/rfc2026" quoteTitle="true">
          <front>
                    ...
        </reference>
      </referencegroup>
      <reference anchor="MoU_SUPP2019" target="https://www.ietf.org/media/documents/FINAL_2019-IETF_MoU_Supplemental_Agreement_Signed_31July19.pdf" quoteTitle="true" derivedAnchor="MoU_SUPP2019">
        <front>...
  1. the attributes rfc/@scripts, rfc/@prepTime and rfc/@expiresDate (no example for the last one) (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#appendix-B.3)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

    <rfc ... prepTime="2019-12-06T13:42:53" scripts="Common,Latin" ...>
  2. the attributes xref/@section and xref/@sectionFormat (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-1.4.2)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

    <xref derivedContent="1" format="counter" sectionFormat="of" target="section-1"/>
    <xref derivedContent="" format="title" sectionFormat="of" target="name-introduction">Introduction</xref>
  3. the attribute table/@align (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-2.53.1)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

        <table anchor="gen-sub-errors" align="center" pn="table-1">
          <name slugifiedName="name-general-subscription-error-">General Subscription Error Identities and Associated "error-tag" Use</name>
opoudjis commented 3 years ago

I have a few questions about mapping rules between some RFC XML elements/attributes and Metanorma AsciiRFC markup:

  1. in Metanorma AsciiRFC markup, is the attributes (key=value) order important?

Not for key=value attributes. Style attributes, such as [source,...], do need to appear first.
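A minimal Python sketch of that rule, for a converter that emits block attribute lines: key=value pairs are written in whatever order they arrive, and an optional positional style is always put first. The function name is hypothetical.

```python
def adoc_attribute_list(style=None, **attrs):
    """Build an AsciiDoc block attribute line, e.g. [numbered=false,toc=exclude].

    key=value pairs may appear in any order; a positional style such as
    "source" is always placed first.
    """
    parts = [style] if style else []
    for key, value in attrs.items():
        # RFC XML booleans are the strings "true"/"false"; normalise Python bools.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return "[" + ",".join(parts) + "]"
```

For example, `adoc_attribute_list(numbered="false", removeInRFC="false", toc="exclude")` yields `[numbered=false,removeInRFC=false,toc=exclude]`, which is equivalent to any other ordering of the same pairs.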

  1. the attribute iref/@primary (https://tools.ietf.org/html/rfc7749#section-2.20.2)? I can't find any xml example.

I haven't implemented it yet; I will implement it in https://github.com/metanorma/metanorma-ietf/issues/125, as indexterm2:[primary:firstterm] or indexterm:[primary:firstterm, secondterm, thirdterm]

  1. the element boilerplate(https://tools.ietf.org/html/rfc7991#section-2.11). Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

That element appears to be supplied by the IETF editors; we do not populate it.

  1. the element displayreference (https://tools.ietf.org/html/rfc7991#section-2.19). Example (https://www.rfc-editor.org/rfc/rfc8651.xml):

No discrete element; instead, we would provide text for any eref referencing that element. So for <displayreference target="I-D.ietf-manet-dlep-da-credit-extension" to="DLEP-DIFFSERV"/>, any cross-reference to it (so <<I-D.ietf-manet-dlep-da-credit-extension>>) would become <<I-D.ietf-manet-dlep-da-credit-extension, DLEP-DIFFSERV>>. However, that would obviously not round-trip to generate displayreference again. I'm ok with that.
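A minimal Python sketch of that mapping: collect the `displayreference` pairs from `<back>` once, then use the `to` label as the text of every cross-reference to the matching `target`. Function names are hypothetical; the output format follows the example above.

```python
import xml.etree.ElementTree as ET


def display_labels(back):
    """Collect <displayreference target=... to=...> pairs from <back>."""
    return {d.get("target"): d.get("to") for d in back.iter("displayreference")}


def render_xref(target, labels):
    """Render an AsciiDoc cross-reference, using the display label as its text.

    The <displayreference> element itself is dropped, so this does not round-trip.
    """
    label = labels.get(target)
    return f"<<{target}, {label}>>" if label else f"<<{target}>>"
```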

  1. the element referencegroup (https://tools.ietf.org/html/rfc7991#section-2.41). Example (https://www.rfc-editor.org/rfc/rfc8722.xml):

Not directly implemented. Realise the example you've given as the following Asciibib, using the part-of relationship, although it is not going to round-trip:

[[BCP9]]
[%bibitem]
== {blank}
link: https://www.rfc-editor.org/info/bcp9
relation::
relation.type:: includes
relation.bibitem.docid.type:: IETF
relation.bibitem.docid.id:: RFC2026 
relation::
relation.type:: includes
relation.bibitem.docid.type:: IETF
relation.bibitem.docid.id:: ... 

Note that in practice, reference elements pointing to RFC references should be realised as autofetched references:

* [[[rfc2026,IETF RFC 2026]]]

As it turns out IETF BCP 9 should also be fetched (and its XML includes referencegroup: https://xml2rfc.tools.ietf.org/public/rfc/bibxml-rfcsubseries/reference.BCP.0009.xml). I am querying why that isn't working with @andrew2net: https://github.com/relaton/relaton-ietf/issues/44
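A Python sketch of both suggestions: a `referencegroup` becomes an Asciibib entry with one `includes` relation per member (reproducing the layout of the example above, including its `link:` line), and a reference whose anchor looks like `RFCnnnn` becomes an autofetched entry. Function names and the anchor-matching rule are assumptions for illustration; neither output round-trips to `referencegroup`.

```python
import re
import xml.etree.ElementTree as ET


def referencegroup_to_asciibib(group):
    """Render a <referencegroup> as an Asciibib entry with one "includes"
    relation per member <reference>, mirroring the example above."""
    lines = [f"[[{group.get('anchor')}]]", "[%bibitem]", "== {blank}"]
    if group.get("target"):
        lines.append(f"link: {group.get('target')}")
    for ref in group.findall("reference"):
        lines += [
            "relation::",
            "relation.type:: includes",
            "relation.bibitem.docid.type:: IETF",
            f"relation.bibitem.docid.id:: {ref.get('anchor')}",
        ]
    return "\n".join(lines)


def autofetch_entry(anchor):
    """Return an autofetched bibliography line for RFC anchors, else None."""
    match = re.fullmatch(r"RFC(\d+)", anchor)
    if match is None:
        return None
    number = int(match.group(1))
    return f"* [[[rfc{number},IETF RFC {number}]]]"
```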

  1. the attributes rfc/@scripts, rfc/@prepTime and rfc/@expiresDate (no example for last one) (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#appendix-B.3)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

IETF does not want prepTime included by authors; it is generated by the editors, and I had to remove it from a previous version because it triggered a warning. All these attributes are populated by the "prepTool", which means that any author-supplied values are ignored. (Indeed, I would assume they trigger a warning if populated too.)
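Since these values come from the prepTool, a converter can simply drop them from the root element before mapping anything. A minimal Python sketch, limited to the three attributes named in the question; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root attributes populated by the IETF prepTool (per the discussion above);
# author-supplied values are ignored, so they are not carried into AsciiRFC.
PREPTOOL_ROOT_ATTRS = ("prepTime", "scripts", "expiresDate")


def strip_preptool_attrs(root):
    """Remove prepTool-generated attributes from the <rfc> root in place."""
    for name in PREPTOOL_ROOT_ATTRS:
        root.attrib.pop(name, None)
    return root
```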

  1. the attributes xref/@section and xref/@sectionFormat (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-1.4.2)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):

They are carried across from relref, but they are redundant or ignored in Metanorma's handling of internal cross-references. Not supported, and I'm ok for them not to be. Note that they are not even documented in the RFC update, though I can work out their intent from relref.
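A minimal Python sketch of an `xref` conversion that follows this decision: only `target` and any element text are used, while `section`, `sectionFormat`, `format` and the `derived*` attributes are ignored. The function name is hypothetical, and dropping `section` does lose the section pointer in the output.

```python
import xml.etree.ElementTree as ET


def xref_to_adoc(xref):
    """Convert <xref> to an AsciiDoc cross-reference.

    Only target and the element text are used; section, sectionFormat,
    format and derived* attributes are ignored.
    """
    target = xref.get("target")
    text = " ".join("".join(xref.itertext()).split())
    return f"<<{target},{text}>>" if text else f"<<{target}>>"
```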

  1. the attribute table/@align (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-2.53.1)?

That's an align attribute on the asciidoc table:

[align=center]
|===
|a |b

|c |d
|===
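A Python sketch that produces the table layout above from a simple v3 `<table>`, copying `align` into `[align=...]` and `<name>` into a block title. It assumes no row or column spans and at most one `<thead>` row, which becomes the implicit AsciiDoc header row (a row followed by a blank line); function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _row(tr):
    """One AsciiDoc table row: '|cell |cell'."""
    cells = ["|" + " ".join("".join(c.itertext()).split())
             for c in tr if c.tag in ("th", "td")]
    return " ".join(cells)


def table_to_adoc(table):
    """Convert a simple RFC XML v3 <table> (no spans) to an AsciiDoc table."""
    lines = []
    name = table.find("name")
    if name is not None:
        lines.append("." + "".join(name.itertext()).strip())
    if table.get("align"):
        lines.append(f"[align={table.get('align')}]")
    lines.append("|===")
    head_rows = table.findall("thead/tr")
    body_rows = table.findall("tbody/tr")
    lines += [_row(tr) for tr in head_rows]
    if head_rows:
        lines.append("")  # blank line marks the preceding row as the header
    lines += [_row(tr) for tr in body_rows]
    lines.append("|===")
    return "\n".join(lines)
```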
Intelligent2013 commented 3 years ago

After further analysis of the source RFC XMLs, I've found some tags that were added in drafts (https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12, https://trac.tools.ietf.org/tools/xml2rfc/trac/browser/trunk/cli/xml2rfc/data/v3.rng) and occur in real documents:

  1. /rfc/front/toc (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
    <front>
    ...
    <toc>
      <section anchor="toc" numbered="false" removeInRFC="false" toc="exclude" pn="section-toc.1">
        <name slugifiedName="name-table-of-contents">Table of Contents</name>
        <ul bare="true" empty="true" indent="2" spacing="compact" pn="section-toc.1-1">
          <li pn="section-toc.1-1.1">
            <t keepWithNext="true" pn="section-toc.1-1.1.1"><xref derivedContent="1" format="counter" sectionFormat="of" target="section-1"/>.  <xref derivedContent="" format="title" sectionFormat="of" target="name-introduction">Introduction</xref></t>
          </li>

I propose to ignore it.

  1. eref/@brackets (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
        <t pn="section-boilerplate.1-3">
            Information about the current status of this document, any
            errata, and how to provide feedback on it may be obtained at
            <eref target="https://www.rfc-editor.org/info/rfc8650" brackets="none"/>.
        </t>

The @brackets attribute has two values, none and angle. For brackets=angle, should we enclose the link in angle brackets or output it as is?

  <element name="eref">
    <optional>
      <attribute name="xml:base"/>
    </optional>
    <optional>
      <attribute name="xml:lang"/>
    </optional>
    <optional>
      <attribute name="brackets" a:defaultValue="none">
        <choice>
          <value>none</value>
          <value>angle</value>
        </choice>
      </attribute>
    </optional>
    <attribute name="target"/>
    <text/>
  </element>
</define>
  1. ul/@bare (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
    <ul spacing="normal" bare="false" empty="false" pn="section-9-5">
        <li pn="section-9-5.1">"uri": leaf will show where subscribed resources might be located on a publisher.  Access control must be set so that only someone with proper access permissions, i.e., the same RESTCONF <xref target="RFC8040" format="default" sectionFormat="of" derivedContent="RFC8040"/> user credentials that invoked the corresponding "establish-subscription", has the ability to access this resource.</li>
      </ul>

The bare attribute is described here: https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12#section-3.1.25, but I didn't understand it.

  1. xref/@derivedLink (example: https://www.rfc-editor.org/rfc/rfc8650.xml)
    <xref target="RFC8040" sectionFormat="of" section="6.3" format="default" derivedLink="https://rfc-editor.org/rfc/rfc8040#section-6.3" derivedContent="RFC8040"/>

I propose to ignore it.

  1. /rfc/back/section/author (example: https://www.rfc-editor.org/rfc/rfc8651.xml):
<back>
...
<section anchor="authors-addresses" numbered="false" removeInRFC="false" toc="include" pn="section-appendix.b">
      <name slugifiedName="name-authors-addresses">Authors' Addresses</name>
      <author initials="B." surname="Cheng" fullname="Bow-Nan Cheng">
        <organization showOnFrontPage="true">MIT Lincoln Laboratory</organization>
        <address>
          <postal>
            <extaddr>Massachusetts Institute of Technology</extaddr>
            <street>244 Wood Street</street>
            <city>Lexington</city>
            <region>MA</region>
            <code>02421-6426</code>
            <country>United States of America</country>
          </postal>
          <email>bcheng@ll.mit.edu</email>
        </address>
      </author>

How should we mark it up in adoc?

  1. /rfc/back/section/t/contact (example: https://www.rfc-editor.org/rfc/rfc8656.xml):
<rfc>
    <back>...
        <section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.a">
            <name slugifiedName="name-acknowledgements">Acknowledgements</name>
            <t pn="section-appendix.a-1">Most of the text in this note comes from the original TURN
                specification, <xref target="RFC5766" format="default" sectionFormat="of" derivedContent="RFC5766"/>. The authors would like to
                thank <contact fullname="Rohan Mahy"/>, coauthor of the original TURN specification, and everyone
                who had contributed to that document. The authors would also like to
                acknowledge that this document inherits material from <xref target="RFC6156" format="default" sectionFormat="of" derivedContent="RFC6156"/>.</t>
            <t pn="section-appendix.a-2">Thanks to <contact fullname="Justin Uberti"/>, <contact fullname="Pal       Martinsen"/>, <contact fullname="Oleg Moskalenko"/>, <contact fullname="Aijun Wang"/>, and <contact fullname="Simon Perreault"/> for
                their help on the ADDITIONAL-ADDRESS-FAMILY mechanism. The authors would
                like to thank <contact fullname="Gonzalo Salgueiro"/>, <contact fullname="Simon Perreault"/>, <contact fullname="Jonathan Lennox"/>,
                <contact fullname="Brandon Williams"/>, <contact fullname="Karl       Stahl"/>, <contact fullname="Noriyuki Torii"/>, <contact fullname="Nils       Ohlmeier"/>, <contact fullname="Dan Wing"/>, <contact fullname="Vijay       Gurbani"/>, <contact fullname="Joseph Touch"/>, <contact fullname="Justin Uberti"/>, <contact fullname="Christopher Wood"/>,
                <contact fullname="Roman Danyliw"/>, <contact fullname="Eric Vyncke"/>,
                <contact fullname="Adam Roach"/>, <contact fullname="Suresh Krishnan"/>,
                <contact fullname="Mirja Kuehlewind"/>, <contact fullname="Benjamin       Kaduk"/>, and <contact fullname="Oleg Moskalenko"/> for comments and
                review. The authors would like to thank <contact fullname="Marc Petit-Huguenin"/> for his
                contributions to the text.</t>
            <t pn="section-appendix.a-3">Special thanks to <contact fullname="Magnus Westerlund"/> for the detailed AD review.</t>
        </section>

or rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml):

    <section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.b">
      <name slugifiedName="name-contributors">Contributors</name>
      <t pn="section-appendix.b-1">The following people contributed substantially to the content of this
   document and should be considered coauthors:</t>
      <contact fullname="Xian Zhang">
        <organization showOnFrontPage="true">Huawei</organization>
        <address>
          <email>zhang.xian@huawei.com</email>
        </address>
      </contact><contact fullname="Dhruv Dhody">
        <organization showOnFrontPage="true">Huawei Technologies</organization>
        <address>
          <postal>
            <street>Divyashree Techno Park, Whitefield</street>
            <city>Bangalore</city>
            <region>Karnataka</region>
            <code>560066</code>
            <country>India</country>
          </postal>
          <email>dhruv.ietf@gmail.com</email>
        </address>
      </contact></section>

How should we mark it up in adoc?

opoudjis commented 3 years ago
  1. /rfc/front/toc (example: https://www.rfc-editor.org/rfc/rfc8650.xml):

I propose to ignore it.

Agree.
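A minimal Python sketch of ignoring it: remove `/rfc/front/toc` before conversion. It also removes `/rfc/front/boilerplate`, since the earlier reply notes that element is supplied by the IETF editors and not populated by Metanorma. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_generated_front_matter(root):
    """Remove <toc> and <boilerplate> from /rfc/front, in place."""
    front = root.find("front")
    if front is not None:
        for tag in ("toc", "boilerplate"):
            for el in front.findall(tag):
                front.remove(el)
    return root
```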

  1. eref/@brackets (example: https://www.rfc-editor.org/rfc/rfc8650.xml):

For @brackets there are two values - none and angle. In case of brackets=angle should we enclose link in brackets or put as is?

Enclose it: &lt;<<reference>>&gt;. It will not roundtrip, but we regard the bracketing of references as presentation-layer in Metanorma anyway.
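A Python sketch of that rule for `eref`. Wrapping in `&lt;`/`&gt;` follows the suggestion above; emitting the link with the AsciiDoc `link:` macro (rather than a bare URL, whose autolinking next to angle brackets can behave differently) is our own assumption and should be checked against Metanorma output. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def eref_to_adoc(eref):
    """Convert <eref> to an AsciiDoc link, adding angle brackets when
    brackets="angle" (the schema default is "none")."""
    target = eref.get("target")
    text = " ".join("".join(eref.itertext()).split())
    link = f"link:{target}[{text}]"  # empty [] displays the URL itself
    if eref.get("brackets", "none") == "angle":
        return f"&lt;{link}&gt;"
    return link
```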

  1. ul/@bare (example: https://www.rfc-editor.org/rfc/rfc8650.xml):

bare attribute description here https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12#section-3.1.25 But I didn't understand it.

This generates unordered lists without bullets. We actually do support this in IETF Metanorma, as the unordered list attribute nobullet=true.
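A minimal Python sketch of the `ul/@bare` mapping to `nobullet=true`. For brevity it flattens each list item to plain text; a real converter would convert inline children such as `xref` instead. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def ul_to_adoc(ul):
    """Convert a <ul> to an AsciiDoc unordered list; bare="true" becomes
    [nobullet=true]. List-item markup is flattened to plain text here."""
    lines = []
    if ul.get("bare") == "true":
        lines.append("[nobullet=true]")
    for li in ul.findall("li"):
        text = " ".join("".join(li.itertext()).split())
        lines.append(f"* {text}")
    return "\n".join(lines)
```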

  1. xref/@derivedLink (example: https://www.rfc-editor.org/rfc/rfc8650.xml)

I propose to ignore it.

Agree, this was added for analogy with erefs, and we don't have a straightforward way to deal with it.

  1. /rfc/back/section/author (example: https://www.rfc-editor.org/rfc/rfc8651.xml):

How to markup it in adoc?

Unfortunately, as document attributes, which will be quite awkward for you. See https://www.metanorma.com/author/ietf/ref/document-attributes/#author-attributes . We do not currently support extaddr, organization/@showOnFrontPage, or name/@slugifiedName.
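A Python sketch of turning one `<author>` into document attributes. The attribute names (`fullname`, `lastname`, `forename_initials`, `organization`, `email`, `street`, `city`, `region`, `code`, `country`) and the `_{n}` suffix for the second and later authors are assumptions to verify against the author-attributes page linked above. The unsupported `extaddr` and `showOnFrontPage` are simply not emitted, and only the first `street` line is kept.

```python
import xml.etree.ElementTree as ET

# Assumed Metanorma IETF postal attribute names; verify before use.
POSTAL_FIELDS = ("street", "city", "region", "code", "country")


def author_attributes(author, n, role=None):
    """Emit document attribute lines for the n-th author (1-based)."""
    suffix = "" if n == 1 else f"_{n}"
    attrs = {
        "fullname": author.get("fullname"),
        "lastname": author.get("surname"),
        "forename_initials": author.get("initials"),
        "role": role or author.get("role"),
        "organization": author.findtext("organization"),
        "email": author.findtext("address/email"),
    }
    for field in POSTAL_FIELDS:
        attrs[field] = author.findtext(f"address/postal/{field}")
    return [f":{key}{suffix}: {value.strip()}"
            for key, value in attrs.items() if value and value.strip()]
```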

  1. /rfc/back/section/t/contact (example: https://www.rfc-editor.org/rfc/rfc8656.xml):

Inline contact information is not currently supported, just render it as plain text. I can create a macro to support it, but I decline to unless it is documented in https://tools.ietf.org/html/draft-iab-rfc7991bis-03
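A minimal Python sketch of rendering inline `<contact>` as plain text: each element is replaced by its `fullname`, and whitespace is collapsed, since some source files (e.g. RFC 8656) contain runs of spaces inside `fullname`. Other inline children are flattened to their text here only for brevity; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_inline_contacts(t):
    """Return the text of a <t>, with each inline <contact> replaced by
    its fullname and all whitespace runs collapsed to single spaces."""
    parts = [t.text or ""]
    for child in t:
        if child.tag == "contact":
            parts.append(child.get("fullname", ""))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())
```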

or rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml):

How to markup it in adoc?

I would suggest using the same parsing as for /rfc/back/section/author, except that the role_{n} attribute is given as editor (which is our fallback).
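A Python sketch of that suggestion for a back-matter `<contact>`, with `role_{n}` fixed to `editor`. As with authors, the attribute names and the convention that the first person takes no suffix and later ones take `_2`, `_3`, … are assumptions to check against the Metanorma IETF documentation; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed Metanorma IETF postal attribute names; verify before use.
POSTAL_FIELDS = ("street", "city", "region", "code", "country")


def contact_attributes(contact, n):
    """Emit document attribute lines for the n-th author (1-based) taken
    from a back-matter <contact>, with the role set to "editor"."""
    suffix = "" if n == 1 else f"_{n}"
    attrs = {
        "fullname": contact.get("fullname"),
        "role": "editor",
        "organization": contact.findtext("organization"),
        "email": contact.findtext("address/email"),
    }
    for field in POSTAL_FIELDS:
        attrs[field] = contact.findtext(f"address/postal/{field}")
    return [f":{key}{suffix}: {value.strip()}"
            for key, value in attrs.items() if value and value.strip()]
```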

Intelligent2013 commented 3 years ago

rfc2mn has been updated for the conversion rules above, except for rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml). If we move it to document attributes, then the text "The following people contributed substantially to the content of this document and should be considered coauthors:" remains unprocessed:

<section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.b">
      <name slugifiedName="name-contributors">Contributors</name>
      <t pn="section-appendix.b-1">The following people contributed substantially to the content of this
   document and should be considered coauthors:</t>
...
</section>

Maybe we could process rfc/back/section/contact as plain text, i.e. concatenate all fields with ", ", like this: Xian Zhang, Huawei, zhang.xian@huawei.com; Dhruv Dhody, Huawei Technologies, Divyashree Techno Park, Whitefield, Bangalore...?
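A Python sketch of this proposal: each `<contact>` becomes its name, organization, and the leaf fields of `<address>` in document order, joined with ", ", and contacts are separated by "; ". The intro `<t>` would still be converted as an ordinary paragraph. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def contacts_as_text(section):
    """Flatten every <contact> in a section into one line of plain text."""
    entries = []
    for contact in section.findall("contact"):
        fields = [contact.get("fullname"), contact.findtext("organization")]
        address = contact.find("address")
        if address is not None:
            # Leaf elements in document order: street, city, ..., email.
            fields += [el.text for el in address.iter() if len(el) == 0 and el.text]
        entries.append(", ".join(f.strip() for f in fields if f and f.strip()))
    return "; ".join(entries)
```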

opoudjis commented 3 years ago

@Intelligent2013 is rfc2mn integrated into mnconvert? Is this ticket complete?

Intelligent2013 commented 3 years ago

is rfc2mn integrated into mnconvert?

@opoudjis yes.

Is this ticket complete?

Yes, except for one issue, which has been moved to a separate ticket. Closing.