Closed ribose-jeffreylau closed 3 years ago
Comparable existing tools:
rfc2md would be an appropriate starting point for this project. It is, in retrospect, somewhat naive about RFC XML v2 structure, and it is in XSLT, which means only a mother could love it. But the existing Word-to-Asciidoctor and Word-to-Markdown tools are also imperfect, and that does not get in the way of them being useful.
This comment is in no way me volunteering to work on this: I have more than enough to do on the Metanorma stack. But customising rfc2md to align with Metanorma-IETF markup is a task someone else can pick up.
This would now need to become v3 XML.
@Intelligent2013 do you have time for this? Thanks!
@ronaldtse Yes, I have time.
I understand this task as follows:
Source RFC format:
Destination Metanorma Asciidoc IETF format: https://datatracker.ietf.org/doc/draft-ribose-asciirfc/?include_text=1
Roadmap:
Starting point for converter: https://github.com/riboseinc/rfc2md
Examples for testing: Source RFC XML Examples: https://github.com/metanorma/mn-samples-ietf/tree/gh-pages/documents Expected result: https://github.com/metanorma/mn-samples-ietf/tree/master/sources
Any suggestions or comments?
Oh wooooooow...
@Intelligent2013 https://tools.ietf.org/html/rfc7991 is not the only reference you should be using. RFC XML v3 has been laughably unstable over the past couple of years, although you might argue it has started to settle down now. (It was developed a couple of years before anyone implemented it, and the implementation brought about a couple more years of tinkering.) But you will need to also consult:
Other than the specs the key point is to find a stash of actual RFC XML v2/v3 documents. I think datatracker.ietf.org may have something like this?
Actual RFC XML v3 documents can be found via https://tools.ietf.org/rfc/, but they are only available from RFC 8650 (November 2019) onwards. The page https://datatracker.ietf.org/doc/rfc8650/ links to the XML at https://www.rfc-editor.org/rfc/rfc8650.xml. In any case, all v3 XML files can be found at https://www.rfc-editor.org/rfc/ (also from RFC 8650 onwards); there are ~240 XML files there. For v2 I didn't find any actual XML files.
Let's use the rfc2mn repository to publish this software. Thanks! https://github.com/metanorma/rfc2mn
@ronaldtse How should rfc2mn convert RFC XML v2:
- RFC XML v2 -> Metanorma AsciiRFC v2, or
- RFC XML v2 -> Metanorma AsciiRFC v3?
Both versions should be converted to Metanorma AsciiRFC v3, so that people can create new versions of the published RFCs.
I have a few questions about mapping rules between some RFC XML elements/attributes and Metanorma AsciiRFC markup:
In Metanorma AsciiRFC markup, is the order of attributes (key=value) important? For example, may I write:
[numbered=false,removeInRFC=false,toc=exclude]
== Status of This Memo
instead of the removeInRFC, toc, numbered order:
[removeInRFC=false,toc=exclude,numbered=false]
== Status of This Memo
How should these elements/attributes be marked up?
the attribute iref/@primary (https://tools.ietf.org/html/rfc7749#section-2.20.2)? I can't find any XML example.
the element boilerplate (https://tools.ietf.org/html/rfc7991#section-2.11). Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
<boilerplate>
<section anchor="status-of-memo" numbered="false" removeInRFC="false" toc="exclude" pn="section-boilerplate.1">
<name slugifiedName="name-status-of-this-memo">Status of This Memo</name>
<t pn="section-boilerplate.1-1">
This is an Internet Standards Track document.
</t>...
the element displayreference (https://tools.ietf.org/html/rfc7991#section-2.19). Example (https://www.rfc-editor.org/rfc/rfc8651.xml):
<back>
<displayreference target="I-D.ietf-manet-dlep-da-credit-extension" to="DLEP-DIFFSERV"/>
<displayreference target="I-D.ietf-manet-dlep-credit-flow-control" to="DLEP-CREDIT"/>
<references pn="section-6">
<name slugifiedName="name-references">References</name>...
...
<reference anchor="I-D.ietf-manet-dlep-da-credit-extension" quoteTitle="true" target="https://tools.ietf.org/html/draft-ietf-manet-dlep-da-credit-extension-07" derivedAnchor="DLEP-DIFFSERV">
<front>
<title>DLEP DiffServ Aware Credit Window Extension</title>...
the element referencegroup (https://tools.ietf.org/html/rfc7991#section-2.41). Example (https://www.rfc-editor.org/rfc/rfc8722.xml):
<back>
<references pn="section-6">
<name slugifiedName="name-informative-references">Informative References</name>
<referencegroup anchor="BCP9" target="https://www.rfc-editor.org/info/bcp9" derivedAnchor="BCP9">
<reference anchor="RFC2026" target="https://www.rfc-editor.org/info/rfc2026" quoteTitle="true">
<front>
...
</reference>
</referencegroup>
<reference anchor="MoU_SUPP2019" target="https://www.ietf.org/media/documents/FINAL_2019-IETF_MoU_Supplemental_Agreement_Signed_31July19.pdf" quoteTitle="true" derivedAnchor="MoU_SUPP2019">
<front>...
the attributes rfc/@scripts, rfc/@prepTime and rfc/@expiresDate (no example for the last one) (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#appendix-B.3)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
<rfc ... prepTime="2019-12-06T13:42:53" scripts="Common,Latin" ...>
the attributes xref/@section and xref/@sectionFormat (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-1.4.2)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
<xref derivedContent="1" format="counter" sectionFormat="of" target="section-1"/>
<xref derivedContent="" format="title" sectionFormat="of" target="name-introduction">Introduction</xref>
the attribute table/@align (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-2.53.1)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
<table anchor="gen-sub-errors" align="center" pn="table-1">
<name slugifiedName="name-general-subscription-error-">General Subscription Error Identities and Associated "error-tag" Use</name>
I have a few questions about mapping rules between some RFC XML elements/attributes and Metanorma AsciiRFC markup:
- in Metanorma AsciiRFC markup, is the order of attributes (key=value) important?
Not for key=value attributes. Style attributes, such as [source,...], do need to appear first.
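A converter can therefore emit the style (if any) first and the key=value pairs in whatever order is convenient. A minimal Python sketch; the helper name `attr_list` is hypothetical and not part of rfc2mn, and sorting the pairs is only there to make the output deterministic:

```python
def attr_list(style=None, **attrs):
    """Build an AsciiDoc block attribute line, e.g. [source,linenums=true]."""
    parts = []
    if style:
        # A positional style attribute such as "source" must come first.
        parts.append(style)
    # key=value pairs may appear in any order; sorting only makes output stable.
    parts.extend(f"{key}={value}" for key, value in sorted(attrs.items()))
    return "[" + ",".join(parts) + "]"


print(attr_list(toc="exclude", removeInRFC="false", numbered="false"))
# [numbered=false,removeInRFC=false,toc=exclude]
```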
- the attribute iref/@primary (https://tools.ietf.org/html/rfc7749#section-2.20.2)? I can't find any XML example.
I haven't implemented it; I will implement it in https://github.com/metanorma/metanorma-ietf/issues/125, as indexterm2:[primary:firstterm] or indexterm:[primary:firstterm, secondterm, thirdterm].
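Once that syntax lands, an `<iref>` could be mapped onto it roughly as follows. This is a sketch under assumptions: the `primary:` prefix is only the proposal from the issue above, the concealed `indexterm:[...]` form is chosen because `<iref>` produces no visible text, and `iref_to_indexterm` is a hypothetical name. It reads the `item`, `subitem` and `primary` attributes of `<iref>`:

```python
import xml.etree.ElementTree as ET


def iref_to_indexterm(iref):
    """Map an RFC XML <iref> to a concealed AsciiDoc index term.

    The "primary:" prefix is the syntax proposed in metanorma-ietf issue 125;
    treat it as provisional until that issue is implemented.
    """
    terms = [iref.get("item", "")]
    if iref.get("subitem"):
        terms.append(iref.get("subitem"))
    if iref.get("primary") == "true":
        terms[0] = "primary:" + terms[0]
    return "indexterm:[" + ", ".join(terms) + "]"


iref = ET.fromstring('<iref item="firstterm" subitem="secondterm" primary="true"/>')
print(iref_to_indexterm(iref))  # indexterm:[primary:firstterm, secondterm]
```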
- the element boilerplate (https://tools.ietf.org/html/rfc7991#section-2.11). Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
That element appears to be supplied by the IETF editors; we do not populate it.
- the element displayreference (https://tools.ietf.org/html/rfc7991#section-2.19). Example (https://www.rfc-editor.org/rfc/rfc8651.xml):
No discrete element; instead, we would provide text for any eref referencing that element. So for <displayreference target="I-D.ietf-manet-dlep-da-credit-extension" to="DLEP-DIFFSERV"/>, any cross-reference to it (so <<I-D.ietf-manet-dlep-da-credit-extension>>) would become <<I-D.ietf-manet-dlep-da-credit-extension, DLEP-DIFFSERV>>. However, that would obviously not round-trip to generate displayreference again. I'm OK with that.
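The rule above amounts to a two-pass conversion: first collect every `displayreference` target/label pair, then use the label as link text whenever an RFC XML `<xref>` points at that target. A minimal Python sketch with hypothetical helper names, ignoring any text content the `<xref>` itself may carry:

```python
import xml.etree.ElementTree as ET


def display_labels(root):
    """Map each <displayreference> target anchor to its display label."""
    return {d.get("target"): d.get("to") for d in root.iter("displayreference")}


def xref_to_asciidoc(xref, labels):
    """Render an <xref> as an AsciiDoc cross-reference, using the
    displayreference label (when there is one) as the link text."""
    target = xref.get("target")
    label = labels.get(target)
    return f"<<{target}, {label}>>" if label else f"<<{target}>>"


doc = ET.fromstring(
    '<rfc><back><displayreference target="I-D.ietf-manet-dlep-da-credit-extension"'
    ' to="DLEP-DIFFSERV"/></back></rfc>'
)
labels = display_labels(doc)
xref = ET.fromstring('<xref target="I-D.ietf-manet-dlep-da-credit-extension"/>')
print(xref_to_asciidoc(xref, labels))
# <<I-D.ietf-manet-dlep-da-credit-extension, DLEP-DIFFSERV>>
```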
- the element referencegroup (https://tools.ietf.org/html/rfc7991#section-2.41). Example (https://www.rfc-editor.org/rfc/rfc8722.xml):
Not directly implemented. Realise the example you've given as the following Asciibib, using the part-of relationship, although it is not going to round-trip:
[[BCP9]]
[%bibitem]
== {blank}
link: https://www.rfc-editor.org/info/bcp9
relation::
relation.type:: includes
relation.bibitem.docid.type:: IETF
relation.bibitem.docid.id:: RFC2026
relation::
relation.type:: includes
relation.bibitem.docid.type:: IETF
relation.bibitem.docid.id:: ...
Note that in practice, reference elements pointing to RFC references should be realised as autofetched references:
* [[[rfc2026,IETF RFC 2026]]]
As it turns out IETF BCP 9 should also be fetched (and its XML includes referencegroup: https://xml2rfc.tools.ietf.org/public/rfc/bibxml-rfcsubseries/reference.BCP.0009.xml). I am querying why that isn't working with @andrew2net: https://github.com/relaton/relaton-ietf/issues/44
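Where a group cannot be autofetched, the Asciibib realisation suggested above could be generated mechanically, one `includes` relation per member `<reference>`. A Python sketch under assumptions: the function name is hypothetical, each member's `anchor` (for example `RFC2026`) is assumed to be usable as its IETF document identifier, and the lines mirror the Asciibib example above as written:

```python
import xml.etree.ElementTree as ET


def referencegroup_to_asciibib(group):
    """Render a <referencegroup> as an Asciibib entry shaped like the
    example above, with one "includes" relation per member <reference>."""
    lines = [
        f"[[{group.get('anchor')}]]",
        "[%bibitem]",
        "== {blank}",
        f"link: {group.get('target')}",
    ]
    for ref in group.findall("reference"):
        lines += [
            "relation::",
            "relation.type:: includes",
            "relation.bibitem.docid.type:: IETF",
            # Assumption: the member's anchor doubles as its IETF identifier.
            f"relation.bibitem.docid.id:: {ref.get('anchor')}",
        ]
    return "\n".join(lines)


group = ET.fromstring(
    '<referencegroup anchor="BCP9" target="https://www.rfc-editor.org/info/bcp9">'
    '<reference anchor="RFC2026"/></referencegroup>'
)
print(referencegroup_to_asciibib(group))
```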
- the attributes rfc/@scripts, rfc/@prepTime and rfc/@expiresDate (no example for the last one) (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#appendix-B.3)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
IETF do not want prepTime included by authors; it gets generated by the editors, and I had to remove it from a previous version because it triggered a warning. All these attributes are populated by the "prepTool", which means that any author-supplied values are ignored. (And indeed, I would assume they trigger a warning if populated too.)
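Since the prepTool regenerates these attributes, a converter can simply drop them from the `<rfc>` root before emitting anything. A minimal Python sketch; the constant and function names are hypothetical, and only the three attributes named above are handled:

```python
import xml.etree.ElementTree as ET

# Attributes on <rfc> that the prepTool populates; author values are ignored.
PREPTOOL_RFC_ATTRS = ("scripts", "prepTime", "expiresDate")


def drop_preptool_attrs(root):
    """Remove prepTool-populated attributes from the <rfc> root in place."""
    for name in PREPTOOL_RFC_ATTRS:
        root.attrib.pop(name, None)  # absent attributes are silently skipped
    return root


root = ET.fromstring('<rfc number="8650" prepTime="2019-12-06T13:42:53" scripts="Common,Latin"/>')
print(drop_preptool_attrs(root).attrib)  # {'number': '8650'}
```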
- the attributes xref/@section and xref/@sectionFormat (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-1.4.2)? Example (https://www.rfc-editor.org/rfc/rfc8650.xml):
They are carried across from relref, but they are redundant or ignored in Metanorma's handling of internal cross-references. Not supported, and I'm ok for them not to be. Note that they are not even documented in the RFC update, though I can work out their intent from relref.
- the attribute table/@align (https://tools.ietf.org/html/draft-iab-rfc7991bis-01#section-2.53.1)?
That's an align attribute on the AsciiDoc table:
[align=center]
|===
|a |b
|c |d
|===
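For simple tables, carrying `table/@align` across is a matter of prefixing the AsciiDoc table with the attribute line. A Python sketch with a hypothetical function name; it deliberately ignores header marking, `colspan`/`rowspan` and escaping of `|` in cell text, all of which a real converter would need:

```python
import xml.etree.ElementTree as ET


def table_to_asciidoc(table):
    """Convert a simple RFC XML v3 <table> to an AsciiDoc table, carrying
    table/@align across as a block attribute.

    Simplifications: header rows are not flagged as headers, colspan/rowspan
    are ignored, and "|" inside cell text is not escaped.
    """
    lines = []
    if table.get("align"):
        lines.append(f"[align={table.get('align')}]")
    lines.append("|===")
    for tr in table.iter("tr"):
        cells = ["|" + "".join(c.itertext()).strip() for c in tr if c.tag in ("th", "td")]
        lines.append(" ".join(cells))
    lines.append("|===")
    return "\n".join(lines)


table = ET.fromstring(
    '<table align="center"><tbody><tr><td>a</td><td>b</td></tr>'
    "<tr><td>c</td><td>d</td></tr></tbody></table>"
)
print(table_to_asciidoc(table))  # reproduces the [align=center] table above
```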
After further analysis of the source RFC XML files, I've found some tags which were added in the drafts (https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12, https://trac.tools.ietf.org/tools/xml2rfc/trac/browser/trunk/cli/xml2rfc/data/v3.rng) and which appear in real documents:
/rfc/front/toc (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
<front>
...
<toc>
<section anchor="toc" numbered="false" removeInRFC="false" toc="exclude" pn="section-toc.1">
<name slugifiedName="name-table-of-contents">Table of Contents</name>
<ul bare="true" empty="true" indent="2" spacing="compact" pn="section-toc.1-1">
<li pn="section-toc.1-1.1">
<t keepWithNext="true" pn="section-toc.1-1.1.1"><xref derivedContent="1" format="counter" sectionFormat="of" target="section-1"/>. <xref derivedContent="" format="title" sectionFormat="of" target="name-introduction">Introduction</xref></t>
</li>
I propose to ignore it.
eref/@brackets (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
<t pn="section-boilerplate.1-3">
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
<eref target="https://www.rfc-editor.org/info/rfc8650" brackets="none"/>.
</t>
For @brackets there are two values: none and angle. In the case of brackets=angle, should we enclose the link in angle brackets or output it as is?
<element name="eref">
<optional>
<attribute name="xml:base"/>
</optional>
<optional>
<attribute name="xml:lang"/>
</optional>
<optional>
<attribute name="brackets" a:defaultValue="none">
<choice>
<value>none</value>
<value>angle</value>
</choice>
</attribute>
</optional>
<attribute name="target"/>
<text/>
</element>
</define>
ul/@bare (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
<ul spacing="normal" bare="false" empty="false" pn="section-9-5">
<li pn="section-9-5.1">"uri": leaf will show where subscribed resources might be located on a publisher. Access control must be set so that only someone with proper access permissions, i.e., the same RESTCONF <xref target="RFC8040" format="default" sectionFormat="of" derivedContent="RFC8040"/> user credentials that invoked the corresponding "establish-subscription", has the ability to access this resource.</li>
</ul>
The bare attribute is described here: https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12#section-3.1.25, but I didn't understand it.
xref/@derivedLink (example: https://www.rfc-editor.org/rfc/rfc8650.xml):
<xref target="RFC8040" sectionFormat="of" section="6.3" format="default" derivedLink="https://rfc-editor.org/rfc/rfc8040#section-6.3" derivedContent="RFC8040"/>
I propose to ignore it.
/rfc/back/section/author (example: https://www.rfc-editor.org/rfc/rfc8651.xml):
<back>
...
<section anchor="authors-addresses" numbered="false" removeInRFC="false" toc="include" pn="section-appendix.b">
<name slugifiedName="name-authors-addresses">Authors' Addresses</name>
<author initials="B." surname="Cheng" fullname="Bow-Nan Cheng">
<organization showOnFrontPage="true">MIT Lincoln Laboratory</organization>
<address>
<postal>
<extaddr>Massachusetts Institute of Technology</extaddr>
<street>244 Wood Street</street>
<city>Lexington</city>
<region>MA</region>
<code>02421-6426</code>
<country>United States of America</country>
</postal>
<email>bcheng@ll.mit.edu</email>
</address>
</author>
How should it be marked up in AsciiDoc?
/rfc/back/section/t/contact (example: https://www.rfc-editor.org/rfc/rfc8656.xml):
<rfc>
<back>...
<section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.a">
<name slugifiedName="name-acknowledgements">Acknowledgements</name>
<t pn="section-appendix.a-1">Most of the text in this note comes from the original TURN
specification, <xref target="RFC5766" format="default" sectionFormat="of" derivedContent="RFC5766"/>. The authors would like to
thank <contact fullname="Rohan Mahy"/>, coauthor of the original TURN specification, and everyone
who had contributed to that document. The authors would also like to
acknowledge that this document inherits material from <xref target="RFC6156" format="default" sectionFormat="of" derivedContent="RFC6156"/>.</t>
<t pn="section-appendix.a-2">Thanks to <contact fullname="Justin Uberti"/>, <contact fullname="Pal Martinsen"/>, <contact fullname="Oleg Moskalenko"/>, <contact fullname="Aijun Wang"/>, and <contact fullname="Simon Perreault"/> for
their help on the ADDITIONAL-ADDRESS-FAMILY mechanism. The authors would
like to thank <contact fullname="Gonzalo Salgueiro"/>, <contact fullname="Simon Perreault"/>, <contact fullname="Jonathan Lennox"/>,
<contact fullname="Brandon Williams"/>, <contact fullname="Karl Stahl"/>, <contact fullname="Noriyuki Torii"/>, <contact fullname="Nils Ohlmeier"/>, <contact fullname="Dan Wing"/>, <contact fullname="Vijay Gurbani"/>, <contact fullname="Joseph Touch"/>, <contact fullname="Justin Uberti"/>, <contact fullname="Christopher Wood"/>,
<contact fullname="Roman Danyliw"/>, <contact fullname="Eric Vyncke"/>,
<contact fullname="Adam Roach"/>, <contact fullname="Suresh Krishnan"/>,
<contact fullname="Mirja Kuehlewind"/>, <contact fullname="Benjamin Kaduk"/>, and <contact fullname="Oleg Moskalenko"/> for comments and
review. The authors would like to thank <contact fullname="Marc Petit-Huguenin"/> for his
contributions to the text.</t>
<t pn="section-appendix.a-3">Special thanks to <contact fullname="Magnus Westerlund"/> for the detailed AD review.</t>
</section>
or /rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml):
<section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.b">
<name slugifiedName="name-contributors">Contributors</name>
<t pn="section-appendix.b-1">The following people contributed substantially to the content of this
document and should be considered coauthors:</t>
<contact fullname="Xian Zhang">
<organization showOnFrontPage="true">Huawei</organization>
<address>
<email>zhang.xian@huawei.com</email>
</address>
</contact><contact fullname="Dhruv Dhody">
<organization showOnFrontPage="true">Huawei Technologies</organization>
<address>
<postal>
<street>Divyashree Techno Park, Whitefield</street>
<city>Bangalore</city>
<region>Karnataka</region>
<code>560066</code>
<country>India</country>
</postal>
<email>dhruv.ietf@gmail.com</email>
</address>
</contact></section>
How should it be marked up in AsciiDoc?
/rfc/front/toc (example: https://www.rfc-editor.org/rfc/rfc8650.xml): I propose to ignore it.
Agree.
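Both `/rfc/front/toc` and the editor-supplied `boilerplate` discussed earlier are regenerated at publication, so one way to skip them is to prune them from `<front>` before conversion. A minimal Python sketch with hypothetical names:

```python
import xml.etree.ElementTree as ET

# prepTool/editor-generated front matter that the converter skips.
SKIPPED_FRONT_CHILDREN = ("toc", "boilerplate")


def strip_generated_front(root):
    """Remove <toc> and <boilerplate> from /rfc/front in place."""
    front = root.find("front")
    if front is not None:
        for child in list(front):  # copy, since we remove while iterating
            if child.tag in SKIPPED_FRONT_CHILDREN:
                front.remove(child)
    return root


doc = ET.fromstring("<rfc><front><title>T</title><boilerplate/><toc/></front></rfc>")
print([c.tag for c in strip_generated_front(doc).find("front")])  # ['title']
```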
eref/@brackets (example: https://www.rfc-editor.org/rfc/rfc8650.xml): For @brackets there are two values: none and angle. In the case of brackets=angle, should we enclose the link in angle brackets or output it as is?
Enclose it: <<<reference>>>. It will not round-trip, but we regard the bracketing of references as presentation-layer in Metanorma anyway.
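Taken literally, the rule is: wrap the rendered link in `<` and `>` when `brackets="angle"`, and emit it bare otherwise (`none` is the schema default). A Python sketch with a hypothetical function name; note that Asciidoctor may treat `<URL>` as autolink syntax and drop the brackets, so the exact escaping needed for visible angle brackets is an open assumption not checked here:

```python
import xml.etree.ElementTree as ET


def eref_to_asciidoc(eref):
    """Render an <eref> as an AsciiDoc link, honouring @brackets."""
    target = eref.get("target")
    text = "".join(eref.itertext()).strip()
    link = f"{target}[{text}]" if text else target
    # brackets defaults to "none"; "angle" wraps the link literally.
    if eref.get("brackets", "none") == "angle":
        return f"<{link}>"
    return link


eref = ET.fromstring('<eref target="https://www.rfc-editor.org/info/rfc8650" brackets="angle"/>')
print(eref_to_asciidoc(eref))  # <https://www.rfc-editor.org/info/rfc8650>
```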
ul/@bare (example: https://www.rfc-editor.org/rfc/rfc8650.xml): The bare attribute is described here: https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-12#section-3.1.25, but I didn't understand it.
This generates unordered lists with no bullets. We actually do support this in IETF Metanorma, as the unordered list attribute nobullet=true.
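So `ul/@bare="true"` maps onto a `[nobullet=true]` attribute line ahead of an ordinary unordered list. A Python sketch with a hypothetical function name; for brevity it flattens each `<li>` to plain text, whereas a real converter would also convert inline markup such as `<xref>`:

```python
import xml.etree.ElementTree as ET


def ul_to_asciidoc(ul):
    """Convert an RFC XML <ul> to an AsciiDoc unordered list,
    mapping bare="true" to the nobullet=true list attribute."""
    lines = []
    if ul.get("bare") == "true":
        lines.append("[nobullet=true]")
    for li in ul.findall("li"):
        # Collapse whitespace and drop inline markup (simplification).
        lines.append("* " + " ".join("".join(li.itertext()).split()))
    return "\n".join(lines)


ul = ET.fromstring('<ul bare="true"><li>one</li><li>two</li></ul>')
print(ul_to_asciidoc(ul))
```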
xref/@derivedLink (example: https://www.rfc-editor.org/rfc/rfc8650.xml): I propose to ignore it.
Agree, this was added for analogy with erefs, and we don't have a straightforward way to deal with it.
/rfc/back/section/author (example: https://www.rfc-editor.org/rfc/rfc8651.xml): How should it be marked up in AsciiDoc?
Unfortunately, as document attributes, which will be quite awkward for you. See https://www.metanorma.com/author/ietf/ref/document-attributes/#author-attributes . We do not currently support extaddr, organization/@showOnFrontPage, or name/@slugifiedName.
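Flattening an `<author>` into document attribute lines could look roughly like this. The attribute names used here (`fullname`, `surname`, `initials`, `organization`, `street`, `city`, and so on) are placeholders, not confirmed Metanorma names, and must be checked against the author-attributes page linked above; so must the assumed suffix scheme (none for the first author, `_2`, `_3`, ... afterwards, by analogy with `role_{n}`). Unsupported fields such as `extaddr` are never read, and only the first `street` is taken:

```python
import xml.etree.ElementTree as ET


def author_attributes(author, n=1, role=None):
    """Flatten an RFC XML <author> into AsciiDoc document attribute lines.
    Attribute names and the _{n} suffix scheme are placeholders."""
    suffix = "" if n == 1 else f"_{n}"
    values = {
        "fullname": author.get("fullname"),
        "surname": author.get("surname"),
        "initials": author.get("initials"),
        "organization": author.findtext("organization"),
        "email": author.findtext("address/email"),
    }
    for field in ("street", "city", "region", "code", "country"):
        values[field] = author.findtext(f"address/postal/{field}")
    # extaddr and organization/@showOnFrontPage are unsupported, so never read.
    lines = [f":{k}{suffix}: {v.strip()}" for k, v in values.items() if v and v.strip()]
    if role:
        lines.append(f":role{suffix}: {role}")
    return lines


author = ET.fromstring(
    '<author initials="B." surname="Cheng" fullname="Bow-Nan Cheng">'
    "<organization>MIT Lincoln Laboratory</organization><address><postal>"
    "<extaddr>Massachusetts Institute of Technology</extaddr>"
    "<street>244 Wood Street</street><city>Lexington</city><region>MA</region>"
    "<code>02421-6426</code><country>United States of America</country></postal>"
    "<email>bcheng@ll.mit.edu</email></address></author>"
)
print("\n".join(author_attributes(author)))
```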
/rfc/back/section/t/contact (example: https://www.rfc-editor.org/rfc/rfc8656.xml):
Inline contact information is not currently supported; just render it as plain text. I can create a macro to support it, but I decline to unless it is documented in https://tools.ietf.org/html/draft-iab-rfc7991bis-03.
or /rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml): How should it be marked up in AsciiDoc?
I would suggest using the same parse as for /rfc/back/section/author, except that the role_{n} attribute is given as editor (which is our fallback).
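For the inline case (/rfc/back/section/t/contact), "render it as plain text" can be done by replacing each `<contact>` with its `fullname` while keeping the surrounding text. A Python sketch with a hypothetical function name; other child elements are reduced to their text content here, which a real converter would handle separately:

```python
import xml.etree.ElementTree as ET


def inline_text(elem):
    """Return the text of an element, replacing each inline <contact>
    with its fullname and keeping other children's text and tails."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "contact":
            parts.append(child.get("fullname", ""))
        else:
            parts.append(inline_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


t = ET.fromstring('<t>Special thanks to <contact fullname="Magnus Westerlund"/> for the detailed AD review.</t>')
print(inline_text(t))  # Special thanks to Magnus Westerlund for the detailed AD review.
```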
rfc2mn has been updated with the conversion rules above, except for /rfc/back/section/contact (example: https://www.rfc-editor.org/rfc/rfc8685.xml). If we move it to document attributes, then the text "The following people contributed substantially to the content of this document and should be considered coauthors:" remains unprocessed:
<section numbered="false" toc="include" removeInRFC="false" pn="section-appendix.b">
<name slugifiedName="name-contributors">Contributors</name>
<t pn="section-appendix.b-1">The following people contributed substantially to the content of this
document and should be considered coauthors:</t>
...
</section>
Should we perhaps process /rfc/back/section/contact as simple text, i.e. concatenate all fields with ", " like this?
Xian Zhang, Huawei, zhang.xian@huawei.com; Dhruv Dhody, Huawei Technologies, Divyashree Techno Park, Whitefield, Bangalore...
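The proposed flattening joins each contact's `fullname` and the text of its descendant elements (in document order) with ", ", and separates contacts with "; ". A Python sketch with hypothetical function names, using the structure of the rfc8685 example:

```python
import xml.etree.ElementTree as ET


def contact_to_text(contact):
    """Join a <contact>'s fullname and non-empty descendant texts with ", "."""
    fields = [contact.get("fullname", "")]
    for elem in contact.iter():
        if elem is contact:
            continue  # iter() yields the element itself first
        text = (elem.text or "").strip()
        if text:
            fields.append(text)
    return ", ".join(f for f in fields if f)


def contacts_to_text(section):
    """Flatten all direct <contact> children of a section, separated by "; "."""
    return "; ".join(contact_to_text(c) for c in section.findall("contact"))


section = ET.fromstring(
    '<section><contact fullname="Xian Zhang"><organization>Huawei</organization>'
    "<address><email>zhang.xian@huawei.com</email></address></contact></section>"
)
print(contacts_to_text(section))  # Xian Zhang, Huawei, zhang.xian@huawei.com
```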
@Intelligent2013 is rfc2mn integrated into mnconvert? Is this ticket complete?
is rfc2mn integrated into mnconvert?
@opoudjis yes.
Is this ticket complete?
yes, except one issue. Moved to the separate ticket. Closing.
In CalConnect, there's interest in a tool that supports converting from RFC XML v2 to Metanorma Asciidoc. Having such a tool would help transition existing documents already in RFC XML to Metanorma.