ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies
https://ietf-tools.github.io/xml2rfc/
BSD 3-Clause "New" or "Revised" License

References in referencegroups lose much of their information when rendering #1067

Closed cabo closed 8 months ago

cabo commented 9 months ago

Describe the issue

See what happened to the STD bib entries in

https://author-tools.ietf.org/iddiff?url2=draft-ietf-cbor-edn-literals-07

kesara commented 9 months ago

Regarding missing DOI information:

This is because draft-ietf-cbor-edn-literals-06 uses <reference> and draft-ietf-cbor-edn-literals-07 uses <referencegroup> for STDs. For example:

        <reference anchor="STD80">
          <front>
            <title>ASCII format for network interchange</title>
            <author fullname="V.G. Cerf" initials="V.G." surname="Cerf"/>
            <date month="October" year="1969"/>
          </front>
          <seriesInfo name="STD" value="80"/>
          <seriesInfo name="RFC" value="20"/>
          <seriesInfo name="DOI" value="10.17487/RFC0020"/>
        </reference>

vs.

        <referencegroup anchor="STD80">
          <reference anchor="RFC0020" target="https://www.rfc-editor.org/info/rfc20">
            <front>
              <title>ASCII format for network interchange</title>
              <author fullname="V.G. Cerf" initials="V.G." surname="Cerf"/>
              <date month="October" year="1969"/>
            </front>
            <seriesInfo name="STD" value="80"/>
            <seriesInfo name="RFC" value="20"/>
            <seriesInfo name="DOI" value="10.17487/RFC0020"/>
          </reference>
        </referencegroup>

xml2rfc deliberately doesn't show the DOI for a <reference> inside a <referencegroup>. This was done in release v2.23.0.

I could not find any RFCs, issues or discussions related to this apart from the following change log entry:

  • Changed <reference> rendering when part of a <referencegroup> to not include the DOI.
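
To illustrate that the DOI and the per-document link are present in the source and are only dropped at render time, here is a minimal sketch that reads them back from the <referencegroup> example above. It uses only the Python standard library and is not xml2rfc's own code; the helper name `series_info` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The <referencegroup> example from this comment, copied verbatim.
GROUP_XML = """
<referencegroup anchor="STD80">
  <reference anchor="RFC0020" target="https://www.rfc-editor.org/info/rfc20">
    <front>
      <title>ASCII format for network interchange</title>
      <author fullname="V.G. Cerf" initials="V.G." surname="Cerf"/>
      <date month="October" year="1969"/>
    </front>
    <seriesInfo name="STD" value="80"/>
    <seriesInfo name="RFC" value="20"/>
    <seriesInfo name="DOI" value="10.17487/RFC0020"/>
  </reference>
</referencegroup>
"""


def series_info(reference):
    """Return the seriesInfo name/value pairs of one <reference> element."""
    return {si.get("name"): si.get("value") for si in reference.findall("seriesInfo")}


group = ET.fromstring(GROUP_XML)
for ref in group.findall("reference"):
    # Both the DOI and the target are available on the nested <reference>.
    print(ref.get("anchor"), series_info(ref), ref.get("target"))
```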

cabo commented 9 months ago

Interesting. Not just the DOI is missing, also the link.

Another weird case:

https://www.rfc-editor.org/rfc/rfc9485.html#STD63

Same problem, but this time with a link to the STD.

(I don't think we are allowed to publish documents without giving the DOIs, so this may be an actual contract violation.)

kesara commented 9 months ago

Regarding the missing links: they are missing because <referencegroup> doesn't have a target attribute.

cabo commented 9 months ago

I know that there was not much discussion about referencegroups, but I think that a basic invariant is that a reference in a referencegroup must have the same information that the reference would have stand-alone. Some of the trouble here is, of course, what bib. delivers for a referencegroup, but that can be fixed separately -- I think xml2rfc must render this information.

ajeanmahoney commented 9 months ago

The RPC would like to include DOIs for the RFCs listed within subseries references; however, it requires some updates to the rfc-editor.org database to support that.

(I don't think we are allowed to publish documents without giving the DOIs, so this may be an actual contract violation.)

As for contracts, adding DOIs to a References section is best effort according to crossref.org membership terms. When a DOI is available, the RPC adds it to a reference entry.

cabo commented 9 months ago

Well, the DOI is in the bib.ietf.org entry. The problem is that, for unknown reasons, xml2rfc explicitly decides not to render it. That appears to be the gist of the bug we are experiencing.

(Yes, the referencegroup also should have a target. That indeed is a bib.ietf.org problem.)

$ curl https://bib.ietf.org/public/rfc/bibxml-rfcsubseries/reference.STD.0094.xml
<referencegroup anchor="STD94">
  <reference anchor="RFC8949" target="https://www.rfc-editor.org/info/rfc8949">
    <front>
      <title>Concise Binary Object Representation (CBOR)</title>
      <author fullname="C. Bormann" initials="C." surname="Bormann"/>
      <author fullname="P. Hoffman" initials="P." surname="Hoffman"/>
      <date month="December" year="2020"/>
      <abstract>
        <t>The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. These design goals make it different from earlier binary serializations such as ASN.1 and MessagePack.</t>
        <t>This document obsoletes RFC 7049, providing editorial improvements, new details, and errata fixes while keeping full compatibility with the interchange format of RFC 7049. It does not create a new version of the format.</t>
      </abstract>
    </front>
    <seriesInfo name="STD" value="94"/>
    <seriesInfo name="RFC" value="8949"/>
    <seriesInfo name="DOI" value="10.17487/RFC8949"/>
  </reference>
</referencegroup>
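
A quick way to see which elements lack a target is sketched below. It is a standard-library illustration only; `missing_targets` is a hypothetical helper, and the XML is a trimmed copy of the output above (authors and abstract omitted).

```python
import xml.etree.ElementTree as ET


def missing_targets(xml_text):
    """List anchors of <referencegroup>/<reference> elements lacking a target."""
    root = ET.fromstring(xml_text)
    return [
        el.get("anchor")
        for el in root.iter()  # iter() includes the root element itself
        if el.tag in ("referencegroup", "reference") and not el.get("target")
    ]


# Trimmed copy of reference.STD.0094.xml as shown above.
STD94 = """<referencegroup anchor="STD94">
  <reference anchor="RFC8949" target="https://www.rfc-editor.org/info/rfc8949">
    <front><title>Concise Binary Object Representation (CBOR)</title></front>
    <seriesInfo name="STD" value="94"/>
    <seriesInfo name="RFC" value="8949"/>
    <seriesInfo name="DOI" value="10.17487/RFC8949"/>
  </reference>
</referencegroup>"""

print(missing_targets(STD94))  # ['STD94']
```

Only the group itself is reported: the nested <reference> carries its own target and DOI.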

rjsparks commented 9 months ago

@cabo - to make sure you see the snarl you're tugging into - doing something that is meaningful with DOI and subseries isn't a dead-obvious process. For a lot of subseries documents, the constituent set of RFCs has never changed, and there's been only one RFC in that set. Using that as a model for what references should look like is dangerous. An RFC (and its DOI) points to something that is currently immutable. Subseries documents are not - what RFCs they contain can change at any time, and any constructed reference to a subseries that listed the DOI of the RFCs would (IMO) need to make it really clear that it was the content of the subseries at the time the reference was created.

It might help, while working through this, to use BCP 9 as a worked example in addition to the simplest case.

jrlevine commented 9 months ago

That seems related to using the prep tool. I'd expect it to expand the subseries into RFCs so the prepped document has the specific RFCs at the time it was rendered.

cabo commented 9 months ago

That seems related to using the prep tool. I'd expect it to expand the subseries into RFCs so the prepped document has the specific RFCs at the time it was rendered.

This is currently done by bib.ietf.org, albeit missing the only part that actually is rendered about a referencegroup itself: the target attribute (being fixed in https://github.com/ietf-tools/bibxml-service/issues/388). So we are good here (as long as everyone uses bib.ietf.org and 388 is fixed).

cabo commented 9 months ago

[...]. An RFC (and its DOI) points to something that is currently immutable. Subseries documents are not - what RFCs they contain can change at any time, and any constructed reference to a subseries that listed the DOI of the RFCs would (IMO) need to make it really clear that it was the content of the subseries at the time the reference was created.

Yes. The current rendering of <referencegroup> does not do that. That is another defect. <referencegroup> apparently was designed without thinking too much about how it would actually be used in a document.

It might help, while working through this, to use BCP 9 as a worked example in addition to the simplest case.

I'll use STD68, which is currently one RFC (5234) and needs to be expanded by another RFC (7405); I'll just assume that change will actually happen at some point.

This STD would be referenced by its default anchor, [STD68]. However, there is a need for section references into RFC 5234, which in RFC 9485 is referenced as Section 2.3 of [RFC5234], but more reasonably should be referenced as Section 2.3 of RFC 5234 [STD68]. (We didn't do this because <referencegroup> is currently onerous to use. We did do this with STD63, for which there is a strong reason to reference it as STD, and which also needs a section reference that is now "Section 10 ("Security Considerations") of RFC 3629 [STD63]" -- this construct was invented by the RPC to enable publishing this document, as section references into documents labeled STD do not make a lot of sense.)

This is a real-world example showing how the current situation is discouraging authors from using <referencegroup> at all. If that was the objective, we are doing very well.

So an important next step would be to come up with a rendering of a <referencegroup> that addresses the issues you are mentioning and makes <referencegroup> more usable. I'm surprised that the only attempt at that we have right now is based on the idea that damaging the rendering of the <reference> elements in the <referencegroup> somehow mitigates the incomplete design of <referencegroup>.

Why are the <reference> elements in <referencegroup> important? They are actually the documents based on which the referencing document was written. (The need for section references just uncovers this some more, but this is true without them.) You can't read a referencegroup, you can only read individual documents, and the documents that were the basis for the referencing document are essential for understanding the referencing document.

So maybe we can fix this bug and do an enhancement issue on the rendering of <referencegroup>.

cabo commented 8 months ago

The RPC would like to include DOIs for the RFCs listed within subseries references; however, it requires some updates to the rfc-editor.org database to support that.

Hi Jean,

do you think this is still the case? From what I can see in bibxml, everything is there.

ajeanmahoney commented 8 months ago

@cabo Hi Carsten,

We're working on adding DOIs to https://www.rfc-editor.org/in-notes/std-ref.txt and https://www.rfc-editor.org/in-notes/bcp-ref.txt, which are part of our online style guide.

cabo commented 8 months ago

FYI: https://mailarchive.ietf.org/arch/msg/auth48archive/W09XevZeYf0Lh-GdSMXLtHEWylI

cabo commented 8 months ago

DOIs

DOIs are good. Getting the links (which are in the XML as well) rendered would also help.

ajeanmahoney commented 8 months ago

@cabo Regarding the xrefs into sections of RFCs of subseries, I've added some comments to #639, which focuses on section xrefs for subseries.

ajeanmahoney commented 8 months ago

Regarding style guide updates, these can happen in parallel or after the xml2rfc fix.

ajeanmahoney commented 8 months ago

@cabo Hi Carsten,

Just to clarify what the fix should look like in order to close this issue -- does the following match your expectations?

Current output:

   [BCP14]    Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

              Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

              <https://www.rfc-editor.org/info/bcp14>

Requested output:


   [BCP14]    Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, 
              DOI 10.17487/RFC2119, March 1997.

              Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, 
              DOI 10.17487/RFC8174, May 2017.

              <https://www.rfc-editor.org/info/bcp14>

cabo commented 8 months ago

does the following match your expectations?

Almost. I'd also reinstate the links to the individual documents.

rjsparks commented 8 months ago

Could we also see an example of a subseries doc that has more than one RFC in it to know that we're on the same page? I suggest writing out what the reference for STD 69 (which someone asked me how to make a bibxml query for today) would look like.

ajeanmahoney commented 8 months ago

Current output for STD 69:

   [STD69]    Hollenbeck, S., "Extensible Provisioning Protocol (EPP)",
              STD 69, RFC 5730, August 2009.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Domain Name Mapping", STD 69, RFC 5731, August 2009.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Host Mapping", STD 69, RFC 5732, August 2009.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Contact Mapping", STD 69, RFC 5733, August 2009.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Transport over TCP", STD 69, RFC 5734, August 2009.

              <https://www.rfc-editor.org/info/std69>

Updated output with DOIs and links:

   [STD69]    Hollenbeck, S., "Extensible Provisioning Protocol (EPP)",
              STD 69, RFC 5730, DOI 10.17487/RFC5730, August 2009,
              <https://www.rfc-editor.org/info/rfc5730>.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Domain Name Mapping", STD 69, RFC 5731, 
              DOI 10.17487/RFC5731, August 2009,
              <https://www.rfc-editor.org/info/rfc5731>.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Host Mapping", STD 69, RFC 5732, DOI 10.17487/RFC5732, 
              August 2009, 
              <https://www.rfc-editor.org/info/rfc5732>.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Contact Mapping", STD 69, RFC 5733, 
              DOI 10.17487/RFC5733, August 2009, 
              <https://www.rfc-editor.org/info/rfc5733>.

              Hollenbeck, S., "Extensible Provisioning Protocol (EPP)
              Transport over TCP", STD 69, RFC 5734, 
              DOI 10.17487/RFC5734, August 2009, 
              <https://www.rfc-editor.org/info/rfc5734>.

              <https://www.rfc-editor.org/info/std69>
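
As a rough sketch of how the updated entries could be assembled from a <referencegroup>, the following standard-library Python builds one unwrapped line per contained <reference>, keeping the DOI and the per-document link. This is not xml2rfc's renderer: `format_reference` and `format_group` are hypothetical names, the author handling is simplified, line wrapping is left out, and the XML is hand-written for this example (including the group-level target) rather than taken from bib.ietf.org.

```python
import xml.etree.ElementTree as ET


def format_reference(ref):
    """Format one <reference> as a single unwrapped line, keeping DOI and target."""
    front = ref.find("front")
    authors = []
    for i, a in enumerate(front.findall("author")):
        surname, initials = a.get("surname"), a.get("initials")
        # Simplified: first author "Surname, I.", later authors "I. Surname".
        authors.append(f"{surname}, {initials}" if i == 0 else f"{initials} {surname}")
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + " and " + authors[-1]
    else:
        author_text = "".join(authors)
    parts = [author_text, f'"{front.findtext("title")}"']
    # Every seriesInfo is kept, including the DOI.
    parts += [f'{si.get("name")} {si.get("value")}' for si in ref.findall("seriesInfo")]
    date = front.find("date")
    parts.append(f'{date.get("month")} {date.get("year")}')
    if ref.get("target"):
        parts.append(f'<{ref.get("target")}>')
    return ", ".join(p for p in parts if p) + "."


def format_group(group):
    """Return one line per contained reference, then the group link if present."""
    lines = [format_reference(r) for r in group.findall("reference")]
    if group.get("target"):
        lines.append(f'<{group.get("target")}>')
    return lines


# Hand-written example input covering two of the five STD 69 documents.
STD69_XML = """<referencegroup anchor="STD69" target="https://www.rfc-editor.org/info/std69">
  <reference anchor="RFC5730" target="https://www.rfc-editor.org/info/rfc5730">
    <front>
      <title>Extensible Provisioning Protocol (EPP)</title>
      <author fullname="S. Hollenbeck" initials="S." surname="Hollenbeck"/>
      <date month="August" year="2009"/>
    </front>
    <seriesInfo name="STD" value="69"/>
    <seriesInfo name="RFC" value="5730"/>
    <seriesInfo name="DOI" value="10.17487/RFC5730"/>
  </reference>
  <reference anchor="RFC5731" target="https://www.rfc-editor.org/info/rfc5731">
    <front>
      <title>Extensible Provisioning Protocol (EPP) Domain Name Mapping</title>
      <author fullname="S. Hollenbeck" initials="S." surname="Hollenbeck"/>
      <date month="August" year="2009"/>
    </front>
    <seriesInfo name="STD" value="69"/>
    <seriesInfo name="RFC" value="5731"/>
    <seriesInfo name="DOI" value="10.17487/RFC5731"/>
  </reference>
</referencegroup>"""

for line in format_group(ET.fromstring(STD69_XML)):
    print(line)
```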