ietf-tools / bibxml-service

Django-based Web service implementing IETF BibXML APIs
https://bib.ietf.org
BSD 3-Clause "New" or "Revised" License

IANA references should not contain dates #312

Closed by rjsparks 1 year ago

rjsparks commented 1 year ago

See https://mailarchive.ietf.org/arch/msg/tools-discuss/2OXa1uII9glZewc_--fokyJejoc

Please remove the dates from the references to IANA registries.

ronaldtse commented 1 year ago

The quoted mail is from @cabo:

Next installment of bib.ietf.org Brownian motion:

$ curl -L bib.ietf.org/public/rfc/bibxml8/reference.IANA.mud.xml
<reference anchor="IANA_mud" target="http://www.iana.org/assignments/mud">
  <front>
    <title>Manufacturer Usage Description (MUD)</title>
    <author>
      <organization abbrev="IANA">Internet Assigned Numbers Authority</organization>
    </author>
    <date day="27" month="June" year="2018"/>
  </front>
</reference>

Why?

What is the significance of that date?

mud.txt says:

Created 2018-06-27

Last Updated 2019-05-15

So it seems this shows the creation date.

I’ll stop here, I’m at a loss for words.

ronaldtse commented 1 year ago

As mentioned by @cabo, the authoritative source lists out two dates:

[Screenshot: Screen Shot 2022-10-26 at 1 47 17 PM]

According to RFC 7991:

2.17. <date>

Provides information about the publication date. This element is used for two cases: the boilerplate of the document being produced, and inside bibliographic references that use the <front> element.

Boilerplate for Internet-Drafts and RFCs: This element defines the date of publication for the current document (Internet-Draft or RFC). When producing Internet-Drafts, the prep tool uses this date to compute the expiration date (see [IDGUIDE]). When one or more of "year", "month", or "day" are left out, the prep tool will attempt to use the current system date if the attributes that are present are consistent with that date.

In dates in <rfc> elements, the month must be a number or a month in English. The prep tool will silently change text month names to numbers. Similarly, the year must be a four-digit number.

When the prep tool is used to create Internet-Drafts, it will reject a submitted Internet-Draft that has a <date> element in the boilerplate for itself that is anything other than today. That is, the tool will not allow a submitter to specify a date other than the day of submission. To avoid this problem, authors might simply not include a <date> element in the boilerplate.

Bibliographic references: In dates in <reference> elements, the date information can have prose text for the month or year. For example, vague dates (year="ca. 2000"), date ranges (year="2012-2013"), non-specific months (month="Second quarter"), and so on are allowed.

This element appears as a child element of <front> (Section 2.26).

Content model: this element does not have any contents.

The <date> element is supposed to be the "publication date", i.e. the "creation date of the registry".

@rjsparks is the intent to not track publication dates for IANA registries? Semantically, that does not seem correct.

@cabo could I enquire the reason for citing the registry -- is it to cite the registry itself, or to cite an entry of the registry?

Thanks!

ajeanmahoney commented 1 year ago

@ronaldtse The construction of the references to IANA registries is specified in the Web Portion of the Style Guide, and the guidance was developed in coordination with IANA: https://www.rfc-editor.org/styleguide/part2/#ref_iana_reg

ronaldtse commented 1 year ago

@ajeanmahoney thanks for the clarification. In this case we will not provide dates for IANA references in BibXML output.
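For illustration only, omitting the date from a reference such as the IANA_mud example quoted earlier could be sketched as below. This is not the service's actual implementation; the function name `strip_reference_dates` and the choice of Python's standard `xml.etree.ElementTree` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def strip_reference_dates(reference_xml: str) -> str:
    """Return the reference XML with every <date> child of <front> removed.

    Hypothetical helper for illustration; not part of bibxml-service.
    """
    root = ET.fromstring(reference_xml)
    # Materialize the iterator first so the tree is not mutated while walking it.
    for front in list(root.iter("front")):
        for date in front.findall("date"):
            front.remove(date)
    return ET.tostring(root, encoding="unicode")


# The reference quoted earlier in this thread, used as sample input.
SAMPLE = """<reference anchor="IANA_mud" target="http://www.iana.org/assignments/mud">
  <front>
    <title>Manufacturer Usage Description (MUD)</title>
    <author>
      <organization abbrev="IANA">Internet Assigned Numbers Authority</organization>
    </author>
    <date day="27" month="June" year="2018"/>
  </front>
</reference>"""

if __name__ == "__main__":
    print(strip_reference_dates(SAMPLE))
```

The title, author, anchor, and target are left untouched; only the `<date>` element is dropped.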

The relevant tasks are:

cabo commented 1 year ago

Whatever the documentation, the guiding principle for this migration should have been not to break existing documents.

strogonoff commented 1 year ago

@ronaldtse @stefanomunarini and I took a note about dates, but regarding the larger issue:

> Whatever the documentation, the guiding principle for this migration should have been not to break existing documents.

To go beyond RFCXML schema compliance and avoid regressions in legacy XML consumers, we’ve been diffing the previously existing XML against the new XML to spot inconsistencies (and are working on turning the script we’ve been using into a more user-friendly tool). Unfortunately, this approach does not work well for IANA, since I don’t think we received preexisting IANA XML from the xml2rfc tools service snapshot (i.e., it’s not in https://github.com/ietf-tools/bibxml-data-archive). The diffing tool we use relies on bibxml-data-archive having the preexisting XML to diff against.

Perhaps we could obtain preexisting IANA XML from somewhere?

cabo commented 1 year ago

A good repository of existing RFCXMLv3 is the set of published RFCs. These are available in final form ("prepped"), but also in the last processable form ("prerelease"). I haven't checked, but there should be some IANA references in these.

strogonoff commented 1 year ago

What we have is a tool that iterates over /public/rfc/<subdir>/*.xml and shows a before/after diff. Unfortunately, for bibxml8 we don’t have the “before” XML, because I think bibxml8 contents were dynamically generated from an IANA endpoint, so we can’t diff. I’m not sure where in RFC data we could find the exact XML required for diffing. I can’t find IANA MUD registry XML as part of RFC 9238 XML, for example.

I guess we could use parts of legacy xml2rfc tools code to create a “surrogate” bibxml8 directory that could be used for diffing and catching these issues.
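As a rough sketch of the before/after diffing described here (not the actual tool), the comparison could look like the following. The function name `diff_xml_dirs` and the two-directory layout are assumptions; it uses only Python's standard `difflib` and `pathlib`, and it flags files that have no "before" counterpart, which is the bibxml8 situation.

```python
import difflib
from pathlib import Path
from typing import Dict, List


def diff_xml_dirs(before_dir: str, after_dir: str) -> Dict[str, List[str]]:
    """Map each *.xml filename in after_dir to its unified diff against before_dir.

    Identical files are omitted. Files with no "before" counterpart get a
    single marker line instead of a diff. Hypothetical helper for illustration.
    """
    before, after = Path(before_dir), Path(after_dir)
    report: Dict[str, List[str]] = {}
    for new_file in sorted(after.glob("*.xml")):
        old_file = before / new_file.name
        if not old_file.exists():
            report[new_file.name] = ["(no preexisting XML to diff against)"]
            continue
        old_lines = old_file.read_text(encoding="utf-8").splitlines()
        new_lines = new_file.read_text(encoding="utf-8").splitlines()
        delta = list(
            difflib.unified_diff(
                old_lines,
                new_lines,
                fromfile=f"before/{new_file.name}",
                tofile=f"after/{new_file.name}",
                lineterm="",
            )
        )
        if delta:  # keep only files whose content actually changed
            report[new_file.name] = delta
    return report
```

A line-based diff like this also reports pure whitespace or attribute-order changes, so a real tool would probably want to normalize the XML before comparing.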

On a more general note, it’s not always straightforward to distinguish between cases where we are legitimately improving on erroneous or in this case seemingly incomplete preexisting XML (both were noted to happen across various xml2rfc datasets) and cases where we in fact introduce “improvements” that comply with the schema but break another guideline or some consumer’s expectation. I believe one of the goals was to expand on data where possible, but it means there are cases like this.

I’m trying to see how we can catch this earlier.

strogonoff commented 1 year ago

> @ronaldtse The construction of the references to IANA registries is specified in the Web Portion of the Style Guide, and the guidance was developed in coordination with IANA: https://www.rfc-editor.org/styleguide/part2/#ref_iana_reg

This may be a naive question, but e.g. https://www.rfc-editor.org/styleguide/part2/#ref_iana_reg appears to describe how IANA registries should be referenced in text. How should we interpret this as affecting the service’s XML output, in particular with regard to the current issue about the <date> element? Judging by Ronald’s subsequent comment I assume it’s an obvious matter, so maybe I’m missing something.

From what I can understand https://www.rfc-editor.org/rfc/rfc7322.txt and https://www.rfc-editor.org/styleguide/part2/ (which were mentioned before) concern themselves with RFC text/HTML formatting, as opposed to RFCXML output. Thus, they have not been carefully taken into account when working on this service’s xml2rfc rendering. Perhaps they should be, the question is how…

(cc @ronaldtse)

cabo commented 1 year ago

Indeed, there are no xi:includes left for IANA in /prerelease. You might still search for <reference anchor="IANA to get a number of IANA references. I find some 158 matches.
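A search along these lines could be sketched as follows, here parsing the XML rather than matching the raw string. The function name `find_iana_references` is hypothetical, the sample document is made up for the example (its first reference is the one quoted earlier in this thread), and it is an assumption that the RFC XML in question parses cleanly with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_iana_references(rfcxml: str) -> List[Tuple[str, bool]]:
    """Return (anchor, has_date) for each <reference> whose anchor starts with "IANA".

    Hypothetical helper for illustration.
    """
    root = ET.fromstring(rfcxml)
    found: List[Tuple[str, bool]] = []
    for ref in root.iter("reference"):
        anchor = ref.get("anchor", "")
        if anchor.startswith("IANA"):
            # has_date is True when the reference carries a <date> under <front>.
            found.append((anchor, ref.find("front/date") is not None))
    return found


# Made-up miniature document, not taken from any published RFC.
SAMPLE = """<rfc>
  <back>
    <references>
      <reference anchor="IANA_mud" target="http://www.iana.org/assignments/mud">
        <front>
          <title>Manufacturer Usage Description (MUD)</title>
          <author>
            <organization abbrev="IANA">Internet Assigned Numbers Authority</organization>
          </author>
          <date day="27" month="June" year="2018"/>
        </front>
      </reference>
      <reference anchor="EXAMPLE">
        <front>
          <title>Some other reference</title>
          <author><organization>Example</organization></author>
        </front>
      </reference>
    </references>
  </back>
</rfc>"""

if __name__ == "__main__":
    for anchor, has_date in find_iana_references(SAMPLE):
        print(anchor, "has <date>" if has_date else "no <date>")
```

Run over a set of published RFC XML files, the `has_date` flag would show how often existing IANA references carry a date at all.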

rjsparks commented 1 year ago

> I believe one of the goals was to expand on data where possible

This has been the root of a lot of the difficulty with this transition. For us, adding data to the reference just because the grammar would let you was not intended to be part of this project. The explicit goal of the project was to replace the existing service with better infrastructure. Making the content of what was served "better" crept in along the way. Some of it was unavoidable as what was served before was actually damaged, but many of the changes really should have gone through more discussion than they did. Remember, though, that this has all been happening as we've been working as a community to figure out where to best have some of the discussion that would have been nicer to have.

We, of course, do want things to improve, but we shouldn't be introducing breaking changes that aren't absolutely necessary.

ajeanmahoney commented 1 year ago

@strogonoff RFC 7322 and the Web Portion of the Style Guide provide guidance on how the references should appear in text rather than how the XML should be constructed because authors do not have to write in XML. Authors can write their drafts in a variety of formats (XML, markdown, and plain text). Tools are provided to convert these various formats to RFCXML.

rjsparks commented 1 year ago

@ronaldtse @strogonoff : What's holding back correcting this issue? (See https://mailarchive.ietf.org/arch/msg/tools-discuss/Lmv3SN6UqQywtMdlk-wqp3m8_io)

ronaldtse commented 1 year ago

@rjsparks we are just waiting to merge https://github.com/relaton/relaton-py/issues/53, then we can merge the fix into bibxml-service.

kesara commented 1 year ago

322 has been deployed & iana dataset has been reindexed.

stefanomunarini commented 1 year ago

> 322 has been deployed & iana dataset has been reindexed.

Thank you @kesara