ietf-tools / bibxml-service

Django-based Web service implementing IETF BibXML APIs
https://bib.ietf.org
BSD 3-Clause "New" or "Revised" License

Issues with missing 'role="editor"' entries in some bib. items for RFCs #238

Closed lbartholomew-rpc closed 2 years ago

lbartholomew-rpc commented 2 years ago

Have only seen this in one document so far, so there are probably more. Seen so far with RFCs 3473, 4426, 4872, and 6372.

The problem occurs when using https://bib.ietf.org/public/rfc/bibxml/ in the .xml file.

It is resolved by changing (back) to https://xml2rfc.ietf.org/public/rfc/bibxml/, running xml2rfc --clear-cache, and regenerating the output files.

Please see the following:

https://www.rfc-editor.org/v3test/rfc9270-bib-bugs-15July2022.xml
https://www.rfc-editor.org/v3test/rfc9270-bib-bugs-15July2022.txt
https://www.rfc-editor.org/v3test/rfc9270-bib-bugs-15July2022-rfcdiff.html

This file is correct: https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4872.xml
This file shows the issues: https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4872.xml

"Lang, J P." in the listing for RFC 4872 is incorrect as well.

strogonoff commented 2 years ago

Filed an issue on missing roles (if they are provided in RFC editor’s authoritative data but get lost in conversion, this will get fixed).

I’m not sure how to interpret this note:

"Lang, J P." in the listing for RFC 4872 is incorrect as well.

Does this mean the initials attribute should contain dots after each initial? I see https://xml2rfc.tools.ietf.org/xml2rfc-doc.html#name-initials-attribute-3, but I’m not sure how to interpret the second paragraph—should the value contain dots or not…

ronaldtse commented 2 years ago

The missing role="editor" is a bug. It is now filed at https://github.com/relaton/relaton-ietf/issues/92#issuecomment-1192215551

"Lang, J P." in the listing for RFC 4872 is incorrect as well.

The authoritative information comes from the RFC Editor endpoint: https://www.rfc-editor.org/rfc-index.xml

However, that endpoint does not provide a structured name format:

For example in RFC 4872, the information provided is:

<rfc-entry>
  <doc-id>RFC4872</doc-id>
  <title>
RSVP-TE Extensions in Support of End-to-End Generalized Multi-Protocol Label Switching (GMPLS) Recovery
</title>
  <author>
    <name>J.P. Lang</name>
    <title>Editor</title>
  </author>
  <author>
    <name>Y. Rekhter</name>
    <title>Editor</title>
  </author>
  <author>
    <name>D. Papadimitriou</name>
    <title>Editor</title>
  </author>
  <date>
    <month>May</month>
    <year>2007</year>
  </date>
  <format>
    <file-format>ASCII</file-format>
    <file-format>HTML</file-format>
  </format>
  <page-count>47</page-count>
  <keywords>
    <kw>resource reservation protocol</kw>
    <kw>traffic engineering</kw>
  </keywords>
  <abstract>
    <p>
This document describes protocol-specific procedures and extensions for Generalized Multi-Protocol Label Switching (GMPLS) Resource ReSerVation Protocol - Traffic Engineering (RSVP-TE) signaling to support end-to-end Label Switched Path (LSP) recovery that denotes protection and restoration. A generic functional description of GMPLS recovery can be found in a companion document, RFC 4426. [STANDARDS-TRACK]
</p>
  </abstract>
  <draft>draft-ietf-ccamp-gmpls-recovery-e2e-signaling-04</draft>
  <updates>
    <doc-id>RFC3471</doc-id>
  </updates>
  <updated-by>
    <doc-id>RFC4873</doc-id>
    <doc-id>RFC6780</doc-id>
  </updated-by>
  <current-status>PROPOSED STANDARD</current-status>
  <publication-status>PROPOSED STANDARD</publication-status>
  <stream>IETF</stream>
  <area>rtg</area>
  <wg_acronym>ccamp</wg_acronym>
  <errata-url>
http://www.rfc-editor.org/errata_search.php?rfc=4872
</errata-url>
  <doi>10.17487/RFC4872</doi>
</rfc-entry>

The contributor names are not split into initials vs given name vs surname. We can certainly do a transformation, but we rely on the RPC to tell us what the correct behavior is here.
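To make the shape of the source data concrete, here is a minimal sketch in Python that reads the authors out of an entry like the one above and maps a title of "Editor" to an editor role. The function name `authors_with_roles` and the trimmed `RFC_ENTRY` string are hypothetical, and this is not how relaton-ietf does it. The sketch also assumes the entry has no XML namespace, as quoted here; if the live rfc-index.xml declares one, the tag lookups would need to be qualified.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the RFC 4872 entry quoted above (authors only).
# Assumption: no XML namespace, matching the snippet in this thread.
RFC_ENTRY = """
<rfc-entry>
  <doc-id>RFC4872</doc-id>
  <author>
    <name>J.P. Lang</name>
    <title>Editor</title>
  </author>
  <author>
    <name>Y. Rekhter</name>
    <title>Editor</title>
  </author>
  <author>
    <name>D. Papadimitriou</name>
    <title>Editor</title>
  </author>
</rfc-entry>
"""


def authors_with_roles(entry_xml):
    """Return a list of (name, role) pairs for one rfc-entry.

    The name is kept as a single unsplit string, exactly as the index
    provides it. The role is "editor" when the author carries a title
    of "Editor", otherwise None.
    """
    entry = ET.fromstring(entry_xml)
    result = []
    for author in entry.findall("author"):
        name = author.findtext("name", default="").strip()
        title = author.findtext("title", default="").strip().lower()
        role = "editor" if title == "editor" else None
        result.append((name, role))
    return result


authors = authors_with_roles(RFC_ENTRY)
for name, role in authors:
    print(name, role)
```

The point of the sketch is only that the role is present in the source and that the name arrives as one string, so any split into initials and surname is a decision made downstream.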

strogonoff commented 2 years ago

@ronaldtse I suspect this is not what the sentence meant: we (that is, Relaton gems) already do name normalization, which includes the extraction of initials. That’s what we have in source data, and that’s what we return in XML (see BibXML service xml2rfc link in ticket description).

Thus I thought it may have to do with initial formatting in XML, though I’m not totally sure. It could be that author name formatting logic in relaton-py bibxml serializer misinterpreted the part of xml2rfc spec I listed about formatting of initials. We return them without the dots.

ronaldtse commented 2 years ago

Thanks for the clarification. Maybe @lbartholomew-rpc just meant that the line with "J.P. Lang" is the line that demonstrates the problem of the missing role="editor"?

lbartholomew-rpc commented 2 years ago

Hi, Anton and Ronald.

Sorry I wasn't clearer about this! Anton, your "We return them without the dots" note demonstrates the issue with "Lang, J P." in the listing for RFC 4872. Because entries for initials are being returned without the dots, they don't display correctly in output for multiple initials that should have dots instead of a space between them (in other words, "J P." should be "J.P.").

rfc-index.xml can be followed to find out what the outputs should look like. Another example is A.L.J. Verschuren. rfc-index.xml is correct, and so is https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8063.xml, but https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8063.xml is missing the dots because it uses spaces (ditto for two of the three coauthors).

There is some variation, and some authors have explicitly expressed their preferences. For example, Simon Pietro Romano as of a few years ago wants S P. Romano on the first page (note the space between "S" and "P."). However, we need to keep S. Romano for RFCs 1020, 1062, 1117, and 6503 (and per rfc-index.xml) but use S P. Romano for RFCs 6504, 7058, 8846, and 8847. So if the rfc-index.xml entries are pulled for each individual RFC, things should work fine. I don't know how you extract the data, so I'm guessing and hoping that it's not a hassle.

Thank you! Please let me know if I need to be clearer about any of this.

ronaldtse commented 2 years ago

@lbartholomew-rpc thank you for the detailed explanation! As you surmised, the bibliographic information for each RFC is converted individually, so we should be able to tweak the parsing to directly accept initials as they are, without doing anything clever.
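As a rough illustration of what "accept initials as they are" could look like, the Python sketch below splits an rfc-index name into initials and surname without rewriting the initials. The `split_name` function and the `INITIAL_TOKEN` pattern are hypothetical and are not taken from relaton-ietf; the heuristic is only an assumption about how such names are shaped.

```python
import re

# Hypothetical pattern for one "initials" token: capital letters optionally
# joined by dots or hyphens, with an optional trailing dot.
# Matches "J.P.", "A.L.J.", "S", "P." but not "Lang" or "Romano".
INITIAL_TOKEN = re.compile(r"[A-Z](?:[.\-]?[A-Z])*\.?")


def split_name(name):
    """Split a name such as "J.P. Lang" into (initials, surname).

    The initials are returned exactly as written in the source, so
    "J.P." keeps its dots and "S P." keeps its space.
    """
    tokens = name.split()
    count = 0
    # Consume leading initials tokens, always leaving at least one
    # token for the surname.
    while count < len(tokens) - 1 and INITIAL_TOKEN.fullmatch(tokens[count]):
        count += 1
    return " ".join(tokens[:count]), " ".join(tokens[count:])


for name in ["J.P. Lang", "A.L.J. Verschuren", "S P. Romano", "S. Romano"]:
    print(split_name(name))
```

Because the split runs per name and never normalizes the initials, the same author can come out as "S." for one RFC and "S P." for another, following whatever rfc-index.xml says for that entry. A known limitation of this heuristic is that a surname written entirely in capitals would be mistaken for initials.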

Let us get back to you on this ticket!

lbartholomew-rpc commented 2 years ago

Hi, Ronald. You're most welcome, and sounds good!

ronaldtse commented 2 years ago

@lbartholomew-rpc we will be fixing the issue here: https://github.com/relaton/relaton-ietf/issues/95

lbartholomew-rpc commented 2 years ago

@ronaldtse -- Hi, Ronald. Thanks for the note re. https://github.com/relaton/relaton-ietf/issues/95!

ronaldtse commented 2 years ago

https://github.com/relaton/relaton-ietf/issues/95 is fixed, which means the data source is fixed: https://github.com/ietf-tools/relaton-data-rfcs/blob/08dd1326979660b46c776d80c3205a069bf26fb4/data/RFC4872.yaml#L26-L35

However, the output is still not fixed: https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4872.xml

<author fullname="J.P. Lang" initials="J P" role="editor" surname="Lang"/>

This means either it is a relaton-py issue or bibxml-service issue.
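One way to spot this kind of mismatch in served output is a small consistency check on each author element, sketched below in Python. The function `initials_match_fullname` is hypothetical, and the check assumes that `fullname` is written as initials followed by surname, which holds for the RFC 4872 entry but would not hold where `fullname` spells out given names.

```python
import xml.etree.ElementTree as ET


def initials_match_fullname(author_xml):
    """Return True when fullname equals "<initials> <surname>".

    Assumption: fullname uses the initials-plus-surname form. Entries
    with spelled-out given names need a different check.
    """
    author = ET.fromstring(author_xml)
    fullname = author.get("fullname", "")
    initials = author.get("initials", "")
    surname = author.get("surname", "")
    return fullname == f"{initials} {surname}"


# The element currently served, as quoted above.
SERVED = '<author fullname="J.P. Lang" initials="J P" role="editor" surname="Lang"/>'
# The same element with the initials taken verbatim from rfc-index.xml.
WANTED = '<author fullname="J.P. Lang" initials="J.P." role="editor" surname="Lang"/>'

print(initials_match_fullname(SERVED))
print(initials_match_fullname(WANTED))
```

Running a check like this against both the data source and the served XML would help narrow down which layer drops the dots.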

Ping @strogonoff @stefanomunarini .

rjsparks commented 2 years ago

Is this issue still needed? The original "role" part has a fixing commit in relaton - have we confirmed it fixes the final output here, modulo the initials? The initials are being traced at #264, and the remainder of this (or that) should probably be treated as a duplicate?

ronaldtse commented 2 years ago

@rjsparks this is exactly why I'm keeping this issue open, because this is not yet fixed. Sorry about this. I'm hoping to close this once the necessary fixes are verified.

ronaldtse commented 2 years ago

We have come up with a solution. The work is being tracked at:

ronaldtse commented 2 years ago

The fix has been merged in #282, thanks to @strogonoff. Closing.

https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4872.xml

<reference anchor="RFC4872" target="https://www.rfc-editor.org/info/rfc4872">
  <front>
    <title>
      RSVP-TE Extensions in Support of End-to-End Generalized Multi-Protocol Label Switching (GMPLS) Recovery
    </title>
    <author fullname="J.P. Lang" initials="J.P." role="editor" surname="Lang"/>
    <author fullname="Y. Rekhter" initials="Y." role="editor" surname="Rekhter"/>
    <author fullname="D. Papadimitriou" initials="D." role="editor" surname="Papadimitriou"/>
    <date month="May" year="2007"/>
    <abstract>
      <t>
        This document describes protocol-specific procedures and extensions for Generalized Multi-Protocol Label Switching (GMPLS) Resource ReSerVation Protocol - Traffic Engineering (RSVP-TE) signaling to support end-to-end Label Switched Path (LSP) recovery that denotes protection and restoration. A generic functional description of GMPLS recovery can be found in a companion document, RFC 4426. [STANDARDS-TRACK]
      </t>
    </abstract>
  </front>
  <seriesInfo name="RFC" value="4872"/>
  <seriesInfo name="DOI" value="10.17487/RFC4872"/>
</reference>