relaton / relaton-ietf

RFCBib: retrieve RFC Standards for bibliographic use using the BibliographicItem model

(URGENT) Parsing of initials in RFC data should not change dots or spacing #95

Closed ronaldtse closed 2 years ago

ronaldtse commented 2 years ago

As described in: https://github.com/ietf-ribose/bibxml-service/issues/238#issuecomment-1192871487

From @lbartholomew-rpc (who represents the RFC Production Center):

Sorry I wasn't clearer about this! Ronald, your "We return them without the dots" note demonstrates the issue with "Lang, J P." in the listing for RFC 4872. Because entries for initials are being returned without the dots, they don't display correctly in output for multiple initials that should have dots instead of a space between them (in other words, "J P." should be "J.P.").

rfc-index.xml can be followed to find out what the outputs should look like. Another example is A.L.J. Verschuren. rfc-index.xml is correct, and so is https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8063.xml, but https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8063.xml is missing the dots because it uses spaces (ditto for two of the three coauthors).

There is some variation, and some authors have explicitly expressed their preferences. For example, as of a few years ago, Simon Pietro Romano wants S P. Romano on the first page (note the space between "S" and "P."). However, we need to keep S. Romano for RFCs 1020, 1062, 1117, and 6503 (per rfc-index.xml) but use S P. Romano for RFCs 6504, 7058, 8846, and 8847. So if the rfc-index.xml entries are pulled for each individual RFC, things should work fine. I don't know how you extract the data, so I'm guessing and hoping that it's not a hassle.

Specifically, the affected data is here: https://github.com/ietf-tools/relaton-data-rfcs/blob/4e97527a6364853c3208d5ef10be62393cc8f969/data/RFC4872.yaml#L26-L50

It originated from this rfc-index.xml entry:

        <author>
            <name>J.P. Lang</name>
            <title>Editor</title>
        </author>
        <author>
            <name>Y. Rekhter</name>
            <title>Editor</title>
        </author>
        <author>
            <name>D. Papadimitriou</name>
            <title>Editor</title>
        </author>

We need to keep the original formatting of the initials (dots and spacing exactly as they appear in rfc-index.xml), since this is a sensitive topic for authors, as described by @lbartholomew-rpc.
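
To make the requirement concrete, here is a minimal sketch of splitting an rfc-index.xml `<name>` value into initials and surname without touching the dots or spacing inside the initials. The names `NAME_PATTERN`, `split_author_name`, and `join_author_name` are hypothetical and are not taken from relaton-ietf; the pattern assumes names of the form "initials, one space, surname" as in the examples above.

```ruby
# Leading initials: capital letters, each optionally followed by a dot,
# optionally separated by a single space or hyphen. Everything after the
# following whitespace is treated as the surname.
# (Hypothetical pattern, not relaton-ietf's actual parser.)
NAME_PATTERN = /\A(?<initials>\p{Lu}\.?(?:[ -]?\p{Lu}\.?)*)\s+(?<surname>\S.*)\z/

# Returns [initials, surname]. The initials string is copied verbatim from
# the source, so "J.P." stays "J.P." and "S P." stays "S P.".
# If the name does not start with initials, returns [nil, full_name].
def split_author_name(full_name)
  match = NAME_PATTERN.match(full_name)
  return [nil, full_name] unless match

  [match[:initials], match[:surname]]
end

# Rebuilds the display name from the two parts.
def join_author_name(initials, surname)
  [initials, surname].compact.join(" ")
end

split_author_name("J.P. Lang")          # => ["J.P.", "Lang"]
split_author_name("S P. Romano")        # => ["S P.", "Romano"]
split_author_name("A.L.J. Verschuren")  # => ["A.L.J.", "Verschuren"]
```

Because the initials are stored as a single untouched string rather than as a list of letters, joining the parts back together reproduces the source name for inputs like the ones shown.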

We must add specs to test this for correctness and link to the IETF RPC requirement.
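
One possible shape for such checks, sketched in plain Ruby so it runs without RSpec. The expected names are the ones quoted in this thread; `altered_names` is a hypothetical helper, and the block passed to it stands in for whatever code path relaton-ietf uses to produce the author name.

```ruby
# Author names exactly as quoted in this thread, keyed by RFC.
EXPECTED_NAMES = {
  "RFC4872" => "J.P. Lang",
  "RFC8063" => "A.L.J. Verschuren",
  "RFC6503" => "S. Romano",
  "RFC6504" => "S P. Romano",
}.freeze

# Returns the RFC ids whose name is changed by the block under test.
# An empty result means every name came through unchanged.
def altered_names(expected)
  expected.reject { |_rfc, name| yield(name) == name }.keys
end

# A verbatim pass-through alters nothing.
verbatim = altered_names(EXPECTED_NAMES) { |name| name }

# A hypothetical normalizer that turns dots into spaces alters every entry,
# which is the kind of regression these checks are meant to catch.
lossy = altered_names(EXPECTED_NAMES) do |name|
  name.tr(".", " ").squeeze(" ").strip
end
```

Keying the table by RFC rather than by author matters here: the same person can legitimately appear as "S. Romano" in one RFC and "S P. Romano" in another.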

andrew2net commented 2 years ago

@ronaldtse do we need to keep the periods in initials when the source has them?

ronaldtse commented 2 years ago

@andrew2net Yes. We need to keep the periods in initials wherever they exist in the source.