Incorrect or missing info. in initials, surname, fullname for some bibxml3 files

lbartholomew-rpc commented 1 year ago

Describe the issue

It looks like all of the issues below except the "I." for IJ. Wijnands (a long-standing issue in I-Ds) were introduced between Sept. 15 and Oct. 10, 2022. I found these when doing pre-publication steps on RFC-to-be 9262 today.

https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-bier-multicast-http-response.xml
https://datatracker.ietf.org/doc/bibxml3/reference.I-D.eckert-bier-te-frr.xml
Toerless Eckert's initial is now "T. T." in these two files; should be "T."

https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-bier-te-yang.xml
"hu" should be "Hu" per the draft document, and author initials="" surname="chenhuanan" fullname="chenhuanan" should be author initials="H." surname="Chen" fullname="Huanan Chen"

https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-bier-non-mpls-bift-encoding.xml
"I." should be "IJ." (for IJ. Wijnands), and "M. P." should be "M." for Mankamana Prasad Mishra

You can search for ", draft" in the .txt files to compare v2 with v3.

https://www.rfc-editor.org/v3test/rfc9262v2.txt (from Sept. 15)
https://www.rfc-editor.org/v3test/rfc9262v2.xml (from Sept. 15)
https://www.rfc-editor.org/v3test/rfc9262v3.txt (from Oct. 10)
https://www.rfc-editor.org/v3test/rfc9262v3.xml (from Oct. 10)

Or here's a diff file showing the issues: https://www.rfc-editor.org/v3test/rfc9262v2-v3-rfcdiff.html

Code of Conduct

[X] I agree to follow the IETF's Code of Conduct

kesara commented 1 year ago

I think these issues should be raised on datatracker (maybe xml2rfc). The bibxml citations don't have this issue.

I'll transfer this ticket to datatracker.

lbartholomew-rpc commented 1 year ago

Hi, @kesara. Thanks for transferring this ticket and pointing me in the correct direction. Will be sure to check both bib.ietf and datatracker files going forward.

rjsparks commented 1 year ago

Short answer: send a message to support@ asking the secretariat to work with the authors to set their names in the datatracker to what they want to appear in the bibxml.

Longer answer: This interacts directly with https://github.com/ietf-tools/datatracker/issues/4384 and the solution there is causing the pain here.

At the moment the datatracker builds the names in the bibxml from the Person object associated with each DocumentAuthor object associated with the draft.

The relevant places in the code are:

https://github.com/ietf-tools/datatracker/blob/main/ietf/templates/doc/bibxml.xml#L5
https://github.com/ietf-tools/datatracker/blob/main/ietf/person/models.py#L104 and https://github.com/ietf-tools/datatracker/blob/main/ietf/person/name.py#L70-L79 for the initials
https://github.com/ietf-tools/datatracker/blob/main/ietf/person/models.py#L106 and https://github.com/ietf-tools/datatracker/blob/main/ietf/person/models.py#L106 for last_name
https://github.com/ietf-tools/datatracker/blob/main/ietf/person/models.py#L44 for name

Decoding that a bit, what's happening is:

The last name is pulled out of the Person's provided Unicode name.
The initials are pulled out of the Person's ASCII name if provided, otherwise the Unicode name.
The name is taken as the Unicode name provided, with no modification.

In Toerless' case, his unicode name is currently 'Toerless Eckert' but his ascii name is 'Toerless T. Eckert'.

In Huanan Chen's case, both names are set to chenhuanan.

The rest are similar.
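To make the two cases above concrete, here is a minimal sketch of the derivation as described. It is not the datatracker's code: `split_name`, `initials`, and `bibxml_author` are hypothetical helpers, and the naive "last whitespace-separated token is the surname" rule is an assumption made only for illustration.

```python
import re


def split_name(name):
    """Naive split (assumption): the last token is the surname, the rest are given names."""
    parts = name.split()
    if not parts:
        return [], ""
    return parts[:-1], parts[-1]


def initials(name):
    """Build initials from every given-name token, including a lone middle initial."""
    given, _ = split_name(name)
    # Drop characters that are neither word characters nor dots before taking the first letter.
    cleaned = [re.sub(r"[^.\w]", "", token) for token in given]
    return " ".join(token[0].upper() + "." for token in cleaned if token)


def bibxml_author(unicode_name, ascii_name=""):
    """Surname and fullname come from the Unicode name; initials prefer the ASCII name."""
    _, surname = split_name(unicode_name)
    return {
        "initials": initials(ascii_name or unicode_name),
        "surname": surname,
        "fullname": unicode_name,
    }


# Unicode name and ASCII name disagree, so the initials pick up the extra "T."
print(bibxml_author("Toerless Eckert", "Toerless T. Eckert"))
# A single-token name leaves nothing to build initials from.
print(bibxml_author("chenhuanan", "chenhuanan"))
```

Under these assumptions the first call yields initials "T. T." with fullname "Toerless Eckert", and the second yields empty initials with surname "chenhuanan", which is what shows up in the bibxml3 files.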

Now we could revert what we did in response to #4384 and these particular cases would look a little better, but many others would break. cc @cabo

Rock meets hard place.

Again, I recommend, for now, that you, the authors, and the secretariat work on the names as represented in the datatracker.

In the long run - we seriously need to stop trying to tear names apart into things like last name and initials, and all of this should just use the Person's provided unicode name (and, likely for these drafts, the captured unicode name (when we eventually have that) from each draft submission).

ajeanmahoney commented 1 year ago

For bibxml data, the author name should match what is in the header of the document. If an author's name was "J. Doe" in the header of a document they wrote a few years ago, then the author name should remain "J. Doe" in the bibxml data for that document even if J. Doe updates their datatracker profile to "Joe Pizza-Doe". The metadata that datatracker displays for the document can display "Joe Pizza-Doe" and point to Pizza-Doe's datatracker profile page.

ajeanmahoney commented 1 year ago

I'm seeing a similar issue in bibtex entries. It appears that info from an author's datatracker profile is being used rather than the info in the document header. For instance, the bibtex file for RFC 7376 shows "Reddy.K" rather than "Reddy", which was the name used in the RFC.

rjsparks commented 1 year ago

ORDER is also different. The order in DocumentAuthor is not always the same as the order in Submission.authors.

>>> DocumentAuthor.objects.filter(document__name='draft-narten-iana-considerations-rfc2434bis').values_list('person__name','order')
<QuerySet [('Harald T. Alvestrand', -1), ('Dr. Thomas Narten', 0)]>

>>> Submission.objects.get(name='draft-narten-iana-considerations-rfc2434bis',rev='09',state='posted').authors
[{'email': 'narten@us.ibm.com', 'name': 'Thomas Narten'}, {'email': 'Harald@Alvestrand.no', 'name': 'Harald Alvestrand'}]

(where ordering in the json list in the second block is significant).

So the changes in ordering that we're seeing are due exactly to what we did about #4384.
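A rough way to flag drafts with this mismatch is sketched below, using plain Python data shaped like the two query results above rather than the Django models. `surname_key` and `author_order_matches` are hypothetical helpers. Matching people by the last token of their name is an assumption, needed here only because the two sources spell the names differently ('Dr. Thomas Narten' vs. 'Thomas Narten').

```python
def surname_key(name):
    """Crude matching key (assumption): the case-folded last token of the name."""
    return name.split()[-1].casefold()


def author_order_matches(document_authors, submission_authors):
    """Compare author order between the two sources.

    document_authors: (name, order) tuples, as in the values_list output.
    submission_authors: dicts with a 'name' key, where list order is significant.
    """
    doc_order = [
        surname_key(name)
        for name, _ in sorted(document_authors, key=lambda pair: pair[1])
    ]
    sub_order = [surname_key(author["name"]) for author in submission_authors]
    return doc_order == sub_order


document_authors = [("Harald T. Alvestrand", -1), ("Dr. Thomas Narten", 0)]
submission_authors = [
    {"email": "narten@us.ibm.com", "name": "Thomas Narten"},
    {"email": "Harald@Alvestrand.no", "name": "Harald Alvestrand"},
]

print(author_order_matches(document_authors, submission_authors))  # False
```

A real check would want a sturdier key than the surname, such as the Person's email address if one is reliably available on both sides.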

rjsparks commented 1 year ago

One thing we could do here is regenerate DocumentAuthor records by parsing drafts for all older versions of drafts that eventually became RFCs. (Eventually we would need to do this for all drafts.)
