metanorma / bipm-si-brochure

SI Brochure edition 9, semantic encoded version (WARNING: DRAFT)

BIPM requested fixes 10: "Milton J" should be "Milton M" #230

Closed ronaldtse closed 1 month ago

ronaldtse commented 8 months ago

From Michael Stock:

The very last reference (no. 112 in the English text, no. 111 in the French text) has a typo: it should be Milton M, not Milton J.

anermina commented 8 months ago

The fetched value is as follows (BIPM Metrologia 56 2 022001):

      <name>
        <forename language="en" script="Latn" initial="J">Martin</forename>
        <forename initial="T"/>
        <surname language="en" script="Latn">Milton</surname>
      </name>

Ping @andrew2net

opoudjis commented 8 months ago

forename/@initial is the canonical initial of that forename, not another one. I use it if present, but J does not stand for Martin. So if the name is Martin J Milton, I would expect to see

<forename>Martin</forename>
<forename initial="J"/>
<surname>Milton</surname>

or at most

<forename initial="M">Martin</forename> # but by default I use the first letter anyway
<forename initial="J"/>
<formatted-initials>M</formatted-initials> # do not use M J
<surname>Milton</surname>
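
A minimal sketch of the rule described above, in Python for illustration only. It is not taken from the Metanorma or Relaton code base: the function name `initials` is hypothetical, and treating `formatted-initials` as an override is an assumption based on the second example. It shows why the fetched record yields a leading "J" while the expected record yields "M".

```python
import xml.etree.ElementTree as ET


def initials(name_xml: str) -> str:
    """Return space-separated initials for a <name> element (illustrative only)."""
    name = ET.fromstring(name_xml)
    # Assumption: an explicit <formatted-initials> overrides everything else.
    formatted = name.findtext("formatted-initials")
    if formatted:
        return formatted.strip()
    parts = []
    for forename in name.findall("forename"):
        text = (forename.text or "").strip()
        # Use @initial if present, otherwise the first letter of the forename.
        initial = forename.get("initial") or text[:1]
        if initial:
            parts.append(initial)
    return " ".join(parts)


# The record as fetched: "J" is attached to "Martin" as its initial.
fetched = """<name>
  <forename language="en" script="Latn" initial="J">Martin</forename>
  <forename initial="T"/>
  <surname language="en" script="Latn">Milton</surname>
</name>"""

# The first expected encoding for "Martin J Milton".
expected = """<name>
  <forename>Martin</forename>
  <forename initial="J"/>
  <surname>Milton</surname>
</name>"""

print(initials(fetched))   # J T
print(initials(expected))  # M J
```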

andrew2net commented 8 months ago

@ronaldtse I've checked the source dataset. It has inconsistencies in person names:

I don't see a straightforward way to parse all the cases correctly. Wouldn't it be a good idea to just concatenate the given name and surname and save them as a full name string?

ronaldtse commented 8 months ago

@andrew2net by "source dataset", did you mean the IOP Metrologia XML, which encodes that article as follows?

The source XML is:

<contrib contrib-type="author" xlink:type="simple">
  <contrib-id authenticated="false" contrib-id-type="orcid">0000-0002-8174-2211</contrib-id>
  <name name-style="western">
    <surname>Milton</surname>
    <given-names>Martin J T</given-names>
  </name>
  <xref ref-type="aff" rid="affiliation01">1</xref>
</contrib>

So this means that there is a problem with parsing source XML -- is it this issue?
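
For reference, a rough sketch of how `<given-names>` could be split into a forename and separate initials, so that "J" is not attached to "Martin". This is a naive heuristic written in Python for illustration, not the actual parser; the function name `split_name` and the rule "a single-letter token is an initial" are assumptions and will not cover every name form.

```python
import xml.etree.ElementTree as ET


def split_name(name_xml: str) -> dict:
    """Split <given-names> into ordered forename/initial parts (naive heuristic)."""
    name = ET.fromstring(name_xml)
    tokens = (name.findtext("given-names") or "").split()
    parts = []
    for token in tokens:
        bare = token.rstrip(".")
        # Assumption: a single letter (optionally followed by ".") is an initial.
        if len(bare) == 1:
            parts.append(("initial", bare))
        else:
            parts.append(("forename", token))
    surname = (name.findtext("surname") or "").strip()
    return {"parts": parts, "surname": surname}


source = """<name name-style="western">
  <surname>Milton</surname>
  <given-names>Martin J T</given-names>
</name>"""

result = split_name(source)
print(result["parts"])    # [('forename', 'Martin'), ('initial', 'J'), ('initial', 'T')]
print(result["surname"])  # Milton
```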

andrew2net commented 8 months ago

@ronaldtse I've updated the parser to solve this issue. This case is not a big problem. To be sure that the update won't cause other problems, I've checked what names there are in the rawdata-bipm-metrologia dataset. It revealed that there are many other problems with name consistency in the dataset. So the question is: should I update the parser to solve only this issue, or is it better not to parse the name parts and to save them as full names?

The relaton/relaton-bipm#2 issue is about document duplication. I'll answer your comment in that issue.

ronaldtse commented 8 months ago

@andrew2net

I've checked what names there are in the rawdata-bipm-metrologia dataset. It revealed that there are many other problems with name consistency in the dataset.

Can you point out what the inconsistencies are?

@MStock78120 and @jmilesBIPM would be interested in finding out the issues with the bibliographic encodings at Metrologia.

Thanks.

andrew2net commented 8 months ago

Some examples are here, in the format "given name", "surname" file:

You can see that name parts can be all in the surname or distributed between surname and given name unpredictably. Affixes can be capitalized or not. Even if we managed to make rules to parse forename, surname, initials, prefixes, and additions, I doubt that we'd be able to restore the original name from the parts. I think if we use the given-name + surname string as the full name, it will give us the original name. We have a fullname element in our data model to keep the string.

FYI since the issue we use branch 2023-04-23

ronaldtse commented 6 months ago

@jmilesBIPM it seems that Metrologia is the culprit -- can we ask them to update this data to correct the names?

jmilesBIPM commented 6 months ago

The online version of the article in question (https://doi.org/10.1088/1681-7575/ab0013) displays the full name (given name plus surname) of all the authors correctly: e.g. "Martin J T Milton"

Isn't it possible for you just to display these two fields as provided in the XML files?

I see in the first message from @anermina that the XML gave

      <name>
        <forename language="en" script="Latn" initial="J">Martin</forename>
        <forename initial="T"/>
        <surname language="en" script="Latn">Milton</surname>
      </name>

This is admittedly more long-winded than one might expect, but the answer seems to be

forename + forename initial given with forename + forename initial given separately + surname

= Martin J T Milton

andrew2net commented 6 months ago

@ronaldtse why don't we just save "given_name" + "surname" as a "fullname"? I don't think it's possible to parse all the names correctly.

jmilesBIPM commented 6 months ago

In that case we'd have "Martin T Milton" here? Close enough!

andrew2net commented 6 months ago

@jmilesBIPM in the source we have

<name name-style="western">
  <surname>Milton</surname>
  <given-names>Martin J T</given-names>
</name>

So "given-name" + "surname" will be "Martin J T Milton".

@ronaldtse I just noticed that there is a "name-style" attribute. Maybe we can use it to parse name parts correctly.

andrew2net commented 1 month ago

There is only the "western" name style. Names are now created by concatenating "given-name" and "surname". The name is now "Martin J T Milton".
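
The concatenation approach can be sketched as follows. This is an illustrative Python version under stated assumptions, not the actual parser code: the function name `full_name` is hypothetical, and the input is the `<name>` element quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET


def full_name(name_xml: str) -> str:
    """Join <given-names> and <surname> into one full name string (illustrative)."""
    name = ET.fromstring(name_xml)
    given = (name.findtext("given-names") or "").strip()
    surname = (name.findtext("surname") or "").strip()
    # Skip empty parts so a missing element does not leave stray spaces.
    return " ".join(part for part in (given, surname) if part)


source = """<name name-style="western">
  <surname>Milton</surname>
  <given-names>Martin J T</given-names>
</name>"""

print(full_name(source))  # Martin J T Milton
```

Because nothing is split or reinterpreted, the string matches what the source provides, at the cost of not knowing which part is a forename and which is an initial.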