opencitations / oc_meta

ISC License

OMIDs for agent roles instead of responsible agents in Meta CSV "author" field #26

Open eliarizzetto opened 4 months ago

eliarizzetto commented 4 months ago

In version 8 (https://doi.org/10.6084/m9.figshare.21747461.v8) and version 9 (https://doi.org/10.6084/m9.figshare.21747461.v9) of the OpenCitations Meta CSV dump, the author field of some of the resources erroneously contains OMIDs of agent roles (prefixed by "/ar") instead of OMIDs of responsible agents (prefixed by "/ra"). For example, the following row, storing metadata for br/06602041963, contains three agent roles in the author field:

| id | title | author | issue | volume | venue | page | pub_date | type | publisher | editor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| omid:br/06602041963 doi:10.33029/9704-6031-3-2021-1-432 openalex:W4244840417 | CLINICAL PHARMACOLOGY. Obstetrics. Gynecology. Infertile Marriage | [omid:ar/06609023674]; [omid:ar/06609023675]; [omid:ar/06609023673] | | | CLINICAL PHARMACOLOGY. Obstetrics. Gynecology. Infertile Marriage [omid:br/06602042953] | 1-432 | 2021 | book chapter | Geotar-Media Publishing Group [omid:ra/0610116993 crossref:18453] | Radzinskiy, E.V. [omid:ra/06606217946]; Shykh, E.V. [omid:ra/06606217947] |

The number of CSV rows with a faulty value in the author field is 7,607,734 in version 8 and 8,105,378 in version 9. These errors are not observable when the same data is accessed via the API (see e.g. https://opencitations.net/meta/api/v1/metadata/omid:br/06602041963).
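For anyone who wants to reproduce a count like this, here is a minimal sketch of one way to flag affected rows. It is not the script used to obtain the figures above. It assumes each dump file is comma-separated with a header row containing an `author` column, and that an agent-role OMID always appears as `omid:ar/` followed by digits. The names `has_agent_role_omid` and `count_faulty_rows` are hypothetical helpers introduced only for illustration.

```python
import csv
import io
import re

# Assumption: an agent-role OMID looks like "omid:ar/06609023674".
AGENT_ROLE_OMID = re.compile(r"\bomid:ar/\d+")


def has_agent_role_omid(author_field: str) -> bool:
    """Return True if the author cell contains at least one agent-role OMID."""
    return bool(AGENT_ROLE_OMID.search(author_field))


def count_faulty_rows(csv_file) -> int:
    """Count rows whose 'author' column holds one or more agent-role OMIDs.

    `csv_file` is any iterable of text lines (an open file or a StringIO)
    whose first line is a header that includes an 'author' column.
    """
    reader = csv.DictReader(csv_file)
    return sum(1 for row in reader if has_agent_role_omid(row.get("author") or ""))


# Tiny in-memory sample (made-up identifiers) with one faulty and one valid row.
SAMPLE = (
    "id,title,author\n"
    "omid:br/1,Some title,[omid:ar/2]; [omid:ar/3]\n"
    'omid:br/4,Other title,"Doe, J. [omid:ra/5]"\n'
)

faulty = count_faulty_rows(io.StringIO(SAMPLE))
print(faulty)
```

To run this over a real dump, one would open each CSV file with `open(path, newline="", encoding="utf-8")` and sum the per-file counts. The file layout of the dump is not described in this issue, so that part is left out.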

Based on a study of randomly sampled cases (including the one mentioned above), I suspect that the following conditions also hold for the rest of the affected rows: