sul-dlss-labs / rialto-airflow

Airflow for harvesting data for research intelligence and open access analysis
Apache License 2.0

Investigate OpenAlex multiple Authors for an ORCID #82

Open lwrubel opened 2 months ago

lwrubel commented 2 months ago

Logs for the openalex_harvest_dois task show that there are sometimes multiple authors returned for an ORCID. The code currently proceeds to look up DOIs for the first author returned.

Example warning: found more than one openalex author id for 0000-0001-7586-8240

The lookup for the ORCID above returns two authors: https://openalex.org/A5013127948 and https://openalex.org/A5102787736. The second author ID has only 7 works.

jacobthill commented 2 months ago

Those two author IDs seem to be the same person. You wrote: "The code currently proceeds to look up authors for the first author returned." Did you mean: "The code currently proceeds to look up publications for the first author returned."? If so, that might be reasonable. Maybe we should look into the logs to see how often this happens and make the code harvest for all authors, but it may not be worth the effort.
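To get a rough count of how often this happens, something like the following could work. This is only a sketch: it assumes the task logs are available as plain text lines and that the warning always has the wording quoted in the issue. The function name count_multi_author_orcids and the timestamp prefix on the sample line are made up for illustration.

```python
import re
from collections import Counter

# Matches the warning wording quoted in this issue. The last ORCID character
# may be the checksum letter X, so the pattern allows it.
WARNING_RE = re.compile(
    r"found more than one openalex author id for (\d{4}-\d{4}-\d{4}-\d{3}[\dX])"
)


def count_multi_author_orcids(lines):
    """Return a Counter mapping each ORCID to the number of warnings seen for it.

    Hypothetical helper: `lines` is any iterable of log lines (for example an
    open file object).
    """
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The log prefix here is invented; only the warning text comes from the issue.
    sample = [
        "[2024-01-01] WARNING - found more than one openalex author id for 0000-0001-7586-8240",
        "[2024-01-01] INFO - some unrelated message",
    ]
    counts = count_multi_author_orcids(sample)
    print(f"{len(counts)} ORCID(s) with multiple OpenAlex authors")
```

The number of distinct keys in the result is the number of affected ORCIDs, which would tell us whether harvesting for all authors is worth the effort.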

lwrubel commented 2 months ago

Oops, yes, I meant "look up DOIs for the first author". I edited the issue.

Agreed. Let's count how many ORCIDs this affects and analyze how the DOI set changes depending on how we handle them.
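One way to frame that analysis, as a minimal sketch: given the DOIs already harvested per OpenAlex author ID for a single ORCID, compare what we get from the first author only (the current behavior) with the union across all returned authors. The function name compare_doi_strategies and the input mapping are hypothetical, the DOIs in the example are placeholders rather than real data, and the actual OpenAlex lookup is left out.

```python
def compare_doi_strategies(dois_by_author):
    """Compare first-author-only harvesting with harvesting for every author ID.

    Hypothetical helper. `dois_by_author` maps an OpenAlex author ID to an
    iterable of DOIs, with keys in the order the authors were returned
    (dicts preserve insertion order in Python 3.7+).
    """
    author_ids = list(dois_by_author)
    if not author_ids:
        return {"first_only": set(), "all_authors": set(), "missed": set()}

    # Current behavior described in the issue: only the first author is used.
    first_only = set(dois_by_author[author_ids[0]])

    # Alternative: union of DOIs across every author ID returned for the ORCID.
    all_authors = set()
    for dois in dois_by_author.values():
        all_authors.update(dois)

    return {
        "first_only": first_only,
        "all_authors": all_authors,
        # DOIs that the first-author-only approach would not pick up.
        "missed": all_authors - first_only,
    }


if __name__ == "__main__":
    # Author IDs are the ones from this issue; the DOIs are placeholders.
    example = {
        "https://openalex.org/A5013127948": ["10.0000/example-1", "10.0000/example-2"],
        "https://openalex.org/A5102787736": ["10.0000/example-2", "10.0000/example-3"],
    }
    result = compare_doi_strategies(example)
    print(f"missed by first-author-only: {sorted(result['missed'])}")
```

Summing the size of "missed" over all affected ORCIDs would show how much the DOI set changes if we harvest for every author instead of just the first.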