Once the initial DOI collection process is complete, we know the population of DOIs in the dataset. We can also map each DOI to a SUNETID using either the `orcidid` (for Dimensions and OpenAlex) or the `cap_profile_id` (for sul_pub).
The new `doi_sunet` task creates a mapping of `doi -> [sunetid]` from the pickle files, the sul_pub CSV, and the authors CSV. The `doi_set` task then uses this mapping to generate the list of DOIs needed for harvesting.
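A minimal sketch of how such a mapping could be built, not the task's actual code. The input shapes are assumptions: an ORCID-keyed dict of DOIs stands in for the Dimensions and OpenAlex pickle files, and the `build_doi_sunet` helper, the column names, and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_doi_sunet(orcid_dois, sul_pub_rows, authors):
    """Return {doi: [sunetid, ...]} (hypothetical helper, assumed input shapes)."""
    # Lookup tables from each identifier to a sunetid (column names are assumed).
    orcid_to_sunet = {a["orcidid"]: a["sunetid"] for a in authors if a.get("orcidid")}
    cap_to_sunet = {
        a["cap_profile_id"]: a["sunetid"] for a in authors if a.get("cap_profile_id")
    }

    mapping = defaultdict(set)

    # Dimensions / OpenAlex style input, assumed to be {orcidid: [doi, ...]}.
    for orcid, dois in orcid_dois.items():
        sunet = orcid_to_sunet.get(orcid)
        if sunet:
            for doi in dois:
                mapping[doi].add(sunet)

    # sul_pub style input, assumed to be rows with a doi and a cap_profile_id.
    for row in sul_pub_rows:
        sunet = cap_to_sunet.get(row["cap_profile_id"])
        if sunet and row.get("doi"):
            mapping[row["doi"]].add(sunet)

    # Sort so the output is deterministic.
    return {doi: sorted(sunets) for doi, sunets in mapping.items()}


# Made-up sample data for illustration only.
authors = [
    {"sunetid": "jdoe", "orcidid": "0000-0001", "cap_profile_id": "10"},
    {"sunetid": "asmith", "orcidid": "0000-0002", "cap_profile_id": "20"},
]
orcid_dois = {"0000-0001": ["10.1/a", "10.1/b"], "0000-0002": ["10.1/a"]}
sul_pub_rows = [{"doi": "10.1/c", "cap_profile_id": "20"}]

doi_sunet = build_doi_sunet(orcid_dois, sul_pub_rows, authors)
# One plausible way to derive the DOI list: the keys of the mapping.
doi_set = sorted(doi_sunet)
```

Collecting into sets before sorting keeps a SUNETID from appearing twice when the same DOI is reported by more than one source.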
Once the publication datasets are merged, the new `contribs` task uses the `doi_sunet` mapping to add a `sunetid` column and split the publications into contributions, where each row has a single `sunetid`. Finally, the contributions are joined with `authors.csv`.
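The split-and-join step could look roughly like the sketch below. It is illustrative only: `make_contribs`, the `title` and `department` columns, and the choice to drop publications with no mapped author (an inner join) are assumptions, not the task's confirmed behavior.

```python
def make_contribs(publications, doi_sunet, authors):
    """Expand publications into one row per (doi, sunetid) and join author data."""
    authors_by_sunet = {a["sunetid"]: a for a in authors}
    contribs = []
    for pub in publications:
        # A DOI with N mapped sunetids yields N contribution rows.
        for sunet in doi_sunet.get(pub["doi"], []):
            author = authors_by_sunet.get(sunet)
            if author is None:
                continue  # assumed inner join: skip sunetids missing from authors
            # The author row carries the sunetid column into the contribution.
            contribs.append({**pub, **author})
    return contribs


# Made-up sample data for illustration only.
publications = [
    {"doi": "10.1/a", "title": "Paper A"},
    {"doi": "10.1/z", "title": "Unmapped"},
]
doi_sunet = {"10.1/a": ["asmith", "jdoe"]}
authors = [
    {"sunetid": "jdoe", "department": "Biology"},
    {"sunetid": "asmith", "department": "Physics"},
]

contribs = make_contribs(publications, doi_sunet, authors)
```

Here "Paper A" becomes two contribution rows, one per author, while the publication whose DOI is absent from the mapping produces none.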