Closed: hectorcorrea closed this issue 1 year ago.
I can detect the Princeton ARK in a DataSpace record by looking at the URIs in the record and selecting the one that points to http://arks.princeton.edu/ark:/<something>. See https://github.com/pulibrary/pdc_discovery/pull/382/files#diff-582cacf1bdf6f627c03c544c6eca40d81f805302487d91d345422ca194bae019R20-R27
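A minimal Ruby sketch of that selection, assuming the record's URIs have already been extracted into an array of strings. The helper name `princeton_ark` and the sample URIs are hypothetical; this is not the code in the linked PR.

```ruby
# Prefix shared by Princeton ARK URIs, as described above.
ARK_PREFIX = "http://arks.princeton.edu/ark:/"

# Hypothetical helper: returns the first URI pointing to a Princeton ARK,
# or nil when the record has none.
def princeton_ark(uris)
  uris.find { |uri| uri.start_with?(ARK_PREFIX) }
end

# Made-up sample URIs for illustration only.
uris = [
  "https://example.org/not-an-ark",
  "http://arks.princeton.edu/ark:/88435/dsp01example"
]
puts princeton_ark(uris) # prints http://arks.princeton.edu/ark:/88435/dsp01example
```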
How can I detect the DOI in a DataSpace record?
Is it the URI that points to https://doi.org/10.34770/<whatever>? Yes.
Is it any URI that points to https://doi.org/<whatever>? No.
Keep in mind that this logic is meant to detect when a record from PDC Describe is the same as a record from DataSpace.
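Following the answers above (only URIs under https://doi.org/10.34770/ count, not any https://doi.org/ URI), a minimal sketch of the DOI check. As before, it assumes the URIs are already extracted as strings; `princeton_doi` and the sample URIs are hypothetical.

```ruby
# Only DOIs under the 10.34770 prefix count as the record's DOI;
# any other https://doi.org/ URI is ignored.
PRINCETON_DOI_PREFIX = "https://doi.org/10.34770/"

# Hypothetical helper: returns the first 10.34770 DOI URI, or nil.
def princeton_doi(uris)
  uris.find { |uri| uri.start_with?(PRINCETON_DOI_PREFIX) }
end

# Made-up sample URIs for illustration only.
uris = [
  "https://doi.org/10.1234/some-article", # other prefix: ignored
  "https://doi.org/10.34770/example"      # 10.34770 prefix: selected
]
puts princeton_doi(uris) # prints https://doi.org/10.34770/example
```

The helper returns nil when a record has no 10.34770 DOI, so a caller can still fall back to matching on the ARK.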
Sample DataSpace record with
PPPL records seem to have ARKs but no DOIs, for example: https://dataspace.princeton.edu/handle/88435/dsp012n49t492r
Update the DataSpace indexer (https://github.com/pulibrary/pdc_discovery/blob/main/lib/traject/dataspace_research_data_config.rb) to skip records that have already been imported from PDC Describe. This gives PDC Describe records precedence over DataSpace records once we start migrating DataSpace records to PDC Describe.
The match should be done via ARK or DOI, as described in https://github.com/pulibrary/pdc_discovery/issues/340.
One way to do this in Traject is in the each_record block of the Traject config, for example:
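A minimal sketch of the matching logic such a block could call. It assumes the ARK and DOI URIs of records already imported from PDC Describe are available as a Set, stored in the same URI form that DataSpace uses; how that set is built (for example, from the existing index) is left open. The module and method names are hypothetical, and the Traject wiring is shown only in comments. It relies on Traject's `each_record` block receiving the record and its context, and on `context.skip!`.

```ruby
require "set"

# Hypothetical module, not part of pdc_discovery.
module PdcDescribeDuplicate
  ARK_PREFIX = "http://arks.princeton.edu/ark:/"
  DOI_PREFIX = "https://doi.org/10.34770/"

  # Returns true when any of the DataSpace record's ARK or 10.34770 DOI URIs
  # is among the identifiers of records already imported from PDC Describe.
  def self.already_imported?(uris, pdc_describe_ids)
    ids = uris.select { |uri| uri.start_with?(ARK_PREFIX, DOI_PREFIX) }
    ids.any? { |id| pdc_describe_ids.include?(id) }
  end
end

# Assumed wiring in lib/traject/dataspace_research_data_config.rb
# (URI extraction and pdc_describe_ids are placeholders):
#
#   each_record do |record, context|
#     uris = ...  # the record's URIs, extracted from the DataSpace XML
#     if PdcDescribeDuplicate.already_imported?(uris, pdc_describe_ids)
#       context.skip!("Already imported from PDC Describe")
#     end
#   end
```

Matching on either identifier covers PPPL records that have an ARK but no DOI. A skipped context is not sent to the writer, so only the PDC Describe copy ends up in the index.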