pulibrary / pdc_discovery

Princeton Data Commons discovery portal for Research Data

Bug: Duplicate record from DataSpace and PDC #548

Closed: astrochun closed this issue 7 months ago

astrochun commented 7 months ago

Hello, previously we migrated the following DataSpace record:

However, we found two versions of this record in Discovery:

From DataSpace indexing: https://datacommons.princeton.edu/discovery/catalog/150055

From PDC Describe: https://datacommons.princeton.edu/discovery/catalog/doi-10-11578-1888261

Both refer to the same ARK.

I noticed this because when I selected "Princeton Plasma Physics Laboratory" from the Discovery "Community" facet, there was an "ITER and Tokamaks" Community with one record.

Searching by title shows both records: https://datacommons.princeton.edu/discovery/?search_field=title&q=Verification%2C+validation%2C+and+results+of+an+approximate+model+for+the+stress+of+a+Tokamak+toroidal+field+coil+at+the+inboard+midplane

My understanding is that PDC metadata takes precedence over DataSpace, so only the PDC Describe record should appear. It seems this duplicate was created in error.
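A minimal sketch of how a "PDC takes precedence" rule could work, assuming duplicates are detected by matching ARK strings exactly. The `Record` type and `merge_records` function are hypothetical illustrations, not the actual pdc_discovery indexing code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical minimal record: which system it came from and its ARK URL."""
    source: str  # e.g. "pdc" or "dataspace"
    ark: str


def merge_records(pdc: List[Record], dataspace: List[Record]) -> List[Record]:
    """Keep every PDC Describe record; keep a DataSpace record only when no
    PDC record has the exact same ARK string (no normalization is applied)."""
    pdc_arks = {r.ark for r in pdc}
    return pdc + [r for r in dataspace if r.ark not in pdc_arks]


ds_rec = Record("dataspace", "http://arks.princeton.edu/ark:/88435/dsp01rb68xg060")
pdc_rec = Record("pdc", "http://arks.princeton.edu/ark:/88435/dsp01rb68xg060")
print(len(merge_records([pdc_rec], [ds_rec])))  # 1: the DataSpace copy is suppressed
```

Under exact matching, two different spellings of the same ARK yield two Discovery records instead of one.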

carolyncole commented 7 months ago

@astrochun I think the issue is that the ARKs are not exactly the same.

Yes, it is the "same" ARK, but the first one has an extra `ark:` in the URL:

http://arks.princeton.edu/ark:/88435/dsp01rb68xg060
http://arks.princeton.edu/88435/dsp01rb68xg060

Each URL is attached to a different record in PDC Discovery. I would assume there is a typo in one of those URLs?
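The two ARK URLs differ only in formatting (the `ark:` label), so normalizing ARKs to a canonical `ark:/NAAN/Name` form before comparing them would treat both as the same identifier. This is a hedged sketch, assuming the only variations are an optional resolver prefix and an optional `ark:` label; `normalize_ark` is a hypothetical helper, not part of pdc_discovery. In this thread the fix was to correct the record's metadata instead.

```python
import re
from typing import Optional

# Optional resolver prefix (e.g. "http://arks.princeton.edu/"), optional "ark:"
# label with or without a trailing slash, then the NAAN (digits) and the Name.
_ARK_RE = re.compile(
    r"^(?:https?://[^/]+/)?(?:ark:/?)?(?P<naan>\d+)/(?P<name>[^?#\s]+)$"
)


def normalize_ark(value: str) -> Optional[str]:
    """Return a canonical 'ark:/NAAN/Name' string, or None if value does not look like an ARK."""
    match = _ARK_RE.match(value.strip())
    if match is None:
        return None
    return f"ark:/{match['naan']}/{match['name']}"


a = normalize_ark("http://arks.princeton.edu/ark:/88435/dsp01rb68xg060")
b = normalize_ark("http://arks.princeton.edu/88435/dsp01rb68xg060")
print(a == b, a)  # True ark:/88435/dsp01rb68xg060
```

Normalizing at index time would hide this kind of typo; leaving comparison exact, as here, makes the typo surface as a visible duplicate that can be corrected in the metadata.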

astrochun commented 7 months ago

Thanks for pointing out the problem, @carolyncole. I've updated the metadata for the second record and will check again later to see whether Discovery resolves this issue automatically with the correct ARK. I'll wait to close this issue until then.

astrochun commented 7 months ago

The issue appears to have resolved itself after the metadata was fixed.