astrochun closed this issue 1 year ago
@astrochun just a clarification utilizing an example: https://datacommons.princeton.edu/discovery/catalog/133856
@astrochun What time schedule are you looking for this? After dataspace is fully migrated and turned off, or before?
Would like this to happen before it's turned off. This will ensure that we have two ways to retrieve the complete metadata.
@astrochun Do you have an example of a report you have already gotten from Dspace? It would be helpful to see what you are getting.
This is what dspace-osti dumps from the DataSpace JSON API: https://raw.githubusercontent.com/pulibrary/dspace-osti/main/data/dspace_scrape.json
The above payload is currently pulled from the DataSpace API endpoint defined here.
The collection IDs used to populate that string are defined in this array.
Per discussion with @astrochun, he now has what he needs to evaluate what the reporting process might look like. He's going to play around with what's been implemented already and think about whether this is the right direction, or whether we should instead try to generate a feed of the DataCite JSON (as opposed to the indexed PDC Discovery data, which is what we've used so far).
Screen shot of this feature with the full PDC Describe JSON exposed for harvesting:
PPPL needs to provide a list of their datasets to OSTI on a regular basis. (E-Link 241.6 submissions)
PDC Discovery should provide an easy way to download a set of search results as a JSON-formatted report. We propose a dedicated, PPPL-specific JSON endpoint in the Discovery application (similar to this).
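To make the idea of a JSON report concrete, here is a minimal sketch of how paged search results could be flattened into one downloadable document. It is only an illustration under stated assumptions: `build_report`, the `total` and `records` keys, and the `id` field are hypothetical names, not the actual PDC Discovery schema or implementation.

```python
import json
from typing import Any, Iterable


def build_report(pages: Iterable[list[dict[str, Any]]]) -> str:
    """Flatten paged search results into a single JSON report string.

    Assumption: each page is a list of record dicts, and a record may carry
    a hypothetical "id" key that is used to drop repeated records.
    """
    seen_ids = set()
    records = []
    for page in pages:
        for doc in page:
            doc_id = doc.get("id")
            # Skip records already emitted; records without an id are always kept.
            if doc_id is not None and doc_id in seen_ids:
                continue
            if doc_id is not None:
                seen_ids.add(doc_id)
            records.append(doc)
    # sort_keys keeps the output stable so successive reports can be diffed.
    return json.dumps({"total": len(records), "records": records},
                      indent=2, sort_keys=True)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    sample_pages = [
        [{"id": "a", "title": "Dataset A"}],
        [{"id": "a", "title": "Dataset A"}, {"id": "b", "title": "Dataset B"}],
    ]
    print(build_report(sample_pages))
```

Stable key ordering is a deliberate choice here: it would let a harvester compare two successive reports and see which records changed.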
Acceptance criteria