pulibrary / pdc_discovery

Princeton Data Commons discovery portal for Research Data

Downloadable report for OSTI reporting #355

Status: Closed (astrochun closed 1 year ago)

astrochun commented 2 years ago

PPPL needs to provide a list of their datasets to OSTI on a regular basis (E-Link 241.6 submissions).

PDC Discovery should provide an easy way to download a set of search results as a JSON-formatted report. We propose a dedicated, PPPL-specific JSON endpoint in the discovery application (similar to this).

Acceptance criteria

carolyncole commented 1 year ago

@astrochun just a clarification utilizing an example: https://datacommons.princeton.edu/discovery/catalog/133856

carolyncole commented 1 year ago

@astrochun What timeline are you looking for on this? After DataSpace is fully migrated and turned off, or before?

astrochun commented 1 year ago

Would like this to happen before it's turned off. This will ensure that we have two ways to retrieve the complete metadata.

carolyncole commented 1 year ago

@astrochun Do you have an example of a report you have already gotten from DSpace? It would be helpful to see what you are getting.

astrochun commented 1 year ago

This is what dspace-osti dumps from the DataSpace JSON API: https://raw.githubusercontent.com/pulibrary/dspace-osti/main/data/dspace_scrape.json

kelynch commented 1 year ago

The above payload is currently pulled from the DataSpace API endpoint defined here.

The collection IDs used to populate that string are defined in this array.

astrochun commented 1 year ago

A catalog query for PPPL: https://datacommons.princeton.edu/discovery/?f%5Bcommunity_root_name_ssi%5D%5B%5D=Princeton+Plasma+Physics+Laboratory&q=&search_field=all_fields&fl=*&format=json
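For anyone who wants to script this, here is a minimal Python sketch that rebuilds the catalog query above and saves the response to a file. The base URL, facet field, and query parameters are copied from the link; the function names (`build_pppl_query_url`, `fetch_report`) and the output filename are hypothetical, and nothing here assumes a particular shape for the returned JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the catalog query in this thread.
BASE_URL = "https://datacommons.princeton.edu/discovery/"


def build_pppl_query_url(community="Princeton Plasma Physics Laboratory"):
    """Build the JSON catalog query URL for one community facet value."""
    params = [
        ("f[community_root_name_ssi][]", community),  # facet filter
        ("q", ""),                                    # empty search term
        ("search_field", "all_fields"),
        ("fl", "*"),                                  # request all fields
        ("format", "json"),                           # JSON instead of HTML
    ]
    # safe="*" keeps the literal "*" so the URL matches the one in the thread.
    return BASE_URL + "?" + urlencode(params, safe="*")


def fetch_report(url, timeout=30):
    """Fetch the URL and return the parsed JSON, whatever its structure is."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical output filename, for illustration only.
    report = fetch_report(build_pppl_query_url())
    with open("pppl_report.json", "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

This only saves whatever a single request returns; whether the results are paginated would need to be checked against the actual response.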

bess commented 1 year ago

Per discussion with @astrochun, he now has what he needs to evaluate what the reporting process might look like. He's going to play around with what's been implemented already and think about whether this is the right direction, or whether we should instead try to generate a feed of the DataCite JSON (as opposed to the indexed PDC Discovery data, which is what we've used so far).

bess commented 1 year ago

Screenshot of this feature with the full PDC Describe JSON exposed for harvesting: [image: Screenshot 2023-03-30 at 11 30 15 AM]