Closed kevinreiss closed 1 year ago
What is the urgency of this ticket? Are there people waiting for this data currently, or is it more of a "nice-to-have"?
What kind of user experience is expected for this issue? Would this require new UI work? What level of technical expertise do we expect a user to need in order to use this feature? See also https://github.com/pulibrary/orangelight/issues/2825
I think this falls into the due diligence category of work. We'd like to support an easy way to access this data for folks who might want to use our Open Data. The best and most logical way for folks to get this data consistently would be from a public interface provided by the POD project directly, but I don't think the decision to make that open is happening any time soon. I think the goal for us, in order to close OL#2825 and this ticket, would be to have a stable URL that someone can be pointed to, where they can download a compressed copy of our most recent full data dump in MARCXML.
We agreed at our stand-up on 11/17/2022 to work on this ticket after Thanksgiving.
@kevinreiss will discuss with @escowles making one archived full dump available on a different page in lib_jobs or via the POD project page.
There are no plans in the foreseeable future to open the POD project APIs to data consumers outside of the POD project.
Discuss possibly using the POD API to grab a set of the current MARC data we are exposing as "Open Data". Two options in the API:
One of these approaches could be an alternative to working with a full dump event of the current POD publishing process or creating a new publishing process.
Jane and I created a decision document to try to come to some conclusions on some of the questions that have come up around this issue. Contact Max or Jane if you cannot edit and want to.
This work is complete! @kevinreiss will open a new issue for refreshing the data regularly.
As a Library Community Member I would like to download a compressed version of Princeton's publicly shareable MARC data for analysis and potential re-use.
Concrete example: I would like to download a set of MARC records in order to analyze the Princeton collection's coverage in certain subject areas.
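To make the concrete example a bit more tangible, here is a rough sketch of what a data consumer might do with the download. It assumes the dump is a single gzip-compressed MARCXML file using the standard MARC21 slim namespace, and that "subject area" can be approximated by counting 650 $a topical headings. The file name in the comment and the helper names `count_subjects` and `count_subjects_in_gzip` are hypothetical, and the real packaging of the dump may differ (for example, several files in an archive).

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def count_subjects(fileobj):
    """Count 650 $a topical subject headings in a MARCXML stream."""
    counts = Counter()
    # iterparse streams the file, so a large dump is not loaded all at once.
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != MARC_NS + "record":
            continue
        for field in elem.iter(MARC_NS + "datafield"):
            if field.get("tag") != "650":
                continue
            for sub in field.iter(MARC_NS + "subfield"):
                if sub.get("code") == "a" and sub.text:
                    counts[sub.text.strip()] += 1
        elem.clear()  # release the children of the record we just processed
    return counts


def count_subjects_in_gzip(path):
    """Same as count_subjects, for a gzip-compressed file on disk.

    Hypothetical usage: count_subjects_in_gzip("princeton_pod_dump.xml.gz")
    """
    with gzip.open(path, "rb") as fh:
        return count_subjects(fh)


# Tiny made-up sample so the sketch can be run without the real dump.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">First title</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Botany</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Ecology</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Botany</subfield>
    </datafield>
  </record>
</collection>
"""

if __name__ == "__main__":
    for heading, n in count_subjects(io.BytesIO(SAMPLE)).most_common(5):
        print(heading, n)
```

This uses only the Python standard library, which fits the question above about how much technical expertise a user would need: someone comfortable running a short script could work with the file, but a non-technical user probably could not.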
Notes
We currently have a link to our bibdata dump files on this page: https://catalog.princeton.edu/dataset. The link there should be replaced with a link to a location where you can download the compressed POD data set. Ideally we'd also display the date the dump was generated and refresh the data every month.
Questions