pulibrary / pul-it-handbook

Princeton Univ. Library Apps best practices and recommendations
BSD 3-Clause "New" or "Revised" License

Document the DataSpace ProQuest dissertation metadata updates #53

Closed jrgriffiniii closed 1 year ago

jrgriffiniii commented 4 years ago

DataSpace (a DSpace implementation resolving to ) currently has a workflow in which ProQuest, an external service provider, supplies the metadata for doctoral dissertations. The DSpace administrator (currently @jrgriffiniii) transforms this metadata by invoking a Java class from the command line; this step cleans and normalizes titles and retrieves some existing ARKs for items. After this cleaning and linking step, @pmgreen has traditionally performed additional cleaning and supplied links for any missing ARKs. The final versions of these MARC records are then delivered for import into the ILS.

This process is run approximately every six months, and we should automate it where possible. For the moment, the process should be documented in greater detail.

jrgriffiniii commented 4 years ago

While discussing this workflow with @mzelesky, we discovered that the Java routine which transforms the ProQuest metadata into MARC records was generating improperly formatted MARC. Justified concerns were also raised about why this transform and process should be so tightly bound to Java, particularly given that so many within the ITIMS teams work with Ruby gems to interface with the ILS.

Perhaps a more streamlined approach that relies more heavily on Ruby-based solutions could be explored instead, as the problematic MARC record formatting must be addressed regardless.
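
To make the discussion concrete, here is a minimal, stdlib-only Ruby sketch of what the title cleaning and ARK linking step described earlier in this thread might look like. It is not a port of the existing Java class: the method names (`normalize_title`, `title_key`, `link_arks`), the hash-based record shape, the title-based matching rule, and the ARK value in the usage note are all assumptions made for illustration. MARC serialization is deliberately left out.

```ruby
# Hypothetical sketch; none of these names come from the existing Java class.

# Collapse runs of whitespace and drop trailing punctuation from a title.
def normalize_title(title)
  title.to_s.gsub(/\s+/, " ").strip.sub(/[\s.,;:\/]+\z/, "")
end

# Build a case- and punctuation-insensitive key for matching titles.
# Assumption: ASCII-only matching is acceptable; non-ASCII characters are dropped.
def title_key(title)
  normalize_title(title).downcase.gsub(/[^a-z0-9 ]/, "").squeeze(" ")
end

# records:       array of hashes with a :title key (stand-ins for parsed
#                ProQuest records)
# arks_by_title: hash mapping existing DataSpace item titles to their ARKs
#
# Returns two arrays: records that were linked to an ARK, and records that
# still need an ARK supplied by hand.
def link_arks(records, arks_by_title)
  index = arks_by_title.each_with_object({}) do |(title, ark), lookup|
    lookup[title_key(title)] = ark
  end

  cleaned = records.map do |record|
    title = normalize_title(record[:title])
    record.merge(title: title, ark: index[title_key(title)])
  end

  cleaned.partition { |record| record[:ark] }
end
```

Called as `linked, missing = link_arks(records, { "A study of things" => "ark:/99999/fk4example" })`, the second array would hold the records that currently require manual ARK linking, which could be reported rather than silently passed downstream. The ARK shown is a placeholder, not a real DataSpace identifier.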