pulibrary / figgy

Valkyrie-based digital repository backend.

Ingest "Ivy Lee" (pudl0036) #1586

Closed tpendragon closed 5 years ago

tpendragon commented 5 years ago

Note: MODS (series-level description in a finding aid). Depends on #1674 and #1675.

tpendragon commented 5 years ago

Files: https://drive.google.com/drive/u/0/folders/0B4Wo5hgOEFY3ZlZGUU5OS3FIbDA

tpendragon commented 5 years ago

PUDL page: http://pudl.princeton.edu/collections/pudl0036

jrgriffiniii commented 5 years ago

The MODS Documents in the METS files currently have minted ARKs:

   <mods:relatedItem type="host">
      <mods:titleInfo>
         <mods:title>Ivy Ledbetter Lee Papers, 1881-1989 (bulk 1915-1946)</mods:title>
      </mods:titleInfo>
      <mods:location>
         <mods:url note="Finding Aid">http://arks.princeton.edu/ark:/88435/m039k489x</mods:url>
      </mods:location>
   </mods:relatedItem>
jrgriffiniii commented 5 years ago

The series referenced within the MODS will likely need to be parsed and concatenated in order to produce a single string:

   <mods:relatedItem type="series">
      <mods:titleInfo>
         <mods:nonSort>The</mods:nonSort>
         <mods:title>Subway Sun</mods:title>
         <mods:partNumber>Volume 3</mods:partNumber>
         <mods:partNumber>Number 6</mods:partNumber>
      </mods:titleInfo>
   </mods:relatedItem>
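A minimal sketch of that concatenation, using only Python's standard library. The function name `series_string`, the comma separator, and the choice to prepend `nonSort` to the title are assumptions for illustration, not a decision about how Figgy should display the series:

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace; the prefix used in the METS files is assumed to map to it.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE_SERIES = """
<mods:relatedItem xmlns:mods="http://www.loc.gov/mods/v3" type="series">
   <mods:titleInfo>
      <mods:nonSort>The</mods:nonSort>
      <mods:title>Subway Sun</mods:title>
      <mods:partNumber>Volume 3</mods:partNumber>
      <mods:partNumber>Number 6</mods:partNumber>
   </mods:titleInfo>
</mods:relatedItem>
"""


def series_string(related_item):
    """Join nonSort, title, and every partNumber into one display string."""
    title_info = related_item.find("mods:titleInfo", MODS_NS)
    if title_info is None:
        return ""
    non_sort = title_info.findtext("mods:nonSort", default="", namespaces=MODS_NS).strip()
    title = title_info.findtext("mods:title", default="", namespaces=MODS_NS).strip()
    parts = [
        (el.text or "").strip()
        for el in title_info.findall("mods:partNumber", MODS_NS)
    ]
    # "The" + "Subway Sun" are joined with a space; part numbers follow, comma-separated.
    full_title = " ".join(p for p in (non_sort, title) if p)
    return ", ".join(p for p in [full_title] + parts if p)


series = series_string(ET.fromstring(SAMPLE_SERIES.strip()))
print(series)  # The Subway Sun, Volume 3, Number 6
```

A different separator (for example a period after the title) would only change the final `join`.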
jrgriffiniii commented 5 years ago

Subject authorities are names which may also have to be parsed:

   <mods:subject authority="lcsh">
      <mods:name type="personal">
         <mods:namePart type="family">Lee</mods:namePart>
         <mods:namePart type="given">Ivy L. (Ivy Ledbetter)</mods:namePart>
         <mods:namePart type="date">1877-1934</mods:namePart>
      </mods:name>
   </mods:subject>
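One possible way to flatten such a name subject into a single heading, again as a standard-library Python sketch. The helper name `subject_name_heading` and the "family, given, date" ordering are assumptions; the real ingest may need a different form:

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace; the prefix used in the METS files is assumed to map to it.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE_SUBJECT = """
<mods:subject xmlns:mods="http://www.loc.gov/mods/v3" authority="lcsh">
   <mods:name type="personal">
      <mods:namePart type="family">Lee</mods:namePart>
      <mods:namePart type="given">Ivy L. (Ivy Ledbetter)</mods:namePart>
      <mods:namePart type="date">1877-1934</mods:namePart>
   </mods:name>
</mods:subject>
"""


def subject_name_heading(subject):
    """Build one heading string from the typed namePart children of a name subject."""
    name = subject.find("mods:name", MODS_NS)
    if name is None:
        return None
    # Index the parts by their type attribute (family, given, date).
    parts = {
        part.get("type"): (part.text or "").strip()
        for part in name.findall("mods:namePart", MODS_NS)
    }
    # Assumed ordering: family, given, then date, skipping any part that is missing.
    ordered = (parts.get("family"), parts.get("given"), parts.get("date"))
    return ", ".join(p for p in ordered if p)


heading = subject_name_heading(ET.fromstring(SAMPLE_SUBJECT.strip()))
print(heading)  # Lee, Ivy L. (Ivy Ledbetter), 1877-1934
```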
jrgriffiniii commented 5 years ago

The MODS metadata has the following structure:

| Field | Element | Language | Script | XPath | Example | Authorities/Encoding Standards |
| --- | --- | --- | --- | --- | --- | --- |
| Title | titleInfo | English | Latin | mods:mods/mods:titleInfo/mods:title | Our "Surplus" is not in Cash | |
| Creator | namePart | English | Latin | mods:mods/mods:name/mods:roleTerm[text()="cre"]/../mods:namePart | Interborough Rapid Transit Company | |
| Resource Type | typeOfResource | English | Latin | mods:mods/mods:typeOfResource | still image | |
| Genre | genre | English | Latin | mods:mods/mods:genre | Posters | AAT |
| Place | place | English | Latin | mods:mods/mods:originInfo/mods:place/mods:placeTerm | New York (N.Y.) | |
| Date Created | dateCreated | N/A | N/A | mods:mods/mods:originInfo/mods:dateCreated | 1920-02 | |
| Language | languageTerm | N/A | N/A | mods:mods/mods:language/mods:languageTerm | eng | |
| Extent | extent | English | Latin | mods:mods/mods:physicalDescription/mods:extent | 1 poster; approximately 21 × 16 inches | |
| Subject | subject | English | Latin | mods:mods/mods:subject/mods:topic | Subways | LCSH |
| Subject | subject | English | Latin | mods:mods/mods:subject/mods:topic | Posters | AAT |
| Collection | collection | English | Latin | mods:mods/mods:relatedItem[@type="host"]/mods:titleInfo/mods:title | Ivy Ledbetter Lee Papers, 1881-1989 (bulk 1915-1946) | |
| Use Rights | accessCondition | English | Latin | mods:mods/mods:accessCondition[@type="useAndReproduction"] | Single photocopies may be made for research purposes. Permission to publish materials from the collection must be requested from the Curator of the Public Policy Papers. Researchers are responsible for determining any copyright questions | |
| Access Restrictions | accessCondition | English | Latin | mods:mods/mods:accessCondition[@type="restrictionOnAccess"] | The collection is open for research | |
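To illustrate how these XPaths could drive an extraction, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. The sample record is assembled by hand from the example values listed above and is not a real file from the collection; `FIELD_PATHS` and `extract_fields` are hypothetical names. The paths are written relative to the `mods:mods` root, and the Creator path is left out because it relies on `text()`, which ElementTree's limited XPath subset does not support (a full XPath engine would be needed for it as written).

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace; the prefix used in the METS files is assumed to map to it.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hand-built record using the example values from the field listing (illustrative only).
SAMPLE_MODS = """
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Our "Surplus" is not in Cash</mods:title></mods:titleInfo>
  <mods:genre authority="aat">Posters</mods:genre>
  <mods:originInfo>
    <mods:place><mods:placeTerm>New York (N.Y.)</mods:placeTerm></mods:place>
    <mods:dateCreated>1920-02</mods:dateCreated>
  </mods:originInfo>
  <mods:subject authority="lcsh"><mods:topic>Subways</mods:topic></mods:subject>
  <mods:subject authority="aat"><mods:topic>Posters</mods:topic></mods:subject>
  <mods:relatedItem type="host">
    <mods:titleInfo>
      <mods:title>Ivy Ledbetter Lee Papers, 1881-1989 (bulk 1915-1946)</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>
"""

# Field name -> path relative to the mods:mods root element.
FIELD_PATHS = {
    "title": "mods:titleInfo/mods:title",
    "genre": "mods:genre",
    "place": "mods:originInfo/mods:place/mods:placeTerm",
    "date_created": "mods:originInfo/mods:dateCreated",
    "subject": "mods:subject/mods:topic",
    "collection": "mods:relatedItem[@type='host']/mods:titleInfo/mods:title",
}


def extract_fields(mods_root):
    """Return a dict mapping each field name to the list of non-empty values found."""
    record = {}
    for field, path in FIELD_PATHS.items():
        values = [(el.text or "").strip() for el in mods_root.findall(path, MODS_NS)]
        record[field] = [v for v in values if v]
    return record


record = extract_fields(ET.fromstring(SAMPLE_MODS.strip()))
for field, values in record.items():
    print(field, values)
```

Every field is returned as a list so that repeated elements, such as the two subjects, are kept rather than overwritten.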
tpendragon commented 5 years ago

@jrgriffiniii Regarding your ARK statement: the records probably do have ARKs, but that particular one belongs to the finding aid the record references: https://findingaids.princeton.edu/collections/MC085.

tpendragon commented 5 years ago

In fact, this one seems to be described at the item level. See https://findingaids.princeton.edu/collections/MC085/c1274.xml

We might be able to do this one if we match up the component IDs.

tpendragon commented 5 years ago

The MODS here seems straightforward, which makes this a good candidate for the first ingest.

tpendragon commented 5 years ago

These are ingesting into prod now: https://figgy-staging.princeton.edu/?f%5Bmember_of_collection_titles_ssim%5D%5B%5D=Interborough+Rapid+Transit+Company+Subway+Posters

Small problem: the UUID in the METS is not the same as the UUID used to access the object in PUDL.

For instance, http://pudl.princeton.edu/objects/52e84d81-5abc-4f0c-9b25-97d21c5c94e4 has a METS file which has the UUID d269d0b8-1e17-47f5-97c3-696f0f914eac. The ID 52e84d81-5abc-4f0c-9b25-97d21c5c94e4 doesn't appear to be in the METS file at all.

tpendragon commented 5 years ago

To get a title/UUID combo I'll have to write a script using the following info from @jpstroop:

http://pudl.princeton.edu/rest/compiled/pudl0036
e.g. http://pudl.princeton.edu/rest/compiled/pudl0036/135
http://pudl.princeton.edu/rest/compiled/pudl0036/135/021.mets
tpendragon commented 5 years ago

These are fixed now thanks to @jpstroop. Do we want to mark them complete? Have someone look them over? Change the collection slug? @joycebcat ?

tpendragon commented 5 years ago

They can be found here: https://figgy.princeton.edu/?f%5Bmember_of_collection_titles_ssim%5D%5B%5D=Interborough+Rapid+Transit+Company+Subway+Posters&q=

jpstroop commented 5 years ago

@alexisantracoli is probably the person who should give them an 👀, as this is a Mudd collection.

tpendragon commented 5 years ago

All marked complete!

alexisantracoli commented 5 years ago

Thank you Trey!