pulibrary / figgy

Valkyrie-based digital repository backend.

Ingest "Postcards" (pudl0009) #1580

Closed tpendragon closed 6 years ago

tpendragon commented 6 years ago

Note: MODS

@jpstroop I assume "MODS" means there are no bib records?

Depends on #1674
Depends on #1675

tpendragon commented 6 years ago

Files: https://drive.google.com/drive/u/0/folders/0B4Wo5hgOEFY3ZlZGUU5OS3FIbDA

tpendragon commented 6 years ago

http://pudl.princeton.edu/collections/pudl0009

tpendragon commented 6 years ago

The METS is separated into 5 directories. Each directory is a "series", which is referenced by the MODS. Each series is described in a finding aid: https://findingaids.princeton.edu/collections/AC045.xml

There are no item-level descriptions in the finding aid, only "parts" of boxes.

| Field | Example |
| --- | --- |
| title (non-Sort) | The |
| title | Chapel, Princeton University |
| relatedItem | Princeton University Historical postcard collection Series 1 Buildings, circa 1900-1960 (separated into a title and two sub-titles) |
| publisher | Published by H. M. Hinkson, Stationer, Princeton, N. J. & The Albertype Co., Brooklyn, N.Y. |
| publisher date | 1941 |
| language | eng |
| abstract | Exterior view of the chapel in black and white. Divided back postcard. |
| genre (aat) | Picture postcards |
| subject (aat) | Eye-level views, Exterior views |
| subject (lcsh) | Princeton University, Buildings, Princeton University. Chapel |
| location | Princeton University Library. Department of Rare Books and Special Collections. Seeley G. Mudd Manuscript Library. |

tpendragon commented 6 years ago

When migrating sort title, migrate into two fields: "Sort Title" and "Title." Title would be "The Chapel, Princeton University" and sort title would be "Chapel, Princeton University."
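
A minimal sketch of that split, assuming the leading article arrives separately (the "title (non-Sort)" value) from the rest of the title. The function name and the `title` / `sort_title` keys are hypothetical, not Figgy's actual attribute names.

```python
def build_titles(non_sort, title):
    """Build a display title and a sort title from the two MODS title parts."""
    non_sort = (non_sort or "").strip()
    title = title.strip()
    # Display title keeps the leading article; sort title drops it.
    display = f"{non_sort} {title}" if non_sort else title
    return {"title": display, "sort_title": title}


titles = build_titles("The", "Chapel, Princeton University")
```

A record without a non-sort part would simply get the same value in both fields.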

tpendragon commented 6 years ago

Map location to our controlled vocab.

tpendragon commented 6 years ago

This one seems pretty straightforward, as long as we don't have to link publisher to published date.

jpstroop commented 6 years ago

Are we controlling publisher? I could imagine a single publication statement that's just a string (could include both publishers, dates and locations in whatever format, all in one field) and then separate controlled fields for publisher (just another name with a role of publisher), date(s), and location.

christinach commented 6 years ago

Some files include more than one originInfo:

<originInfo>
  <publisher>Published by H. M. Hinkson, Stationer, Princeton, N. J.</publisher>
</originInfo>
<originInfo>
  <publisher>The Albertype Co., Brooklyn, N. Y.</publisher>
</originInfo>

In this case do we want to pull publication information from the first node or from all?

jpstroop commented 6 years ago

Definitely need both. Is publisher (or whatever it is) a repeatable field?
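
Pulling from every node rather than only the first could look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree` on an un-namespaced sample wrapped in a hypothetical `<mods>` root; the real METS/MODS files presumably declare a namespace, which would have to be passed to `findall`. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample built from the two originInfo nodes quoted above; the <mods> wrapper
# is an assumption so the fragment parses as a single document.
SAMPLE = """<mods>
  <originInfo>
    <publisher>Published by H. M. Hinkson, Stationer, Princeton, N. J.</publisher>
  </originInfo>
  <originInfo>
    <publisher>The Albertype Co., Brooklyn, N. Y.</publisher>
  </originInfo>
</mods>"""


def publishers(mods_xml):
    """Return the text of every publisher element, across all originInfo nodes."""
    root = ET.fromstring(mods_xml)
    return [
        el.text.strip()
        for el in root.findall("./originInfo/publisher")
        if el.text and el.text.strip()
    ]


found = publishers(SAMPLE)
```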

christinach commented 6 years ago

@jpstroop based on your previous comment, my understanding was to add a publication field and pull the publisher, date, and location information from the originInfo node. I'm not sure I understand "Is publisher a repeatable field".

jpstroop commented 6 years ago

I don't know how things are being modelled, but this record essentially has two publication statements, which are really just strings. I think we could either store two strings: publisher: ["s1", "s2"], or, if that isn't possible, mash them together as one string separated by ;, i.e.: publisher: ["{s1} ; {s2}"].

I would think that the date would be stored separately?
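
The two shapes can be sketched as follows. Both helper names are hypothetical, and which shape applies depends on whether the field ends up repeatable in the model.

```python
def as_repeatable(statements):
    """One value per publication statement (field is repeatable)."""
    return {"publisher": list(statements)}


def as_joined(statements, sep=" ; "):
    """A single value with the statements joined (field is not repeatable)."""
    return {"publisher": [sep.join(statements)]}


s1 = "Published by H. M. Hinkson, Stationer, Princeton, N. J."
s2 = "The Albertype Co., Brooklyn, N. Y."
repeatable = as_repeatable([s1, s2])
joined = as_joined([s1, s2])
```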

christinach commented 6 years ago

I can add only the publisher information. The dateCreated is stored separately.

tpendragon commented 6 years ago

@christinach For now I'd just make two fields "publisher" and "published date." Extract the publisher name part to publisher and the date part of the publication to published date. (There's also a dateCreated, but that should be handled already right?)

christinach commented 6 years ago

yes, dateCreated is stored already. I'm assuming dateOther is the published Date.

tpendragon commented 6 years ago

@christinach I think that's accurate. There's also a <copyrightDate> it looks like.

tpendragon commented 6 years ago

1/0196.mets also has a dateIssued

christinach commented 6 years ago

yes there is a copyrightDate

christinach commented 6 years ago

@tpendragon thank you, I will add all the relevant dates.
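
A rough sketch of gathering every date element mentioned in this thread (dateCreated, dateOther, copyrightDate, dateIssued) from all originInfo nodes. The sample document, its values, and the function name are invented for illustration, and the sample is un-namespaced; the real files presumably need a namespace passed to `findall`.

```python
import xml.etree.ElementTree as ET

# Date elements named in the discussion above.
DATE_ELEMENTS = ("dateCreated", "dateOther", "copyrightDate", "dateIssued")

# Hypothetical record; values are made up and not taken from any real file.
SAMPLE = """<mods>
  <originInfo>
    <publisher>Example Publisher</publisher>
    <dateOther>1941</dateOther>
  </originInfo>
  <originInfo>
    <copyrightDate>1940</copyrightDate>
    <dateIssued>1941</dateIssued>
  </originInfo>
</mods>"""


def collect_dates(mods_xml):
    """Map each date element name to the list of values found in any originInfo."""
    root = ET.fromstring(mods_xml)
    dates = {}
    for origin in root.findall("./originInfo"):
        for name in DATE_ELEMENTS:
            for el in origin.findall(name):
                if el.text and el.text.strip():
                    dates.setdefault(name, []).append(el.text.strip())
    return dates


dates = collect_dates(SAMPLE)
```

Element names with no value in the record are simply absent from the result.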

christinach commented 6 years ago

The folder pudl0009 includes a folder 'uncat' with tif, img and png files. @roelmoon said that the png and jpg can be ignored (or deleted).
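
Skipping those files during ingest could be as simple as filtering on the extension. This is a sketch only: the file names are hypothetical, and anything that is not png or jpg is kept.

```python
from pathlib import PurePosixPath

# Extensions that can be ignored per the comment above.
IGNORED_SUFFIXES = {".png", ".jpg"}


def files_to_ingest(paths):
    """Keep every path whose extension is not in the ignored set (case-insensitive)."""
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower() not in IGNORED_SUFFIXES
    ]


kept = files_to_ingest(["uncat/a.tif", "uncat/a.png", "uncat/b.JPG"])
```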