sul-dlss / exhibits

Stanford University Libraries online exhibits showcase
https://exhibits.stanford.edu

SALT indexing: physicalLocation (was: series, box, folder) #51

Closed ndushay closed 8 years ago

ndushay commented 9 years ago

In existing Saltworks, series, subseries, box and folder are separate, independent facets:

[screenshot: facets]

This information is represented in DOR mods as a single string in a physicalLocation element that has no attributes (there may be multiple physicalLocation strings):

http://purl.stanford.edu/jz524jt5425.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession: 1986-052, Box: 15, Folder: 2</physicalLocation>
  </location>

http://purl.stanford.edu/dk023cx0427.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession: 1986-052, Box: 47, Folder: 20</physicalLocation>
  </location>

http://purl.stanford.edu/gw545by8468.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession 2005-101</physicalLocation>
  </location>

The Series facet value is the number after "Accession" -- Saltworks shows 2 series values for the entire collection.

The Box and Folder values are sometimes missing; when present, they appear in the same string.
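A minimal sketch of how such a string could be split into separate values, assuming comma-separated parts and only the four labels seen in the examples above. The names `parse_physical_location` and `LOCATION_LABELS` are hypothetical and not part of Saltworks or the exhibits code base. The colon after a label is treated as optional because the third example has "Accession 2005-101" without one.

```ruby
# Hypothetical helper, not taken from Saltworks or the exhibits code base.
# The label list is an assumption based on the example strings above.
LOCATION_LABELS = ['Call Number', 'Accession', 'Box', 'Folder'].freeze

def parse_physical_location(value)
  value.split(',').each_with_object({}) do |part, result|
    part = part.strip
    label = LOCATION_LABELS.find { |candidate| part.start_with?(candidate) }
    next unless label

    # Remove the label, then an optional colon and any surrounding whitespace.
    result[label] = part.delete_prefix(label).sub(/\A\s*:?\s*/, '')
  end
end

with_box = parse_physical_location('Call Number: SC0340, Accession: 1986-052, Box: 15, Folder: 2')
without_box = parse_physical_location('Call Number: SC0340, Accession 2005-101')
puts with_box.inspect
puts without_box.inspect
```

Parts whose label is missing from the string are simply absent from the returned hash, so a caller can tell "no Box" apart from an empty Box value.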

ndushay commented 9 years ago

Some other collections with this information:

  <location>
   <physicalLocation>Series 3, Box 30, Folder 3</physicalLocation>
  </location>

   <mods:relatedItem type="host">
      <mods:location>
        <mods:physicalLocation type="location">Series 15 | Box 1 | Folder 6</mods:physicalLocation>
     </mods:location>
   </mods:relatedItem>

  <mods:location>
   <mods:physicalLocation>Collection: M0690; Box: 7; Folder: 54; Item: 267</mods:physicalLocation>
  </mods:location>
 <mods:identifier type="local" displayLabel="Item Number">267</mods:identifier>
 <mods:identifier type="local" displayLabel="Collection ID">M0690</mods:identifier>
….
 <mods:identifier type="local" displayLabel="SU DRUID">druid:bc684nn7682</mods:identifier>
 <mods:identifier displayLabel="Box" type="local">7</mods:identifier>
 <mods:identifier displayLabel="Folder" type="local">54</mods:identifier>
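The strings above use three delimiter styles (commas, pipes, and semicolons) and make the colon after a label optional. A minimal sketch of one pattern that tolerates all three, assuming the values themselves never contain a comma, semicolon, or pipe. `LOCATION_PART` and `location_parts` are hypothetical names, and the label list is taken only from these examples.

```ruby
# Hypothetical sketch covering the three styles shown above:
#   "Series 3, Box 30, Folder 3"
#   "Series 15 | Box 1 | Folder 6"
#   "Collection: M0690; Box: 7; Folder: 54; Item: 267"
LOCATION_PART = /(Collection|Series|Box|Folder|Item)\s*:?\s*([^,;|]+)/

def location_parts(value)
  # scan returns [label, text] pairs, one per match of the pattern.
  value.scan(LOCATION_PART).to_h { |label, text| [label.downcase, text.strip] }
end

puts location_parts('Series 3, Box 30, Folder 3').inspect
puts location_parts('Series 15 | Box 1 | Folder 6').inspect
puts location_parts('Collection: M0690; Box: 7; Folder: 54; Item: 267').inspect
```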

ndushay commented 9 years ago

The Series facet value is the number after "Accession" -- Saltworks shows 2 series values for the entire collection.

[screenshot: series facet values]

http://purl.stanford.edu/jz524jt5425.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession: 1986-052, Box: 15, Folder: 2</physicalLocation>
  </location>

http://purl.stanford.edu/dk023cx0427.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession: 1986-052, Box: 47, Folder: 20</physicalLocation>
  </location>

http://purl.stanford.edu/gw545by8468.mods

  <location>
    <physicalLocation>Call Number: SC0340, Accession 2005-101</physicalLocation>
  </location>

ndushay commented 9 years ago

Not all of the Box facet values are numeric (is this because the parsing for the Saltworks indexing encountered some unexpected punctuation delimiters?):

[screenshot: box facet values]

The Folder facet values, however, are all numeric.

ndushay commented 9 years ago

Here is a doc with a saltworks Box facet value of "Box : 39" that has been accessioned into DOR:

http://purl.stanford.edu/dg509nb5103.mods

  <physicalLocation>Call Number: SC0340, Accession 2005-101, Box : 39, Folder: 9</physicalLocation>

ndushay commented 9 years ago

Ideally, the Box/Folder values would be indexed as a hierarchical facet. Spotlight does not currently support hierarchical facets.

@cbeer suggested that, if we can't do hierarchical facets quickly enough, we could index box and folder (and accession number? @lauraw15, can you help us figure this out?) together as a single string rather than as independent facets.
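A minimal sketch of the single-string idea, assuming the accession, box, and folder values have already been extracted from the physicalLocation string. The method name, the labels in the output, and the " > " separator are illustrative choices, not anything Spotlight prescribes.

```ruby
# Hypothetical helper: build one facet value from whichever parts are present.
def combined_location_facet(accession: nil, box: nil, folder: nil)
  parts = { 'Accession' => accession, 'Box' => box, 'Folder' => folder }
  parts.reject { |_label, value| value.to_s.strip.empty? }
       .map { |label, value| "#{label} #{value}" }
       .join(' > ')
end

puts combined_location_facet(accession: '1986-052', box: '15', folder: '2')
puts combined_location_facet(accession: '2005-101')
```

Because every value starts with the accession, an alphabetically sorted facet list would keep boxes grouped under their accession. Numeric order is not preserved, though: as plain strings, "Box 15" sorts before "Box 2" unless the numbers are zero-padded.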

ndushay commented 9 years ago

FYI, in old SALT, this is derived from dc:coverage in the 'zotero' datastream from SALT Fedora:

https://github.com/sul-dlss/salt/blob/0a8d7b88642ebeb2133613bf2f4d5f219c08dd6b/lib/stanford/salt_document.rb#L215-L226

    # Takes a string from the coverage and returns a formatted hash.
    # Coverage string values look like: Box: 36, Folder: 15, Title: HPP Papers, Various Authors (1 of 2)1970 -
    # Returns a hash of single-element arrays:
    # { "subseries" => ["HPP Papers, Various Authors (1 of 2)1970 -"], "box" => ["36"], "folder" => ["15"] }.
    # Title in this case is the section unittitle from the EAD.
    # Note: removes newlines from coverage_string in place.
    def format_coverage(coverage_string)
      coverage_string.gsub!("\n", "")
      coverage_hash = { "subseries" => [coverage_string.split("Title:")[1].to_s.strip] }
      parts = coverage_string.split(",")
      coverage_hash["box"] = [parts.shift.gsub("Box:", '').to_s.strip]
      coverage_hash["folder"] = [parts.shift.gsub("Folder:", '').to_s.strip]
      coverage_hash
    end
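The method above modifies its argument with `gsub!` and assumes at least two comma-separated parts, so a coverage value with no comma would raise `NoMethodError` on the folder line. A minimal sketch of a more defensive variant, assuming each label is followed by a colon as in the sample coverage string. `format_coverage_safely` is a hypothetical name; this is not the code in sul-dlss/salt.

```ruby
# Hypothetical defensive variant of format_coverage; not the Saltworks code.
def format_coverage_safely(coverage_string)
  # String#delete returns a new string, so the caller's value is left untouched.
  text = coverage_string.to_s.delete("\n")
  {
    'box'       => [text[/Box\s*:\s*([^,]+)/, 1].to_s.strip],
    'folder'    => [text[/Folder\s*:\s*([^,]+)/, 1].to_s.strip],
    # Everything after the first "Title:" is kept, commas included.
    'subseries' => [text.split('Title:', 2)[1].to_s.strip]
  }
end

sample = 'Box: 36, Folder: 15, Title: HPP Papers, Various Authors (1 of 2)1970 -'
puts format_coverage_safely(sample).inspect
puts format_coverage_safely('Box: 36').inspect
```

Missing parts come back as empty strings rather than raising, which keeps the return shape (a hash of single-element arrays) the same as the original method.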

When I grab an actual zotero datastream from bs912ky4641 in SALT Fedora, I see:

  <dc:coverage>Box: 20, Folder: 44, Title: Schlumberger</dc:coverage>

Looking at the same object's MODS in purl (http://purl.stanford.edu/bs912ky4641.mods):

  <location>
   <physicalLocation>Call Number: SC0340, Accession: 1986-052, Box: 20, Folder: 44</physicalLocation>
  </location>
  <note type="preferred citation">Call Number: SC0340, Accession: 1986-052, Box: 20, Folder: 44, Title: Schlumberger</note>

Yum, yum.

peetucket commented 8 years ago

From Scott on October 23:

These [facets] are all important.

You obviously already know this, but anyway... Folders (number) are in Boxes (number), and Boxes are in Series. Series/Box/Folder together identifies a unique Folder. "Subseries", on the other hand, is actually the "Name" of the Folder (as identified by the archivist). So the Subseries is 1-to-1 with each unique Folder number. Perhaps the nomenclature for "subseries" might be updated to "folder name" or something similar.

Being able to reconstruct the physical location of the documents and poking around in the same folder and box can be very helpful in figuring out what the meaning and context of handwritten papers and drafts might be, especially when the other metadata and OCR text is too skimpy for the regular search tools.
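The containment described above can be sketched as a composite key: series, box, and folder number together identify one physical folder, so documents sharing that key were filed together. The `Doc` struct, its field names, and the ids below are invented for illustration; only the accession numbers come from this thread.

```ruby
# Hypothetical record type; the ids and box/folder numbers are made up.
Doc = Struct.new(:id, :series, :box, :folder)

docs = [
  Doc.new('doc-a', '1986-052', '15', '2'),
  Doc.new('doc-b', '1986-052', '15', '2'),
  Doc.new('doc-c', '1986-052', '15', '3'),
  Doc.new('doc-d', '2005-101', '15', '2')
]

# Series/Box/Folder together identify one physical folder.
by_folder = docs.group_by { |doc| [doc.series, doc.box, doc.folder] }

by_folder.each do |(series, box, folder), members|
  puts "#{series} / Box #{box} / Folder #{folder}: #{members.map(&:id).join(', ')}"
end
```

Note that `doc-d` has the same box and folder numbers as `doc-a` but belongs to a different series, so it lands in a different group. Independent Box and Folder facets alone could not make that distinction.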

ggeisler commented 8 years ago

I like using "Folder name" instead of "Subseries" for the label, whether we rename it in the exhibit admin UI (as the curator) or otherwise.

ndushay commented 8 years ago

I am closing this ticket; I think the remaining pieces ("subseries" aka folder name: #52; display field: spotlight-dor-resources/issues#18) are captured elsewhere.