Closed by tpendragon 3 years ago
Three directories contain TIFF image files, and some of the file names are identical across directories:
$ ls -lt pudl0044/scrapbook3/foldouts/
total 452608
[...]
-rwxr-xr-x 1 deploy root 17775972 Oct 21 2002 DSC_0003.TIF
-rwxr-xr-x 1 deploy root 17780350 Oct 21 2002 DSC_0002.TIF
-rwxr-xr-x 1 deploy root 17770576 Oct 21 2002 DSC_0001.TIF
$ ls -lt pudl0044/scrapbook7/foldouts/
total 1897472
[...]
-rwxr-xr-x 1 deploy root 17779302 Oct 28 2002 DSC_0080.TIF
-rwxr-xr-x 1 deploy root 17775526 Oct 28 2002 DSC_0079.TIF
-rwxr-xr-x 1 deploy root 17774608 Oct 28 2002 DSC_0078.TIF
$ ls -lt pudl0044/zelda7/foldouts/
total 1880064
[...]
-rwxr-xr-x 1 deploy root 17774532 Oct 21 2002 DSC_0003.TIF
-rwxr-xr-x 1 deploy root 17770976 Oct 21 2002 DSC_0002.TIF
-rwxr-xr-x 1 deploy root 17774386 Oct 21 2002 DSC_0001.TIF
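The overlap in the listings above can also be checked programmatically. The following is a minimal sketch, assuming the directories sit under a single root; `duplicate_basenames` is a hypothetical helper and is not part of the ingest code.

```ruby
# Hypothetical helper (not part of the ingest code): find file names that
# occur in more than one directory below `root`.
def duplicate_basenames(root, pattern: "**/*.TIF")
  Dir.glob(File.join(root, pattern))
     .group_by { |path| File.basename(path) }          # file name => all paths
     .select { |_name, paths| paths.size > 1 }         # keep only collisions
     .transform_values do |paths|
       # Report the containing directories relative to the root.
       paths.map { |p| File.dirname(p).delete_prefix("#{root}/") }.sort
     end
end

# Example call (the root path is an assumption):
# duplicate_basenames("pudl0044").each do |name, dirs|
#   puts "#{name}: #{dirs.join(', ')}"
# end
```

The result maps each colliding file name to the directories that contain it, which makes it easier to see which directories need distinct identifiers.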
I'm uncertain how this should be reconciled, as the METS within scrapbook.mets uses the same ID (pudl/pudl0044/831958/) for both directories:
<mets:file CHECKSUM="5cd94bd02c37b414b74111248ccf96e0" CHECKSUMTYPE="MD5" ID="ycc7i" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file:///mnt/diglibdata/pudl/pudl0044/831958/scrapbooks/scrapbook_01/00000136.tif"/>
</mets:file>
<mets:file CHECKSUM="f76c81cba59236ae9cbe9bb5e57bfa95" CHECKSUMTYPE="MD5" ID="gbw01" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file:///mnt/diglibdata/pudl/pudl0044/831958/scrapbooks/scrapbook_02/00000001.tif"/>
</mets:file>
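To make the shared ID explicit, the two FLocat hrefs above can be split into a bib. ID and the path below it. This is only a sketch: `locate` is a hypothetical helper, and it assumes the layout `.../pudl0044/<bib id>/<rest>` seen in the hrefs.

```ruby
require "uri"

# Hypothetical helper: split a METS FLocat href into the bib ID directory
# and the path below it, assuming the layout .../pudl0044/<bib id>/<rest>.
def locate(href, collection: "pudl0044")
  segments = URI.parse(href).path.split("/").reject(&:empty?)
  index = segments.index(collection)
  raise ArgumentError, "#{collection} not found in #{href}" if index.nil?

  { bib_id: segments[index + 1], relative_path: segments.drop(index + 2).join("/") }
end

# The two hrefs quoted from scrapbook.mets.
hrefs = [
  "file:///mnt/diglibdata/pudl/pudl0044/831958/scrapbooks/scrapbook_01/00000136.tif",
  "file:///mnt/diglibdata/pudl/pudl0044/831958/scrapbooks/scrapbook_02/00000001.tif"
]

# Group the locations by bib ID and list the directories under each one.
hrefs.map { |href| locate(href) }.group_by { |loc| loc[:bib_id] }.each do |bib_id, locs|
  dirs = locs.map { |loc| File.dirname(loc[:relative_path]) }.uniq
  puts "#{bib_id}: #{dirs.join(', ')}"
end
# prints "831958: scrapbooks/scrapbook_01, scrapbooks/scrapbook_02"
```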
After further review on my part, these should likely just be ingested within the directories bearing bib. IDs in the PUDL directory.
Reviewing the MODS metadata, I found that the following attributes are not featured within the referenced MARC record (831958):
Field | Element | Language | Script | XPath | Value | Authorities/Encoding Standards | MARC Liberation JSON-LD Property | MARC Liberation JSON-LD Value |
---|---|---|---|---|---|---|---|---|
Title | titleInfo | English | Latin | mods:mods/mods:titleInfo/mods:title | Fitzgerald's Trimalchio | Not present | Not present | |
Alternative Title | titleInfo | English | Latin | mods:mods/mods:titleInfo/mods:title[type="alternative"] | Trimalchio | Not present | Not present | |
Title | titleInfo | English | Latin | mods:mods/mods:titleInfo/mods:title | Great Gatsby | NAF | Not present | Not present |
Author | namePart | English | Latin | mods:mods/mods:name/mods:role/mods:roleTerm[text()="aut"]/../../mods:namePart | F. Scott (Francis Scott) Fitzgerald 1896-1940 | | author | Fitzgerald, F. Scott (Francis Scott), 1896-1940 |
Type of Resource | typeOfResource | English | Latin | mods:mods/mods:typeOfResource | text | Not present | Not present | |
Date Created | dateCreated | English | Latin | mods:mods/mods:originInfo/mods:dateCreated | 1924-1925 | w3cdtf | date | 1897-1944. |
Language | language | English | Latin | mods:mods/mods:language/mods:languageTerm | | | language | eng |
Extent | extent | English | Latin | mods:mods/mods:physicalDescription/mods:extent | | | extent | 44 linear ft. (89 archival boxes, 11 oversize flat cases) |
Note | note | English | Latin | mods:mods/mods:note | Not present | Not present | ||
Subject | subject | English | Latin | mods:mods/mods:subject/mods:genre | Manuscripts | LCSH | type | Correspondence, Manuscripts |
Subject | subject | English | Latin | mods:mods/mods:subject/mods:name/mods:namePart | F. Scott (Francis Scott) Fitzgerald 1896-1940 | LCSH | Not present | Not present |
Collection | collection | English | Latin | mods:mods/mods:relatedItem[@type="host"]/mods:titleInfo/mods:title | F. Scott Fitzgerald papers, 1897-1944 | Not present | Not present | |
Use Rights | accessCondition | English | Latin | mods:mods/mods:accessCondition[@type="useAndReproduction"] | Selected items in the F. Scott Fitzgerald Papers can be photoduplicated at the expense of the researcher requesting photoduplication. Advanced estimates and payment are required. For general information on photoduplication and permissions, go to http://www.princeton.edu/~rbsc Requests to to reproduce, publish, or broadcast material from the F. Scott Fitzgerald Papers should be addressed Public Services staff, rbsc@princeton.edu The correct form of citation includes the name of the collection, box and folder numbers, and an indication that the originals are in the "Manuscripts Division, Department of Rare Books and Special Collections, Princeton University Library." The manuscript of The Great Gatsby and other writings of F. Scott Fitzgerald are not to be quoted, published, reproduced, or broadcast without the written permission of the Princeton University Library as owner of the physical object, and of the Fitzgerald Literary Trust (copyright holder), c/o Harold Ober Associates, 425 Madison Avenue, New York, New York 10017 (Telephone: 212-759-8600; FAX: 212-759-9428). The Library is not responsible for copyright infringement or other legal problems resulting from unauthorized publication of the words of F. Scott Fitzgerald. | Not present | ||
Access Restrictions | accessCondition | English | Latin | mods:mods/mods:accessCondition[@type="restrictionOnAccess"] | For legal and conservation reasons, access to F. Scott Fitzgerald’s original manuscripts (including corrected galleys and scrapbooks) is strictly restricted. Scottie Fitzgerald Lanahan, daughter of F. Scott Fitzgerald and Zelda Fitzgerald, donated the Fitzgerald Papers to the Princeton University Library in 1950, stipulating that surrogates of the original manuscripts were to be made available to researchers instead of the originals. This was done to preserve the originals, which are not on good paper. Originally, the surrogates were in the form of microfilm. A facsimile edition of The Great Gatsby autograph manuscript was published in 1973: The Great Gatsby: A Facsimile of the Manuscript, edited with an introduction by Matthew J. Bruccoli (Washington, D.C.: Microcard Editions Books, 1973). Facsimiles editions of other manuscripts of books and short stories followed a multi-volume series: F. Scott Fitzgerald Manuscripts, edited by Matthew J. Bruccoli and Alan Margolies (New York: Garland Publishing Company, 1990). Complete sets of the facsimile edition are available at more than 50 research libraries (including Firestone Library). The present digital surrogates of The Great Gatsby manuscript and corrected galleys are part of this effort and are being put online, using digital watermarks, with the permission of the Fitzgerald Literary Trust (the Fitzgerald copyright holder), c/o Harold Ober Associates, the New York literary agency. | Not present | ||
Abstract | abstract | English | Latin | mods:mods/mods:abstract | F. Scott Fitzgerald, This Side of Paradise, autograph manuscripts and corrected typescripts (1917-1919). Fitzgerald began writing This Side of Paradise at Princeton, continued in November 1917 at Fort Leavenworth, Kansas, with the working title "The Romantic Egoist," and completed a first draft of the novel at Cottage Club in March 1918. After this draft had been twice rejected by the New York publisher Charles Scribner's Sons, Fitzgerald returned to his parents' home at 599 Summit Avenue in his native St. Paul, Minnesota, and added five new chapters to the four he had written the previous year. He changed the second title of the novel from "The Education of a Personage" to "This Side of Paradise" and sent the novel to Maxwell Perkins at Scribner's. The publisher accepted the novel on September 16, 1919, and published it on March 26, 1920. The author's corrected galleys and page proofs do not survive. | Not present | Not present | |
Table of Contents | tableOfContents | English | Latin | mods:mods/mods:tableOfContents | I. This Side of Paradise (1920). | Not present | Not present |
What is unclear to me is whether this should be resolved by providing the additional information in a separate MARC record and linking to that during ingestion, or whether it should be parsed from the METS/MODS (please see #1705).
Currently, the watermarked TIFF image files are located in the following directories on the Samba network share:
$ ls -lt //[HOST].princeton.edu/pudl/pudl0044/823463/
total 3152896
-rwxr-xr-x 1 deploy www-data 103119252 May 22 2013 00000009.tif
...whereas the original files can be found within a separate directory, originals_no_watermark:
$ ls -lt //[HOST].princeton.edu/pudl/pudl0044/originals_no_watermark/823463/
total 3152896
-rwxr-xr-x 1 deploy www-data 111132340 May 14 2013 00000030.tif
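The relationship between the two directory trees can be expressed as a path mapping. The sketch below assumes that each watermarked file has an original with the same file name under originals_no_watermark, which the truncated listings above do not confirm; `original_for` is a hypothetical helper.

```ruby
require "pathname"

# Hypothetical helper: map a watermarked file under <root>/<bib id>/ to the
# expected location of its unwatermarked original under
# <root>/originals_no_watermark/<bib id>/.
def original_for(watermarked, root:)
  relative = Pathname.new(watermarked).relative_path_from(Pathname.new(root))
  Pathname.new(root).join("originals_no_watermark", relative)
end

puts original_for("pudl/pudl0044/823463/00000009.tif", root: "pudl/pudl0044")
# prints "pudl/pudl0044/originals_no_watermark/823463/00000009.tif"
```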
bundle exec rake bulk:ingest DIR=staged_files/pudl/pudl0044/823463 BIB=823463 REPLACES=pudl0044 COLL=b9436097-6999-475a-a77c-c664b4d67607
followed by
bundle exec rake bulk:ingest_intermediate_files DIR=staged_files/pudl/pudl0044/originals_no_watermark
Running these two tasks in sequence successfully ingests the material and appends the unwatermarked TIFF images as intermediate files.
Further testing today reveals that the TIFF files are not ingested as intermediate or original files, and that the newly generated intermediate JP2 files are not accessible:
Valkyrie::StorageAdapter::FileNotFound in DownloadsController#show
[...]
trace("valkyrie.storage.find_by") do |span|
span.set_tag("param.id", id.to_s)
storage_adapter.find_by(id: id)
end
end
These errors now seem to be inconsistent, and are often remedied by invoking binding.pry during the Rake task, after the jobs have been completed. Hence, they may be related to a race condition introduced in #1730.
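One way to test the timing hypothesis without an interactive pause is to retry the lookup a few times. The following is a minimal sketch and not a fix for the underlying race; `with_retries` is a hypothetical helper, and the commented usage assumes the `storage_adapter.find_by(id: id)` call shown in the trace above.

```ruby
# Hypothetical helper: retry a block when it raises one of the given error
# classes, sleeping between attempts. Re-raises after the last attempt.
def with_retries(attempts: 3, wait: 0.5, on: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts

    sleep(wait)
    retry
  end
end

# Usage sketch (requires the application environment, so it is left commented):
# with_retries(attempts: 5, wait: 1, on: [Valkyrie::StorageAdapter::FileNotFound]) do
#   storage_adapter.find_by(id: id)
# end
```

If the lookup succeeds on a later attempt, that would be consistent with the file not yet being written when the first request arrives.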
Following the merging of #1913, test materials are ingested as expected when the following Rake tasks are invoked:
bundle exec rake bulk:ingest DIR=staged_files/pudl/pudl0044/originals_no_watermark/823463 BIB=823463 REPLACES=pudl0044 COLL=d2851940-20b5-4ea8-9cb9-216be2738a3c
bundle exec rake bulk:ingest_intermediate_files DIR=staged_files/pudl/pudl0044/
These appear to ingest properly within the staging environment; however, the metadata for 831958 seem to be invalid.
This is also the case for 831959.
Additionally, derivative generation consistently fails for the member ScannedResource "scrapbooks".
All have been migrated.
Notes: MARC, but needs custom watermarks on JP2s. Intermediate TIFF solution?
The referenced intermediate TIFF solution is to allow Scanned Resources to have an intermediate TIFF uploaded, from which the JP2s are created. In this case the intermediate TIFFs will be watermarked.
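The selection rule described here can be sketched in a few lines. Everything named below (`AttachedFile`, the `use` values, `derivative_source`) is a hypothetical stand-in for illustration and is not taken from the application code.

```ruby
# Hypothetical stand-in for a file attached to a Scanned Resource; the
# `use` values are illustrative only.
AttachedFile = Struct.new(:name, :use)

# Prefer an intermediate file as the derivative source and fall back to the
# original file when no intermediate file has been uploaded.
def derivative_source(files)
  files.find { |f| f.use == :intermediate } || files.find { |f| f.use == :original }
end

files = [
  AttachedFile.new("page1.tif", :original),
  AttachedFile.new("page1_watermarked.tif", :intermediate)
]
puts derivative_source(files).name
# prints "page1_watermarked.tif"
```

With this rule the JP2 is generated from the watermarked intermediate TIFF, while the unwatermarked original is preserved untouched.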