hannahfrost opened 7 years ago
Image is definitely there and can be viewed and downloaded via the "Actions" dropdown or on the fileset page: https://stlawu.pilot.hykudirect.org/concern/parent/0a13e77d-b826-46b7-b5c1-5a2ee20e6cdf/file_sets/96c4f61a-c4c9-4d98-8f00-34b3974d1bc4
So this is just a problem w/ UV integration.
The JS console has two errors:
Uncaught Error: See almond README: incorrect module build, no module name
at define (application-a558d44….js:24)
at application-a558d44….js:29
at application-a558d44….js:29
AND
GET https://stlawu.pilot.hykudirect.org/images/96c4f61a-c4c9-4d98-8f00-34b3974d1bc4%2Ffiles%2F9c2ed1ba-3237-4213-873c-ea9727aedaf7/info.json 404 Not Found
That %2Ffiles%2F looks like a possible URI-encoding failure (each %2F would otherwise be a /). The response for the request as sent is:
{
error: "no info"
}
But changing those to slashes only gets a slightly more detailed failure:
{
status: 404,
error: "Not Found"
}
So it isn't that basic. Unsure whether the almond error is relevant.
Our images route is mapped to Riiif::Engine. From rails routes:
riiif /images Riiif::Engine
...
image GET /:id/:region/:size/:rotation/:quality.:format riiif/images#show {:format=>"jpg", :rotation=>/[\w.]+/, :region=>"full", :quality=>"default", :model=>"riiif/image", :size=>/(!|pct:)?[\w.,]+/}
info GET /:id/info.json(.:format) riiif/images#info {:format=>"json", :model=>"riiif/image"}
base GET /:id(.:format) riiif/images#redirect
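One plausible reason the slash-separated URL returns a plain 404: by default, Rails dynamic segments such as :id do not match "/", so an identifier containing real slashes cannot match the info route, while the %2F-encoded form stays inside one segment. The regex below is a rough stand-in for the mounted /:id/info.json route, assuming Rails' default segment pattern of [^/.?]+; it is not Rails' actual router, and match_info_route is a hypothetical helper.

```ruby
# Rough approximation of the mounted route "/images/:id/info.json",
# assuming Rails' default dynamic-segment pattern [^\/.?]+ for :id.
INFO_ROUTE = %r{\A/images/(?<id>[^/.?]+)/info\.json\z}

# Returns the raw (still percent-encoded) id if the path matches, else nil.
def match_info_route(path)
  m = INFO_ROUTE.match(path)
  m && m[:id]
end

encoded = "/images/96c4f61a-c4c9-4d98-8f00-34b3974d1bc4%2Ffiles%2F9c2ed1ba-3237-4213-873c-ea9727aedaf7/info.json"
slashed = encoded.gsub("%2F", "/")

p match_info_route(encoded) # the whole encoded identifier is captured as one segment
p match_info_route(slashed) # nil: the real slashes split the identifier across segments
```

If this approximation holds, the encoded request does reach Riiif's info action, and the "no info" response comes from Riiif itself rather than from routing.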
As sent, this is an info request with an id of:
96c4f61a-c4c9-4d98-8f00-34b3974d1bc4%2Ffiles%2F9c2ed1ba-3237-4213-873c-ea9727aedaf7
So somebody with UV knowledge needs to determine: does that make sense?
The thumbnail image that does display is retrieved via:
/images/96c4f61a-c4c9-4d98-8f00-34b3974d1bc4%2Ffiles%2F9c2ed1ba-3237-4213-873c-ea9727aedaf7/full/!150,300/0/default.jpg
I.e., it appears to use the same ID, also passed to Riiif::Engine, so my assumption is that the ID does make sense.
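For context, the IIIF Image API specification asks for special characters in an identifier, including "/", to be percent-encoded so the identifier stays a single path segment, so the %2F is most likely intentional. A minimal sketch of that round trip using Ruby's standard URI module; iiif_info_url is a hypothetical helper, not part of Riiif or UV.

```ruby
require "uri"

# Hypothetical helper: build an info.json URL for an identifier that contains
# "/", encoding the whole identifier as one path segment ("/" becomes "%2F").
# encode_www_form_component is form-style (a space would become "+"), which is
# fine for ids made only of hex digits, hyphens and slashes like the one here.
def iiif_info_url(base, identifier)
  "#{base}/#{URI.encode_www_form_component(identifier)}/info.json"
end

file_id = "96c4f61a-c4c9-4d98-8f00-34b3974d1bc4/files/9c2ed1ba-3237-4213-873c-ea9727aedaf7"
url = iiif_info_url("https://stlawu.pilot.hykudirect.org/images", file_id)
puts url
# prints https://stlawu.pilot.hykudirect.org/images/96c4f61a-c4c9-4d98-8f00-34b3974d1bc4%2Ffiles%2F9c2ed1ba-3237-4213-873c-ea9727aedaf7/info.json

# Decoding the segment on the server side recovers the original identifier.
segment = url.split("/")[-2]
puts URI.decode_www_form_component(segment) == file_id # prints true
```

The generated URL is the same one the browser requested, which is consistent with the encoding being deliberate.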
Riiif controller servicing the request is here: https://github.com/curationexperts/riiif/blob/v1.4.4/app/controllers/riiif/images_controller.rb#L33-L45
The (summarized) logic that is failing is:
model.new(params[:id]).info.valid?
And .valid? is just:
# Image information is only valid if height and width are present.
# If an image info service doesn't have the value yet (not characterized perhaps?)
# then we wouldn't want to cache this value.
def valid?
width.present? && height.present?
end
Obviously, the comment there suggests a problem: characterization. Recommend somebody with access to that system run a characterization job on the file/fileset and see if that fixes it.
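To make that failure path concrete, here is a plain-Ruby approximation of the check. ImageInfo and info_response are hypothetical stand-ins, not Riiif classes, and ActiveSupport's present? is swapped for a simple nil/empty test; the point is only that a missing width or height, as you would expect before characterization, is enough to produce the "no info" 404 observed earlier.

```ruby
# Stand-in for the image info Riiif builds from characterization data.
# width/height are nil when the file has not been characterized.
ImageInfo = Struct.new(:width, :height) do
  # Approximates the quoted valid? check without ActiveSupport.
  def valid?
    filled?(width) && filled?(height)
  end

  private

  # Roughly present?: not nil and not empty (whitespace-only strings count as filled here).
  def filled?(value)
    !(value.nil? || (value.respond_to?(:empty?) && value.empty?))
  end
end

# Hypothetical helper approximating the info endpoint's outcome:
# a 404 with an "no info" error whenever the info is not valid.
def info_response(info)
  if info.valid?
    [200, { width: info.width, height: info.height }]
  else
    [404, { error: "no info" }]
  end
end

p info_response(ImageInfo.new(nil, nil)).first   # 404: not characterized yet
p info_response(ImageInfo.new(2000, 2154)).first # 200: dimensions available
```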
A bit of testing that seems to confirm the conclusion @atz came to: Uploading the same file again, in a new tenant with a simple name and title yielded the same "Not Found" error: https://testtenant.pilot.hykudirect.org/concern/images/20a3bb97-eb0e-449d-9ac8-3a2e28b703f7. Doing the same thing again with a different image file worked fine: https://testtenant.pilot.hykudirect.org/concern/images/454a745c-113d-4db3-8d62-e268f8bf7af9
I'd be happy to run a characterization job on the image file. Pointers for how to do that would be helpful.
Hey, @bbranan. Here's how to run the characterization job in the Rails console:
file_set = FileSet.find('d2360902-2ad8-4d10-a8a5-e947ec953577')
# Attempted to init base path `66572f04-a2ad-4b64-852b-08df2dfcdb61`, but it already exists
# => #<FileSet id: "d2360902-2ad8-4d10-a8a5-e947ec953577", head: [], tail: [], depositor: "mjgiarlo@stanford.edu", title: ["2012-01-21_17-19-39_815.jpg"], date_uploaded: "2017-08-04 16:07:06", date_modified: "2017-08-04 16:07:06", label: "2012-01-21_17-19-39_815.jpg", relative_path: nil, import_url: nil, resource_type: [], creator: ["mjgiarlo@stanford.edu"], contributor: [], description: [], keyword: [], license: [], rights_statement: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: "de0bf3ad-38ed-4a21-a0b4-4f489ce4ecf7", embargo_id: nil, lease_id: nil>
io = JobIoWrapper.find_by(file_set_id: file_set.id)
# => #<JobIoWrapper id: 10, user_id: 1, uploaded_file_id: 82, file_set_id: "d2360902-2ad8-4d10-a8a5-e947ec953577", mime_type: nil, original_name: nil, path: "uploads/66572f04-a2ad-4b64-852b-08df2dfcdb61hyrax/...", relation: "original_file", created_at: "2017-08-04 16:07:07", updated_at: "2017-08-04 16:07:07">
file_id = file_set.original_file.id
# => "d2360902-2ad8-4d10-a8a5-e947ec953577/files/d77a38ec-2a23-4ed9-b84c-ce3bd868231e"
CharacterizeJob.perform_later(file_set, file_id, io.uploaded_file.uploader.path)
# [ActiveJob] Enqueued CharacterizeJob (Job ID: 22840e35-dc80-46d2-b968-42fffab12505) to BetterActiveElasticJob(default) with arguments: #<GlobalID:0x0055caaaf1cf00 @uri=#<URI::GID gid://hyku/FileSet/d2360902-2ad8-4d10-a8a5-e947ec953577>>, "d2360902-2ad8-4d10-a8a5-e947ec953577/files/d77a38ec-2a23-4ed9-b84c-ce3bd868231e", "uploads/66572f04-a2ad-4b64-852b-08df2dfcdb61hyrax/uploaded_file/file/82/2012-01-21_17-19-39_815.jpg"
# => #<CharacterizeJob:0x0055caaaf28238 @arguments=[#<FileSet id: "d2360902-2ad8-4d10-a8a5-e947ec953577", head: [], tail: [], depositor: "mjgiarlo@stanford.edu", title: ["2012-01-21_17-19-39_815.jpg"], date_uploaded: "2017-08-04 16:07:06", date_modified: "2017-08-04 16:07:06", label: "2012-01-21_17-19-39_815.jpg", relative_path: nil, import_url: nil, resource_type: [], creator: ["mjgiarlo@stanford.edu"], contributor: [], description: [], keyword: [], license: [], rights_statement: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: "de0bf3ad-38ed-4a21-a0b4-4f489ce4ecf7", embargo_id: nil, lease_id: nil>, "d2360902-2ad8-4d10-a8a5-e947ec953577/files/d77a38ec-2a23-4ed9-b84c-ce3bd868231e", "uploads/66572f04-a2ad-4b64-852b-08df2dfcdb61hyrax/uploaded_file/file/82/2012-01-21_17-19-39_815.jpg"], @job_id="22840e35-dc80-46d2-b968-42fffab12505", @queue_name="default", @priority=nil, @executions=0>
@atz Can you check my math above? Specifically: we should be able to rely on io.uploaded_file.uploader.path working in our AWS deployment, right? FWIW, I tested this on a random FileSet in one of my test tenants, and io.uploaded_file.uploader.path and io.path returned the same path ("uploads/66572f04-a2ad-4b64-852b-08df2dfcdb61hyrax/uploaded_file/file/82/2012-01-21_17-19-39_815.jpg"), so maybe in this deployment these can be used interchangeably?
(If you find you need to run this regularly, @bbranan, we may want to create a new ticket to make this into a rake task. File that away for later, I guess?)
Given that the same file produced the same resulting error in two different tenants (and I tried more than once in my test tenant), it doesn't appear that re-running characterization will do the trick. @mjgiarlo ran FITS locally on the file, resulting in:
<identification status="CONFLICT">
<identity format="JPEG EXIF" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
<tool toolname="Droid" toolversion="6.1.5" />
<tool toolname="Exiftool" toolversion="10.00" />
<tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" />
<version toolname="Droid" toolversion="6.1.5">2.2.1</version>
<externalIdentifier toolname="Droid" toolversion="6.1.5" type="puid">fmt/645</externalIdentifier>
</identity>
<identity format="JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=14, height=11200, bps=0, compression=none, PhotometricIntepretation=RGB, orientation=upper-left, width=10400], baseline, precision 8, 2000x2154, frames 3" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
<tool toolname="file utility" toolversion="5.25" />
</identity>
</identification>
Note that both width and height are included here, but so are status="CONFLICT" and two differing identity tags.
I reported this in the FITS repo, and I believe it's fixed in a more recent (possibly master) version.
I used FITS 1.1.1 (released on May 30th of this year) to generate the output above, FWIW.
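When comparing FITS runs it can help to pull the competing identities out programmatically. A minimal sketch with Ruby's bundled REXML; identification_summary is a hypothetical helper, and it assumes the bare <identification> fragment as pasted here. Full FITS output wraps this in a <fits> root that declares a default namespace, so the XPath may need adjusting for real files.

```ruby
require "rexml/document"

# Summarize a FITS <identification> block: its status attribute and, for
# each competing <identity>, the reported format and the tools behind it.
def identification_summary(fits_xml)
  doc = REXML::Document.new(fits_xml)
  ident = REXML::XPath.first(doc, "//identification")
  return nil unless ident

  identities = REXML::XPath.match(ident, "identity").map do |identity|
    {
      format: identity.attributes["format"],
      tools: REXML::XPath.match(identity, "tool").map { |t| t.attributes["toolname"] }
    }
  end
  { status: ident.attributes["status"], identities: identities }
end

sample = <<~XML
  <identification status="CONFLICT">
    <identity format="JPEG EXIF" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
      <tool toolname="Droid" toolversion="6.1.5" />
      <tool toolname="Exiftool" toolversion="10.00" />
      <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" />
    </identity>
    <identity format="JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=14, height=11200, bps=0, compression=none, PhotometricIntepretation=RGB, orientation=upper-left, width=10400], baseline, precision 8, 2000x2154, frames 3" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
      <tool toolname="file utility" toolversion="5.25" />
    </identity>
  </identification>
XML

summary = identification_summary(sample)
puts summary[:status]
summary[:identities].each { |i| puts "#{i[:tools].join(', ')}: #{i[:format]}" }
```

Printed this way, it is easy to see that the file utility alone reports the second, divergent format.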
@mjgiarlo: you can only rely on uploader.path there when the file happens to have been on the same system in the CarrierWave cache (i.e. not on a worker system). Otherwise you get nil for an unfetched remote object, as we experienced. If you want to guarantee access to uncached remote content, you need to ask for uploader.sanitized_file, which pulls the content into a StringIO.
> Hyrax::UploadedFile.first.uploader.sanitized_file
=> #<CarrierWave::SanitizedFile:0x007f8669a83238 @file=#<StringIO:0x007f8669a8aa38>, @original_filename="devo_freedom_of_choice_album_p.jpg", @content_type="image/jpeg">
You can see where the uploader's implementation is super-halfass about path, not even confident enough to fully delegate it to the file object.
TL;DR: uploader.path is allowed to fail, mainly because the (configurable) file object class might not implement it. Good news: the 3rd argument to CharacterizeJob.perform_later is optional, so if the file isn't local (anymore) you can just leave it off. The job will pull down the file itself.
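A small sketch of that advice for the console session above. characterize_args is a hypothetical helper; it relies on the point just made, that CharacterizeJob fetches the file itself when the third argument is omitted, and only passes a local path when that file actually exists and is readable on the machine running the console.

```ruby
# Hypothetical console helper: build the arguments for
# CharacterizeJob.perform_later, including the optional third (local path)
# argument only when it names a readable regular file on this machine.
# Relative paths (like the "uploads/..." ones above) resolve against the
# current working directory.
def characterize_args(file_set, file_id, local_path = nil)
  args = [file_set, file_id]
  args << local_path if local_path && File.file?(local_path) && File.readable?(local_path)
  args
end
```

In the session above it would be used as CharacterizeJob.perform_later(*characterize_args(file_set, file_id, io.uploaded_file.uploader.path)), so a nil or stale uploader path simply drops the third argument.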
Image has bad metadata.
The part where Photoshop CS5 says the image is 72x72 should be removed and the image re-uploaded. I suspect this piece of metadata comes from a PSD file and not the JPG, specifically the PSD's thumbnail.
We can talk about how the system could give better feedback here, but this is basically a WONTFIX. We are not implementing heuristics for effectively resolving conflicting image metadata while preserving it as "everybody else downstream's problem". And we aren't going to use heuristics to subtly alter uploaded files as they are preserved. If there's no further development direction added, I will close this ticket in a couple days.
I agree that the uploaded image file should not be changed as it is stored. However, the system should be able to recognize a failure in derivative generation and not attempt to display the universal viewer for that image. Of course, also providing feedback to the user to let them know about the failure (and what could be done to resolve it) would be ideal.
There are a variety of reasons why it may not be practical or desirable to edit an original file, even to fix incorrect metadata. It should still be possible to store those files in the repository without them appearing to be broken in the UI. In this case, thumbnail generation seems to have worked fine, which is still valuable.
They are broken in the UI because they are broken in fact. UV requires metadata we cannot accurately provide because the user did not provide it. Might as well complain that "my picture of a giraffe looks like a fish."
I can imagine being more explicit about why failure happens, but I can't really imagine papering over the brokenness to make it invisible or semi-functional:
The independent asynchronous nature of CharacterizeJob (and all the jobs) means that it is not architecturally straightforward to even just reject malformed files (the file is already accepted before the job runs).
There is zero visibility of job failures (or jobs at all) to the end user except through really basic notifications.
Async job interdependencies are not modeled, so they are not knowable to the system as a whole or presentable (e.g. workflow diagram).
@atz if I'm hearing you correctly, you seem to be suggesting that in this scenario we should either (1) reject the image file outright or (2) store the file and allow the UV viewer to display an error.
I'm suggesting that neither of these approaches is preferable. While the image in question may have internal metadata which is not entirely consistent, the primary purpose of the file is to display an image, not to be a metadata structure. Every image viewing application I have tried is able to display the image. The user noted that he has stored and used this file successfully on ContentDM, DSpace, Shared Shelf, and Drupal. Does this mean the metadata shouldn't be fixed? No. But from the perspective of most users (right or wrong) if Hyku can't handle this file, then Hyku is broken, not the file.
I would suggest that failing gracefully (by not displaying the UV viewer if we know it will fail to display the file) is a far better approach than defaulting to a "fix your data" response. The person depositing the data is often not in a position to do anything with the data other than deposit it. Telling them there is a problem with a file is great, but doing everything we can to handle it as-is is better, even if that means turning off a feature or two.
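One way to read the "fail gracefully" suggestion, sketched in plain Ruby: check the same width/height values before rendering UV, and fall back to the thumbnail plus a notice otherwise. FileSetStub, characterized? and viewer_plan are hypothetical, not existing Hyku or Hyrax APIs; the stub mimics how FileSet exposes these values (arrays of strings such as ["2154"]), and the sketch assumes they are cheap to look up at render time, which this thread does not establish.

```ruby
# Hypothetical stand-in for a FileSet's characterization fields.
FileSetStub = Struct.new(:id, :width, :height)

# True when both width and height hold at least one non-blank value.
def characterized?(file_set)
  [file_set.width, file_set.height].all? do |values|
    Array(values).any? { |v| !v.to_s.strip.empty? }
  end
end

# Returns [show_viewer, ids_missing_dimensions]: render the viewer only when
# every file set has dimensions; otherwise report which ones need attention.
def viewer_plan(file_sets)
  missing = file_sets.reject { |fs| characterized?(fs) }.map(&:id)
  [missing.empty?, missing]
end

ok = FileSetStub.new("q237hr920", ["2000"], ["2154"])
bad = FileSetStub.new("needs-characterization", [], nil)
p viewer_plan([ok])      # viewer shown, nothing missing
p viewer_plan([ok, bad]) # viewer suppressed, "needs-characterization" flagged
```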
While the embedded metadata on these files is odd, I still do not understand why the FITS output looks OK and yet we're not seeing the characterization terms show up on these files as expected. I wonder if we can find someone in the community who's dealt with files like these before? That @jpstroop fellow knows a lot about image file formats... :)
@bbranan My bullet points were problematizing statements, not recommendations, so please don't read them that way. UV requires the metadata be correct. It is incorrect. [UPDATE: the rest of this comment is basically mistaken.]
I agree that some type of detection or fallback might be an improvement, but this isn't a bug in UV and is not a bug in Hyku.
That would be a novel feature (not a bugfix) requiring a different flow. Right now the modeled object (that detects invalidity) is entirely in UV client-side JS. We would need to build a parallel capability on the server-side. For a single image page, maybe that wouldn't seem like a big deal, but for an entire IIIF manifest worth of (potentially thousands of) images, it very much would be a big deal and would undercut many of the advantages of UV.
@mjgiarlo See the screenshots and my commentary above. The metadata does NOT look OK. It is internally in conflict: 2000x2154 vs 72x72. Since we are primarily concerned with archival preservation of valid data, accepting or papering over points of invalidity is hardly a feature. Making it easier to get malformed data into the archive is counter-productive.
@atz Sorry, what I meant was that the <metadata> section of the FITS (1.1.1) output looks fine to me:
<metadata>
<image>
<compressionScheme toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Uncompressed</compressionScheme>
<imageWidth toolname="Exiftool" toolversion="10.00">2000</imageWidth>
<imageHeight toolname="Exiftool" toolversion="10.00">2154</imageHeight>
<colorSpace toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">RGB</colorSpace>
<iccProfileName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Adobe RGB (1998)</iccProfileName>
<iccProfileVersion toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">2.1.0</iccProfileVersion>
<YCbCrSubSampling toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">1 1</YCbCrSubSampling>
<orientation toolname="Exiftool" toolversion="10.00">normal*</orientation>
<samplingFrequencyUnit toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="SINGLE_RESULT">in.</samplingFrequencyUnit>
<xSamplingFrequency toolname="Exiftool" toolversion="10.00">72</xSamplingFrequency>
<ySamplingFrequency toolname="Exiftool" toolversion="10.00">72</ySamplingFrequency>
<bitsPerSample toolname="Exiftool" toolversion="10.00">8 8 8</bitsPerSample>
<samplesPerPixel toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">3</samplesPerPixel>
<scanningSoftwareName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Adobe Photoshop CS5 Windows</scanningSoftwareName>
<exifVersion toolname="Exiftool" toolversion="10.00">0221</exifVersion>
<lightSource toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="SINGLE_RESULT">unknown</lightSource>
</image>
</metadata>
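The width and height that characterization is expected to record are easy to spot-check in that output. A sketch using Ruby's bundled REXML; fits_image_fields is a hypothetical helper, and it assumes the bare <metadata> fragment as pasted (full FITS output declares a default namespace on its root, which may require adjusting the XPath).

```ruby
require "rexml/document"

IMAGE_FIELDS = %w[imageWidth imageHeight xSamplingFrequency ySamplingFrequency].freeze

# Hypothetical helper: read pixel dimensions and sampling frequency (DPI)
# from a FITS <metadata><image> fragment. Missing or empty fields map to nil.
def fits_image_fields(fits_xml)
  doc = REXML::Document.new(fits_xml)
  IMAGE_FIELDS.to_h do |name|
    text = REXML::XPath.first(doc, "//metadata/image/#{name}")&.text&.strip
    [name, text.nil? || text.empty? ? nil : Integer(text, 10)]
  end
end

meta_sample = <<~XML
  <metadata>
    <image>
      <imageWidth toolname="Exiftool" toolversion="10.00">2000</imageWidth>
      <imageHeight toolname="Exiftool" toolversion="10.00">2154</imageHeight>
      <xSamplingFrequency toolname="Exiftool" toolversion="10.00">72</xSamplingFrequency>
      <ySamplingFrequency toolname="Exiftool" toolversion="10.00">72</ySamplingFrequency>
    </image>
  </metadata>
XML

fields = fits_image_fields(meta_sample)
# Both dimensions are present, so a width/height presence check would pass.
puts(fields["imageWidth"] && fields["imageHeight"] ? "dimensions present" : "dimensions missing")
```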
Ah, I see. Yeah, you are right. xSamplingFrequency (DPI) is not the same as X Pixel Dimension. The dimensions are not in conflict, so my premise was incorrect.
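The arithmetic behind that correction, as a quick check: sampling frequency only sets the nominal physical size (inches = pixels / DPI), so at 72 pixels per inch a 2000x2154 image comes out at roughly 27.8 x 29.9 inches, and the "72" never describes the pixel grid. nominal_print_size_inches is a throwaway helper for illustration.

```ruby
# Pixel dimensions and sampling frequency are independent fields:
# DPI only scales the nominal print size, it does not change the pixel grid.
def nominal_print_size_inches(width_px, height_px, dpi)
  [width_px, height_px].map { |px| (px.to_f / dpi).round(2) }
end

p nominal_print_size_inches(2000, 2154, 72) # => [27.78, 29.92]
```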
The XML components of the <identification status="CONFLICT"> are:
<identity format="JPEG EXIF" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
<tool toolname="Droid" toolversion="6.1.5" />
<tool toolname="Exiftool" toolversion="10.00" />
<tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" />
<version toolname="Droid" toolversion="6.1.5">2.2.1</version>
<externalIdentifier toolname="Droid" toolversion="6.1.5" type="puid">fmt/645</externalIdentifier>
</identity>
vs:
<identity format="JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=14, height=11200, bps=0, compression=none, PhotometricIntepretation=RGB, orientation=upper-left, width=10400], baseline, precision 8, 2000x2154, frames 3" mimetype="image/jpeg" toolname="FITS" toolversion="1.1.1">
<tool toolname="file utility" toolversion="5.25" />
</identity>
I don't know why the <tool toolname="file utility" toolversion="5.25" /> element couldn't be included in the larger block, but the format is the point of divergence. I don't know FITS well enough to know the significance.
FWIW, even some of the fixture objects in hydra-works, where the characterization service lives, have this same (or similar) conflict in them. See a sampling of the fits_*.xml docs here: https://github.com/samvera/hydra-works/tree/master/spec/fixtures
So the file utility output is entirely based on the local (Unix) file executable and the "magic files" defined at the system level.
Hyrax has really limited coverage of how broad the matrix of interactions underneath a FITS dependency might be:
0.6.2, the rest are 0.6.0.
file utility metadata, and they are all version 5.04.
Locally, I have:
FITS 0.8 (idk why only 2 levels of versioning!)
file-5.04
/usr/share/file/magic
On the same file, I get:
<identification>
<identity format="Exchangeable Image File Format" mimetype="image/jpeg" toolname="FITS" toolversion="0.8">
<tool toolname="file utility" toolversion="5.04" />
<tool toolname="Exiftool" toolversion="9.13" />
<tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" />
</identity>
</identification>
No conflicts in that part. I do get a conflict in <fileinfo>, though, presumably unrelated:
<created toolname="Exiftool" toolversion="9.13" status="CONFLICT">2011:12:13 12:46:36-05:00</created>
<created toolname="NLNZ Metadata Extractor" toolversion="3.4GA" status="CONFLICT">2012:04:12 11:27:33</created>
@mjgiarlo Note, though, that my "Exchangeable Image File Format" is yet another format that doesn't match either of the two already in conflict.
On my local Hyrax, the same image appears to be valid:
> fs = RareBooks::Atlas.find('0k225b04p').file_sets.first
=> #<FileSet id: "q237hr920", head: [], tail: [], depositor: "archivist1@example.com", title: ["xlm_microscopy_schreiber_02-summer-flounder_sc.jpg"], date_uploaded: "2017-08-15 22:51:16", date_modified: "2017-08-15 22:51:16", label: "xlm_microscopy_schreiber_02-summer-flounder_sc.jpg", relative_path: nil, import_url: nil, resource_type: [], creator: ["archivist1@example.com"], contributor: [], description: [], keyword: [], license: [], rights_statement: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: "39835492-ec7f-4269-a1b8-e329ae6a8d6e", embargo_id: nil, lease_id: nil>
> fs.height
=> ["2154"]
> fs.width
=> ["2000"]
> fs.valid?
=> true
In my local Fedora itself:
From St. Lawrence pilot: We are experiencing an issue whereby newly uploaded images show as “Not Found” in the Universal Viewer. This happens for images added from accounts with the admin role as well as from a basic user account.
Steps to Reproduce:
Screenshots attached.
Example record:
Additional Details: Images were uploaded from a Mac Pro running macOS Sierra 10.12.6, using Chrome 60.x and Firefox 52.x.