Closed gwiedeman closed 3 years ago
thanks @gwiedeman!
it seems like there are a couple of issues at play here. as i dig into this properly, here's some preliminary information:
Hyrax 2.9.0 includes a complete rewrite of the IIIF Manifest generation process. this rewrite is a substantial performance improvement on manifest generation, especially for Works with many FileSets. all the code that powered the old behavior is still in the release, and it can be reinstated by adding the following to controllers including Hyrax::WorksControllerBehavior
:
def iiif_manifest_presenter
presenter
end
one known change in the generation process is that we no longer use the Hyrax request context to determine the protocol to be used by the IIIF server. in practice what it means is that, for apps using Riiif
and HTTPS, the protocol for Riiif
URLs (as generated by Riiif::Engine.routes.url_helpers.image_url
via Hyrax.config.iiif_image_url_builder
) is dependent on Riiif::Engine
configuration, not the protocol you're using to get the manifest. in theory, this ought to be an improvement, but it seems to be causing issues for some app deployments (notably https://nurax-dev.curationexperts.com). see, for example:
http://nurax-dev.curationexperts.com/images/p5547r38s%2Ffiles%2F358aab77-5df3-4224-82e3-47e01b8add99%2Ffcr:versions%2Fversion1/full/600,/0/default.jpg vs https://nurax-dev.curationexperts.com/images/p5547r38s%2Ffiles%2F358aab77-5df3-4224-82e3-47e01b8add99%2Ffcr:versions%2Fversion1/full/600,/0/default.jpg
if your Riiif
is sitting behind a server enforcing SSL, putting this in your initializer may fix your problem Riiif::Engine.routes.default_url_options = { protocol: 'https' }
in retrospect, this change may have been a mistake (or maybe is too aggressive a behavioral change for a minor release). i'm considering ways to roll it back without losing the performance improvements referenced above. feedback would be valued.
@gwiedeman your issue looks (at my best first guess) like it is due to neither of the above issues.
the file id for a given file_set shouldn't be calling #first
as in file_set.files.first.id
, instead, i think we should be looking for mismatches between the index and Fedora. for a FileSet that has this problem, can you check FileSet.find(id).original_file
vs the indexed original_file_id
in your solr instance?
i think in addition to the errant/extra text file (no idea where this might have come from, can you point me to your code base?), these objects must have a weird index state. the new manifest generation relies much more heavily on the index to understand which files to render.
that's the best i've got.
after #4663, nurax-dev.curationexperts.com has working IIIF via UV again. i'm not closing this out at this point, since @gwiedeman's issue is still undiagnosed, but i think it's no longer a blocker for 3.0.0.
Thanks for looking at this @no-reply !
Our codebase is here if that helps: https://github.com/UAlbanyArchives/hyrax-UAlbany
Its very possible this is an indexing issue. We also just updated from Solr 7 to 8.6 recently before updating to 2.9.0. fs.original_file
returns the correct ID, and this is the same as the original_file_id
shown in the console after fs.to_solr
. However, when I query the FileSet in the Solr console I do not see an original_file_id
key in the document as I would expect. This is the response:
"response":{"numFound":1,"start":0,"maxScore":16.77106,"numFoundExact":true,"docs":[
{
"system_create_dtsi":"2019-02-02T00:28:42Z",
"system_modified_dtsi":"2019-02-02T00:28:54Z",
"has_model_ssim":["FileSet"],
"id":"2f75rg51j",
"accessControl_ssim":["Not sure if this needs to be omitted"],
"depositor_ssim":["importer@albany.edu"],
"depositor_tesim":["importer@albany.edu"],
"title_tesim":["nam_ua_390_pm_026037.jpg"],
"date_uploaded_dtsi":"2019-02-02T00:28:42Z",
"date_modified_dtsi":"2019-02-02T00:28:42Z",
"creator_tesim":["importer@albany.edu"],
"thumbnail_path_ss":"/downloads/2f75rg51j?file=thumbnail",
"hasRelatedMediaFragment_ssim":["2f75rg51j"],
"hasRelatedImage_ssim":["2f75rg51j"],
"label_tesim":["nam_ua_390_pm_026037.jpg"],
"label_ssi":"nam_ua_390_pm_026037.jpg",
"file_format_tesim":["jpeg (JPEG File Interchange Format)"],
"file_size_lts":19517703,
"height_is":4840,
"width_is":6586,
"visibility_ssi":"open",
"mime_type_ssi":"image/jpeg",
"digest_ssim":["urn:sha1:bfe876228776081f1bf7c9c3d0b3c481e71863c5"],
"original_checksum_tesim":["04400b96f8c5d5fc7c00e4688a7e226d"],
"read_access_group_ssim":["public"],
"edit_access_group_ssim":["content-admin"],
"human_readable_type_tesim":["File"],
"_version_":1624314830176387072,
"timestamp":"2019-02-02T00:28:55.297Z",
"score":16.77106}]
},
It is weird that fs.files
returns two Hydra::PCDM::File objects when there is only one file in the fileset? The other being the empty text file. Is possible that this is bad local data.
I don't suspect its an https issue, as I get the same behavior on our dev server without using ssl, but I'll look into this more.
Also, as for how aggressive to make changes for a minor release, I think its okay. It think its expected that you should test even minor releases and that's how we found the issue. We're really excited about the many FileSets performance upgrades and the barriers should be as low as possible for these types of improvements.
thanks for the update @gwiedeman.
i agree that it's weird you're ending up with multiple files, and i'd like to dig into that more deeply with you.
focusing briefly on the index issue, i wonder if you could try fs.update_index
fixes the issue for you? if to_solr
has the original file, then i'd really expect it to be in the solr document, and it being missing feels like it's certainly at least part of the problem here.
on the other hand, we should be falling through to a Fedora lookup on #original_file
in the case that the indexed original_file_id
isn't present. do you have entries in your log like "original_file_id for [some id] not found, falling back to Fedora."
?
Ah, I forgot that .to_solr
doesn't actually update the index. fs.update_index
does add a original_file_id_ssi
key in the index:
"original_file_id_ssi":"2f75rg51j/files/831e33b8-88c8-4922-af03-3cfa96c1f2af/fcr:versions/version1",
The ID in there is correct, and appending it to the Fedora URL does return a file from Fedora, but Universal Viewer still shows the same error in 2.9.0.
As for the rails logs, I do get that warning when loading an object that I didn't .update_index
for, but the warning stops appearing after fs.update_index
which seems to be as expected:
W, [2020-12-16T15:41:04.347069 #8597] WARN -- : [37ca1658-3b7e-4bdd-b61c-d959d5cad771] original_file_id for nc580t73h not found, falling back to Fedora.
Apart from that, I do get this when I load the FileSet page, but that's it:
E, [2020-12-16T15:24:24.995426 #4954] ERROR -- : [3292abc1-487e-49bc-9b17-157ed1ca372e] unable to fetch event: FileSet:2f75rg51j:event
Even after turning up the log level
I finally sorted this out when I noticed I was getting a CORS error on our dev server because the object on my.dev.server:3000/concern/images/6t053p39d
was requesting the manifest at my.dev.server/images/2f75rg51j%2Ffiles%2F831e33b8-88c8-4922-af03-3cfa96c1f2af/info.json
(without the port number).
This was odd as I was getting the same result on the production server, but that issue was due to SSL enforcement, so @no-reply 's suggestion of adding Riiif::Engine.routes.default_url_options = { protocol: 'https' }
to config/initalizers/riiif.rb
resolved it.
I'll leave this open, as I'm not sure if its worth trying to get the manifest to use rails_root or whatever to use the port number. I would probably recommend adding the riiif initalizer option to the 2.9.0 release notes, but I still think its totally fair for a minor release.
Thanks @no-reply for spending some time on this!
thanks for the follow-up @gwiedeman.
based on this update, i think the patch in #4663 ought to fix this without need for the default_url_options
override. i'll propose a 2.9.2
release later today which should prevent other folks from running into issues.
once that's out, we should be able to close this out.
closing this as fixed by the release of 2.9.2
Descriptive summary
Universal Viewer appears to work just fine in Hyrax 2.8.0.
Changing the Hyrax version in the Gemfile to 2.9.0 and running
bundle update hyrax
causes Universal Viewer toIn looking at Troubleshooting RIIIF, it seems like
iiif_info_url_builder
is returning the wrong file ID and Fedora is returning an empty text file instead of the file object.iiif_image_url_builder
then returns an incorrect URL for the file:This URL returns a 500.
From navigating the working University Viewer in 2.8.0, it looks like the correct file ID is 831e33b8-88c8-4922-af03-3cfa96c1f2af, not d1420cd8-a605-4ecd-8197-9e0f0862df5f. Changing that ID in the URL returned by
iiif_info_url_builder
also returns a manifest, and changing it iniiif_image_url_builder
correctly returns the file.So it seems that the 2.9.0 upgrade is correctly using
@id
from the manifest when the wrong@id
in 2.8.0 did not matter.It looks like
iiif_info_url_builder
is relying onfs.files.first.id
to return the ID. This returns a Fedora URI with the wrong ID, which serves an empty .txt file.fs.files.last.id
Returns the correct ID and Fedora serves the file. So it seems like all of our FileSets have a bad first file and I'm not sure if this is something that is supposed to happen or if this is something that happened in a previous Hyrax version, or if this is totally a local issue and we should just delete all these.Rationale
2.8.0 to 2.9.0 should be a non-breaking update from what I can tell in the release notes.
Expected behavior
Universal Viewer displays the image.
Actual behavior
Steps to reproduce the behavior
bundle update hyrax
Related work
Troubleshooting RIIIF was helpful.