internetarchive / iiif

The official Internet Archive IIIF service
GNU General Public License v3.0

500 Internal Server Error when trying to retrieve some items #11

Closed (chchch closed this issue 10 months ago)

chchch commented 11 months ago

Hello,

Since the unofficial iiif.archivelab.org API seems to have been shut down, I'm trying to migrate to iiif.archive.org. But some items don't seem to work, and I can't figure out why. For example:

https://iiif.archive.org/iiif/0-0-1/manifest.json is fine, but

https://iiif.archive.org/iiif/0-._20211206/manifest.json doesn't work.

Thanks!

hadro commented 11 months ago

Hello, sorry for the issues. The archivelab instance should continue working for at least a grace period while we get the new version set up, so we're looking into it. Meanwhile, the second item you mention seems to be causing URL-encoding issues in the image server request. We'll figure out what's happening and get that working on the new version as well. Thanks for your patience!

chchch commented 11 months ago

iiif.archivelab.org seems to have been taken down now (I guess the grace period is over?)

Yes, I tried it with a couple of different IDs, but I couldn't figure out which characters were causing the problem.

Thanks so much for making this service available!!

hadro commented 11 months ago

The archivelab service has been restored; sorry about the outage. We'll keep looking into the manifest issue. Thanks for giving us the example.

hadro commented 11 months ago

For the couple of places where we use .replace('/','%2f'), I suggest we use the more robust URL quoting (quote) from urllib.parse instead; that seems to fix this issue.

Specifically, a fix for this is to change lines 384-385 as follows:

Existing:

                    imgId = f"{zipFile}/{fileName}".replace('/','%2f')
                    imgURL = f"{image_server}/3/{imgId}"

New version:

                    imgId = f"{zipFile}/{fileName}"
                    imgURL = f"{image_server}/3/{quote(imgId, safe='()')}"

This also requires adding quote to the imports on line 8.

We should do the same for the similar functionality on lines 269 and 270. CCing @digitaldogsbody in case you are incorporating other changes in the near future; otherwise I can do a PR on this soon, once Rob's changes are backfilled.
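To illustrate why the quoting change matters, here is a minimal, self-contained sketch comparing the two approaches. The function names, the server URL, and the sample filename are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
from urllib.parse import quote


def image_url_with_replace(image_server, zip_file, file_name):
    """Existing approach: only the slash is escaped; everything else is left raw."""
    img_id = f"{zip_file}/{file_name}".replace('/', '%2f')
    return f"{image_server}/3/{img_id}"


def image_url_with_quote(image_server, zip_file, file_name):
    """Proposed approach: percent-encode everything except unreserved chars and ()."""
    img_id = f"{zip_file}/{file_name}"
    return f"{image_server}/3/{quote(img_id, safe='()')}"


# Hypothetical values, for illustration only.
server = "https://example.org/image"
zip_file = "item_jp2.zip"
file_name = "item_jp2/página (1).jp2"

old_url = image_url_with_replace(server, zip_file, file_name)
new_url = image_url_with_quote(server, zip_file, file_name)

print(old_url)  # still contains a raw space and a non-ASCII character
print(new_url)  # slash, space and non-ASCII bytes are all percent-encoded
```

Note that quote emits uppercase hex digits (%2F rather than %2f), and that safe='()' overrides the default safe='/' so the slash between the zip file and the file name is still encoded.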

digitaldogsbody commented 11 months ago

Thanks Josh - I'd gone with the simple replace approach because I wasn't sure about the Cantaloupe handling of other characters that might appear in an identifier if they got URL-encoded (although I think quote shouldn't change anything in the problem identifier above).

Maybe we can try and make a list of "awkward" identifiers to add to a test?
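A test along these lines might look like the sketch below. The filenames are made up to cover spaces, non-Latin scripts, parentheses, and reserved URL characters; they are not real Archive items, and encode_image_id is a hypothetical helper standing in for wherever the quoting ends up living. It only checks the encoding step, not how Cantaloupe handles the result.

```python
import unittest
from urllib.parse import quote, unquote

# Hypothetical "awkward" image paths, for illustration only.
AWKWARD_FILENAMES = [
    "scan_jp2.zip/scan_jp2/page 0001.jp2",
    "scan_jp2.zip/scan_jp2/страница_0001.jp2",
    "scan_jp2.zip/scan_jp2/頁(1).jp2",
    "scan_jp2.zip/scan_jp2/a#b?c%d.jp2",
]


def encode_image_id(img_id):
    """Hypothetical helper mirroring the proposed quote(imgId, safe='()') call."""
    return quote(img_id, safe='()')


class AwkwardFilenameTest(unittest.TestCase):
    def test_encoding_round_trips(self):
        for name in AWKWARD_FILENAMES:
            with self.subTest(name=name):
                encoded = encode_image_id(name)
                # The encoded identifier should be plain ASCII with no raw slash.
                self.assertTrue(encoded.isascii())
                self.assertNotIn("/", encoded)
                # Decoding must give back the original path unchanged.
                self.assertEqual(unquote(encoded), name)
```

Saved as a test module, this can be run with python -m unittest; real problem items found later can simply be appended to the list.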

hadro commented 11 months ago

Good call -- I'll find some more, but here's (seemingly) another example: https://iiif.archive.org/iiif/mareful/manifest.json

hadro commented 11 months ago

Another few:

hadro commented 11 months ago

I should say, the identifiers for those themselves are not "awkward", but I believe they all point to image files that have non-Latin characters in the filenames.

glenrobson commented 11 months ago

Note that this affects filenames, not identifiers. Mike to add it to the pull request. Josh to add a unit test.

digitaldogsbody commented 11 months ago

The quoting fix has resolved all of these except mareful, which is also affected by #12 and by another issue that I'll open a ticket for (Cantaloupe can't seem to find files in subdirectories).