usnationalarchives / Catalog-API

National Archives Catalog API
https://catalog.archives.gov/api/v2/api-docs/

Some record images and thumbnails seem to be broken #9

Open ddcornwall opened 6 years ago

ddcornwall commented 6 years ago

For about the last four days, I've been noticing that some images and thumbnails in records retrieved through the API do not display even though they are plainly in the catalog. Here's one example:

NaID = 72039156 - Newspapers
Record retrieved through API - https://catalog.archives.gov/api/v1/?naIds=72039156 (broken images)
Record in catalog - https://catalog.archives.gov/id/72039156 (images display correctly)

This does not happen consistently, so perhaps it is a storage problem rather than an API problem?

DominicBM commented 6 years ago

We're currently dealing with some issues while migrating media servers, and I apologize for that. Some images are in the correct location, and showing up right in the UI, but the API hasn't updated its logic for how it constructs the "@url" field by adding the server URL to the file path.

As a workaround if you encounter this issue, you can find the correct URL by prepending "https://catalog.archives.gov/catalogmedia" to the "@path" field in the data, rather than using "@url". I know this isn't perfect, because I don't know if we can tell if an image URL is bad programmatically, rather than testing each one. Hopefully that's some help, though.
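For illustration, the prepending step could be sketched in JavaScript roughly as follows. The helper name `mediaUrlFromPath` and the sample path are hypothetical, and the slash handling is an assumption, since "@path" values may or may not start with a slash.

```javascript
// Base URL quoted in the workaround above.
const MEDIA_BASE = "https://catalog.archives.gov/catalogmedia";

// Hypothetical helper (not part of the API): join the base URL and an
// "@path" value with exactly one slash between them.
function mediaUrlFromPath(path) {
  // Drop any leading slashes so the result never contains "catalogmedia//".
  return MEDIA_BASE + "/" + String(path).replace(/^\/+/, "");
}

// Placeholder path, not taken from a real record.
console.log(mediaUrlFromPath("some/dir/example.jpg"));
// prints https://catalog.archives.gov/catalogmedia/some/dir/example.jpg
```

This only builds the URL; it does not tell you whether the file actually exists at that location.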

I'll update this issue when we have more updates on the problem.

ddcornwall commented 6 years ago

When I tried the prepending you mentioned on the above record, it worked for the main image, but not for the thumbnail, which is what I really needed.

Prepend + path = https://catalog.archives.gov/catalogmedia/opa-renditions/thumbnails/1870939-005-008-0001.jpg-thumb.jpg and I got:

`NoSuchKey
The specified key does not exist.
opa-renditions/thumbnails/1870939-005-008-0001.jpg-thumb.jpg
00833F2F19D06D9F
tklVM3x8w4Ac77znkIzgdVavt7BWC4/odDb+6Ka/TW/4XNSb+mQUR5mAQ0uDo8fzRSIaZ8rp70w=`

Having the link to the full file is a bit of help, though. As always, I do appreciate you taking time away from other duties to offer thought-out answers to my questions.

DominicBM commented 6 years ago

Okay, I did some research, and I have the workaround formula for getting to the thumbnail media server URL.

Instead of "prepend + (thumbnail) path", it's actually more involved. What you will have to do is (1) "prepend + (file) path + (thumbnail path)", and then also (2) substitute the "/lz/" from the file path with "/live/". So for the example you used, you should be able to get to this working URL:

This is obviously not ideal and we are hoping to have this resolved.

ddcornwall commented 6 years ago

Thank you! This workaround does work for me. Here's how I displayed the first thumbnail from a File Unit record with multiple objects using JavaScript:

`// First object of the i-th result; slice(4) strips the leading "/lz/" from the file path.
var obj = response.opaResponse.results.result[i].objects.object[0];
var filePath = obj.file["@path"].slice(4);

$("#recent").append("<img class=\"img-thumbnail\" src=\"https://catalog.archives.gov/catalogmedia/live/" + filePath + "/" + obj.thumbnail["@path"] + "\">");`

Thanks again for your responsiveness. It's been very helpful to my programming work. You can close this issue, though I would like to know when you get the underlying issues resolved.

ddcornwall commented 6 years ago

It turns out that the workaround method is not successful with PDFs. See https://catalog.archives.gov/id/40862556 for an example. This seems to be because PDFs are stored in a slightly different place, as shown by the file path (results.result[0].objects.object[0].file["@path"]) for this record:

content/seattle/rg-435/605073/Box_14/605073-014-008/605073-014-008.pdf

Are there specific criteria for what gets stored in /catalogmedia/live vs. /content? Thanks.

ddcornwall commented 6 years ago

I found a fix for the PDF thumbnail issue: I test file["@mime"] before deciding what to prepend.

var obj = response.opaResponse.results.result[i].objects.object[0];
if (obj.file["@mime"] == "image/jpeg") {
  // JPEG: rebuild the thumbnail URL from the file path, minus its leading "/lz/".
  var filePath = obj.file["@path"].slice(4);
  $("#recent").append("<img class=\"img-thumbnail\" src=\"https://catalog.archives.gov/catalogmedia/live/" + filePath + "/" + obj.thumbnail["@path"] + "\">");
} else {
  // Anything else (e.g. PDF): use the thumbnail "@url" from the API unchanged.
  $("#recent").append("<img class=\"img-thumbnail\" src=\"" + obj.thumbnail["@url"] + "\">");
}

ddcornwall commented 6 years ago

I'm checking in to see whether the workarounds above are still needed. I'm starting to find records where the workaround breaks the thumbnails, such as this record for Robert Stroud, the Birdman of Alcatraz: https://catalog.archives.gov/api/v1/?naIds=24731415

DominicBM commented 6 years ago

@ddcornwall: Sorry for the delay! I don't yet have much news for you on this front. Our developers are working on rolling out a fix for the issue I mentioned elsewhere in which some fields are inaccessible via the API as their top priority. We view these object-related bugs (this and #8) as the highest priority after that, but don't yet have a timeline. We'll definitely update this issue with any news. For now, it may be necessary to implement some code that checks the HTTP status code of the file URL, and only tries the workaround if it's not 200, if that's possible? Sorry again for the inconvenience!
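A rough sketch of that status-code check, for illustration only. The helper name `pickWorkingUrl` and the injectable `fetchFn` parameter are hypothetical; the sketch assumes an environment that provides `fetch` (modern browsers, Node 18+) and that the media server answers HEAD requests. In a browser, cross-origin rules may block the request, in which case this falls back as if the status were not 200.

```javascript
// Hypothetical helper: resolve to primaryUrl if a HEAD request for it
// answers 200, otherwise to fallbackUrl (the workaround URL).
// fetchFn defaults to the global fetch and can be swapped out for testing.
async function pickWorkingUrl(primaryUrl, fallbackUrl, fetchFn = fetch) {
  try {
    const response = await fetchFn(primaryUrl, { method: "HEAD" });
    return response.status === 200 ? primaryUrl : fallbackUrl;
  } catch (err) {
    // Network failure or blocked cross-origin request: treat as "not 200".
    return fallbackUrl;
  }
}
```

Usage would look something like `pickWorkingUrl(urlFromApi, workaroundUrl).then(function (url) { /* set the img src */ });`, where both arguments are URLs you have already built.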

ddcornwall commented 6 years ago

@DominicBM: No worries! I understand we all have duties to juggle and people to hear back from. Thanks for the status update and thanks for the idea of checking the HTTP status code. That simply didn't occur to me. I'll look up how that's done and apply it to my code. Thanks again!

ddcornwall commented 6 years ago

@DominicBM - I couldn't get the HTTP status code check to work because of cross-domain restrictions, but I did find a different way to fix things. My workaround failed in part because the parent file path does not always begin the same way. Here are two examples:

https://catalog.archives.gov/api/v1/?naIds=52204721 (Title: 1306-07 Barrow - 1940 Census; thumbnail works without the workaround)
file["@path"]: "content/seattle/rg-075/2655229/2655229-003-021-0001.jpg"

https://catalog.archives.gov/api/v1/?naIds=72034258 (Title: Chifornak)
file["@path"]: "/lz/seattle/rg-075/1870939/JPGs/1870939-001-008-0001.jpg"

So far, it seems that records whose file path starts with "content/" don't need the workaround. So I'm testing whether file["@path"] begins with "/lz" and applying the fix only then.
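Roughly, the prefix test looks like the sketch below. The function name `thumbnailUrl` is made up for this example, `obj` stands for one entry of `objects.object` in the API response, and the thumbnail values in the usage lines are placeholders rather than data from a real record.

```javascript
const LIVE_BASE = "https://catalog.archives.gov/catalogmedia/live/";

// Hypothetical helper: pick the thumbnail URL for one object entry.
function thumbnailUrl(obj) {
  const filePath = obj.file["@path"];
  if (filePath.startsWith("/lz/")) {
    // Workaround case: swap the leading "/lz/" for the live base URL,
    // then append the thumbnail path after the file path.
    return LIVE_BASE + filePath.slice("/lz/".length) + "/" + obj.thumbnail["@path"];
  }
  // e.g. paths starting with "content/": the API's "@url" is used as-is.
  return obj.thumbnail["@url"];
}

// Placeholder thumbnail values; the file path is the "/lz/" example above.
console.log(thumbnailUrl({
  file: { "@path": "/lz/seattle/rg-075/1870939/JPGs/1870939-001-008-0001.jpg" },
  thumbnail: { "@path": "thumbs/example-thumb.jpg", "@url": "https://example.invalid/unused.jpg" }
}));
```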

I have a different issue with non-image, non-PDF files now, but that one is probably on me.

ddcornwall commented 6 years ago

Hi @DominicBM - Wanted to let you know I think I've got all of my thumbnail display issues resolved. Check out the code for "displayThumbnail" within https://github.com/ddcornwall/nara-alaskana/blob/master/js/browse.js if you want to see how I resolved things. I'm not saying that it's the best code in the world, but it seems to be doing the job.