sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Handle error fetching image from Met #80

Closed: justinlittman closed this issue 5 years ago

justinlittman commented 5 years ago
docker run --rm \
  -e SKIP_FETCH_CONFIG=true \
  -e SKIP_FETCH_DATA=true \
  -v $(pwd)/.:/opt/traject \
  -v $(pwd)/../dlme-traject:/opt/traject/config \
  -v $(pwd)/../dlme-metadata:/opt/traject/data \
  -v $(pwd)/output:/opt/traject/output \
  suldlss/dlme-transform:latest met

results in:

2019-02-27T15:57:10+00:00 ERROR Unexpected error on record <record #5 (data/met/ancient-near-east-art/data/met_museum_records.csv #5), output_id:met_321385>
    while executing (to_field "agg_preview" at config/met_csv_config.rb:40)

    Record: 74.51.4370,False,True,321385,Ancient Near Eastern Art,Stamp seal,Conoid seal,"",Late Cypriot III-Cypro-Geometric III,"","","","","","","","","","","","",ca. 12th–9th century B.C.,-1200,-800,"Steatite, brown","Seal Face; 1.61 x 1.46 cm
Height: 1.35 cm
String Hole: 0.25 cm","The Cesnola Collection, Purchased by subscription, 1874–76","","","","","",Cyprus (?),"","","","","",Stone-Stamp Seals,"",http://www.metmuseum.org/art/collection/search/321385,7/31/2017 8:00:01 AM,"Metropolitan Museum of Art, New York, NY"

    Exception: RuntimeError: Unexpected response type 'text/html' for http://www.metmuseum.org/api/Collection/additionalImages?crdId=321385
    /opt/traject/lib/dlme_utils.rb:24:in `fetch_json'

[ERROR] Unexpected response type 'text/html' for http://www.metmuseum.org/api/Collection/additionalImages?crdId=321385
2019-02-27T15:57:10+00:00 ERROR Unexpected error on record <record #1 (data/met/ancient-near-east-art/data/met_museum_records.csv #1), output_id:met_321381>
    while executing (to_field "agg_preview" at config/met_csv_config.rb:40)

    Record: 74.51.4366,False,True,321381,Ancient Near Eastern Art,Stamp seal,Quasi-pyramidal seal,Cypriot,Late Cypriot III,"","","","","","","","","","","","",ca. 12th–11th century B.C.,-1200,-1000,"Steatite, gray brown","Seal Face: 1.19 x 1.68 cm
Height: 1.78 cm
String Hole: 0.4 cm","The Cesnola Collection, Purchased by subscription, 1874–76","","","","","",Cyprus (?),"","","","","",Stone-Stamp Seals,"",http://www.metmuseum.org/art/collection/search/321381,7/31/2017 8:00:01 AM,"Metropolitan Museum of Art, New York, NY"

    Exception: RuntimeError: Unexpected response type 'text/html' for http://www.metmuseum.org/api/Collection/additionalImages?crdId=321381
    /opt/traject/lib/dlme_utils.rb:24:in `fetch_json'

D, [2019-02-27T15:57:10.144420 #7] DEBUG -- : DLME::Utils.fetch_json(http://www.metmuseum.org/api/Collection/additionalImages?crdId=321385) (433.3ms)
D, [2019-02-27T15:57:10.146086 #7] DEBUG -- : DLME::Utils.fetch_json(http://www.metmuseum.org/api/Collection/additionalImages?crdId=321381) (503.6ms)

where http://www.metmuseum.org/api/Collection/additionalImages?crdId=321385 resolves to:

{
message: "An error has occurred."
}
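The body quoted above is an error object, not image data. If the endpoint consistently reports failures this way (an assumption based only on this one response), a caller could recognize that shape and treat it as "no additional images" instead of raising. The helper names below are hypothetical and are not part of DLME::Utils:

```ruby
require 'json'

# Hypothetical check, assuming (from the single response quoted above) that the
# Met endpoint signals failure with a JSON object whose only key is "message".
# A successful response is not shown in this thread, so it is passed through.
def met_error_payload?(data)
  data.is_a?(Hash) && data.keys == ['message']
end

# Returns nil for an error-shaped payload, otherwise the parsed data unchanged.
def usable_payload(data)
  met_error_payload?(data) ? nil : data
end

error_body = '{"message": "An error has occurred."}'
usable_payload(JSON.parse(error_body)) # => nil
```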
justinlittman commented 5 years ago

How should we handle this?

jacobthill commented 5 years ago

This is an error on their end, correct? If so, I think we can expect many more of these, since many collections and records will go offline over time. Would it make sense to pass a message to the client, something like "Image not found. The URL may have changed or may no longer be available."? @anarchivist, I think you should weigh in on this as well.
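A minimal sketch of that fallback, assuming the preview lookup can be wrapped in a block that returns a URL, returns nil, or raises. `preview_or_notice` and `IMAGE_NOT_FOUND_NOTICE` are hypothetical names, not existing dlme-transform code:

```ruby
# Hypothetical notice text, taken from the suggestion above.
IMAGE_NOT_FOUND_NOTICE =
  'Image not found. The URL may have changed or may no longer be available.'

# Runs the given lookup block; if it raises or yields nothing, logs a warning
# and returns the notice instead of failing the whole record.
def preview_or_notice(record_id)
  result = yield(record_id)
  return result unless result.nil?

  warn "No preview found for #{record_id}"
  IMAGE_NOT_FOUND_NOTICE
rescue StandardError => e
  warn "Preview lookup failed for #{record_id}: #{e.message}"
  IMAGE_NOT_FOUND_NOTICE
end

preview_or_notice('met_321385') { raise 'Unexpected response type' }
# returns IMAGE_NOT_FOUND_NOTICE
```

Whether such a notice belongs in `agg_preview` itself or in a separate field for the client is a design question the thread leaves open.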

jcoyne commented 5 years ago

It looks to me like they have put a CAPTCHA on their API to keep robots (read: "us") from using it.
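If the API returns an HTML page (such as a CAPTCHA) where JSON is expected, the check that currently raises in `fetch_json` (dlme_utils.rb, line 24) could instead log a warning and return nil. The sketch below assumes the caller already has the response's Content-Type and body. `parse_json_response` is a hypothetical helper, not the real `DLME::Utils.fetch_json`:

```ruby
require 'json'
require 'logger'

LOGGER = Logger.new($stderr)

# Hypothetical tolerant parser: returns the parsed JSON, or nil (with a
# warning) when the response is not JSON or cannot be parsed.
def parse_json_response(url, content_type, body, logger: LOGGER)
  # Compare only the media type; ignore parameters such as "; charset=utf-8".
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  unless media_type == 'application/json'
    logger.warn("Unexpected response type '#{media_type}' for #{url}; skipping")
    return nil
  end
  JSON.parse(body)
rescue JSON::ParserError => e
  logger.warn("Invalid JSON from #{url}: #{e.message}")
  nil
end

url = 'http://www.metmuseum.org/api/Collection/additionalImages?crdId=321385'
parse_json_response(url, 'text/html; charset=utf-8', '<html></html>') # => nil
```

This only keeps the record from erroring out; it does not get past a CAPTCHA, so the affected records would still have no preview.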

anarchivist commented 5 years ago

So, a couple of things: