pod4lib / aggregator

POD Aggregator, f.k.a. the POD Data Lake
https://pod.stanford.edu
Apache License 2.0

UncaughtThrowError in proxy#show when fetching normalized MARCXML or MARC21 via ResourceSync #329

Closed by anarchivist 3 years ago

anarchivist commented 3 years ago

See Honeybadger.

li-dl-7346-0256:resync matienzo$ ./resync-sync -v --capability-list https://pod.stanford.edu/.well-known/resourcesync/normalized-capabilitylist/marc21 --access-token "bogustoken" -b --ignore-failures  https://pod.stanford.edu/ /tmp/pod
Reading capability list https://pod.stanford.edu/.well-known/resourcesync/normalized-capabilitylist/marc21
Reading resource list https://pod.stanford.edu/organizations/normalized_resourcelist/marc21
Read sitemap/sitemapindex from https://pod.stanford.edu/organizations/normalized_resourcelist/marc21
Parsed as sitemapindex, 14 sitemaps
Now reading 14 sitemaps
Reading sitemap from https://pod.stanford.edu/organizations/brown/streams/2020-11-17b/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/chicago/streams/chicago-2021-02/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/columbia/streams/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/cornell/streams/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/dartmouth/streams/2021-02-09/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/duke/streams/7546b907-2e75-4f8a-b13d-daf67d18b6cf/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/harvard/streams/8e51c9a1-5df9-453f-8128-3befec88ac79/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/jhu/streams/83e87b39-9317-4fae-96c1-d277a2c2e4e0/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/mit/streams/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/penn/streams/2021-03-18/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/princeton/streams/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/stanford/streams/20210221/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/yale/streams/b22453d9-8917-4c57-8a55-d3c554e1b16f/normalized_resourcelist/marc21 (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/yultest/streams/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/normalized_resourcelist/marc21 (0 bytes)
Read source resource list, 24 resources listed
Scanning disk from /tmp/pod
Status:     NOT IN SYNC (same=0, to create=24, to update=0, to delete=0)
Will GET 24 resources
created: https://pod.stanford.edu/file/4669/duke-2020-12-13-full-marc21.mrc.gz -> /tmp/pod/file/4669/duke-2020-12-13-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/4669/duke-2020-12-13-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/4669/duke-2020-12-13-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/4858/yultest-2021-01-10-full-marc21.mrc.gz -> /tmp/pod/file/4858/yultest-2021-01-10-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/4858/yultest-2021-01-10-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/4858/yultest-2021-01-10-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/4868/brown-2021-01-10-full-marc21.mrc.gz -> /tmp/pod/file/4868/brown-2021-01-10-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/4868/brown-2021-01-10-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/4868/brown-2021-01-10-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/5782/dartmouth-2021-02-14-full-marc21.mrc.gz -> /tmp/pod/file/5782/dartmouth-2021-02-14-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/5782/dartmouth-2021-02-14-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/5782/dartmouth-2021-02-14-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/6119/chicago-2021-02-28-full-marc21.mrc.gz -> /tmp/pod/file/6119/chicago-2021-02-28-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/6119/chicago-2021-02-28-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6119/chicago-2021-02-28-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/6362/cornell-2021-03-15-full-marc21.mrc.gz -> /tmp/pod/file/6362/cornell-2021-03-15-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/6362/cornell-2021-03-15-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6362/cornell-2021-03-15-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/6367/stanford-2021-03-15-full-marc21.mrc.gz -> /tmp/pod/file/6367/stanford-2021-03-15-full-marc21.mrc.gz
Failed to GET https://pod.stanford.edu/file/6367/stanford-2021-03-15-full-marc21.mrc.gz -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6367/stanford-2021-03-15-full-marc21.mrc.gz
created: https://pod.stanford.edu/file/6371/marc21 -> /tmp/pod/file/6371/marc21
Failed to GET https://pod.stanford.edu/file/6371/marc21 -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6371/marc21
created: https://pod.stanford.edu/file/6372/deletes -> /tmp/pod/file/6372/deletes
Failed to GET https://pod.stanford.edu/file/6372/deletes -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6372/deletes
created: https://pod.stanford.edu/file/6379/marc21 -> /tmp/pod/file/6379/marc21
Failed to GET https://pod.stanford.edu/file/6379/marc21 -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6379/marc21
created: https://pod.stanford.edu/file/6380/deletes -> /tmp/pod/file/6380/deletes
Failed to GET https://pod.stanford.edu/file/6380/deletes -- 500 Server Error: Internal Server Error for url: https://pod.stanford.edu/file/6380/deletes
created: https://pod.stanford.edu/organizations/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/streams/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/removed_since_previous_stream -> /tmp/pod/organizations/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/streams/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/2020-11-17b/streams/2020-11-17b/removed_since_previous_stream -> /tmp/pod/organizations/2020-11-17b/streams/2020-11-17b/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/2021-02-09/streams/2021-02-09/removed_since_previous_stream -> /tmp/pod/organizations/2021-02-09/streams/2021-02-09/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/20210221/streams/20210221/removed_since_previous_stream -> /tmp/pod/organizations/20210221/streams/20210221/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/streams/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/removed_since_previous_stream -> /tmp/pod/organizations/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/streams/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/streams/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/removed_since_previous_stream -> /tmp/pod/organizations/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/streams/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/7546b907-2e75-4f8a-b13d-daf67d18b6cf/streams/7546b907-2e75-4f8a-b13d-daf67d18b6cf/removed_since_previous_stream -> /tmp/pod/organizations/7546b907-2e75-4f8a-b13d-daf67d18b6cf/streams/7546b907-2e75-4f8a-b13d-daf67d18b6cf/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/83e87b39-9317-4fae-96c1-d277a2c2e4e0/streams/83e87b39-9317-4fae-96c1-d277a2c2e4e0/removed_since_previous_stream -> /tmp/pod/organizations/83e87b39-9317-4fae-96c1-d277a2c2e4e0/streams/83e87b39-9317-4fae-96c1-d277a2c2e4e0/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/8e51c9a1-5df9-453f-8128-3befec88ac79/streams/8e51c9a1-5df9-453f-8128-3befec88ac79/removed_since_previous_stream -> /tmp/pod/organizations/8e51c9a1-5df9-453f-8128-3befec88ac79/streams/8e51c9a1-5df9-453f-8128-3befec88ac79/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/streams/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/removed_since_previous_stream -> /tmp/pod/organizations/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/streams/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/b22453d9-8917-4c57-8a55-d3c554e1b16f/streams/b22453d9-8917-4c57-8a55-d3c554e1b16f/removed_since_previous_stream -> /tmp/pod/organizations/b22453d9-8917-4c57-8a55-d3c554e1b16f/streams/b22453d9-8917-4c57-8a55-d3c554e1b16f/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/chicago-2021-02/streams/chicago-2021-02/removed_since_previous_stream -> /tmp/pod/organizations/chicago-2021-02/streams/chicago-2021-02/removed_since_previous_stream
created: https://pod.stanford.edu/organizations/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/streams/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/removed_since_previous_stream -> /tmp/pod/organizations/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/streams/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/removed_since_previous_stream

cbeer commented 3 years ago

Is there any way to provide the full request headers for one of those failing requests?

anarchivist commented 3 years ago

Yes - working on that now, but it looks like the cause of this issue might be the access token not being pushed down to the proper bits of the resync.Client class.
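
To compare what a client sends with and without the token, one option is to build the request and dump its headers before sending it. This is a minimal sketch in Ruby, not resync's own code. It assumes the token travels as an `Authorization: Bearer` header, and `build_request` is a hypothetical helper name.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request for the given URL.
# Assumption: the access token is sent as "Authorization: Bearer <token>".
# When token is nil, no Authorization header is set, which mimics a client
# that never received the token.
def build_request(url, token = nil)
  request = Net::HTTP::Get.new(URI(url))
  request["Authorization"] = "Bearer #{token}" if token
  request
end

url = "https://pod.stanford.edu/file/4669/duke-2020-12-13-full-marc21.mrc.gz"

# to_hash returns header names (lowercased) mapped to arrays of values.
with_token    = build_request(url, "bogustoken").to_hash
without_token = build_request(url).to_hash

puts "with token:    #{with_token.inspect}"
puts "without token: #{without_token.inspect}"
```

If the second dump matches what the failing client actually sends, the token is being dropped before the request is built.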

anarchivist commented 3 years ago

...and, verified. resync.Client isn't getting the token. Nonetheless, we could arguably keep this open, since a request with a missing or invalid token should probably return a 401 instead of a 500.
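
For context on why a missing token can surface as a 500: in Ruby, a `throw` with no matching `catch` raises `UncaughtThrowError`. Warden (which Devise builds on) signals authentication failure with `throw :warden` and relies on an enclosing `catch(:warden)`. The sketch below illustrates that mechanism only. It is not the aggregator's code, and `authenticate!`, `unguarded`, and `guarded` are hypothetical names.

```ruby
# Hypothetical stand-in for an auth check that signals failure with throw,
# in the style of Warden's `throw :warden`.
def authenticate!(token)
  throw :warden, { status: 401 } unless token == "valid-token"
  :ok
end

# No enclosing catch: the throw becomes UncaughtThrowError, which a web
# framework would typically report as a 500.
def unguarded(token)
  authenticate!(token)
  200
rescue UncaughtThrowError
  500
end

# With catch(:warden) around the call, the failure can be mapped to a 401.
def guarded(token)
  failure = catch(:warden) do
    authenticate!(token)
    nil # no throw happened, so there is no failure payload
  end
  failure ? failure[:status] : 200
end

puts unguarded("bogustoken")
puts guarded("bogustoken")
puts guarded("valid-token")
```

Whether the real code path lacks the `catch` because of middleware ordering or for some other reason is not established in this thread.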

cbeer commented 3 years ago

🤷‍♂️ there's a lot of finger-pointing in the 8-year-old upstream issue: https://github.com/heartcombo/devise/issues/2332

anarchivist commented 3 years ago

[image: pointing Spider-Man meme]