ukwa / ukwa-manage

Shepherding our web archives from crawl to access.
Apache License 2.0

Check indexer handling of rendered items or metadata URLs #72

Open anjackson opened 4 years ago

anjackson commented 4 years ago

E.g. are screenshot:https URLs handled properly? The indexer log shows canonicalization failing for a har: URL:

2020-04-29 15:19:18,753 INFO: attempt_202002261158_0423_m_000224_1: Apr 29, 2020 3:10:48 PM org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter genericResult
2020-04-29 15:19:18,753 INFO: attempt_202002261158_0423_m_000224_1: WARNING: FAILED canonicalize(har:https://twitter.com/InterbankLGBT/):BL-NPLD-WEBRENDER-frequent-npld-20200227133858-20200425151705068-03362-0o4xyiz2.warc.gz 143220215
2020-04-29 15:19:18,754 INFO: attempt_202002261158_0423_m_000224_1: Apr 29, 2020 3:10:48 PM org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter adaptInner
2020-04-29 15:19:18,754 INFO: attempt_202002261158_0423_m_000224_1: INFO: Skipping record type : resource
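
One possible way to check this, sketched in Python: split the rendered-item prefix off the URL before canonicalizing, so that only the inner https URL is passed to the canonicalizer. This is not what the indexer currently does. The function name and the prefix list are assumptions for illustration; only "screenshot:" and "har:" appear in this issue, and the real set may be larger.

```python
# Prefixes seen in this issue; the real set used for rendered items may differ.
RENDERED_PREFIXES = ("screenshot:", "har:")


def split_rendered_prefix(url):
    """Return (prefix, inner_url); prefix is "" when no known prefix matches."""
    for prefix in RENDERED_PREFIXES:
        if url.startswith(prefix):
            return prefix, url[len(prefix):]
    return "", url


# The URL from the FAILED canonicalize log line above.
prefix, inner = split_rendered_prefix("har:https://twitter.com/InterbankLGBT/")
print(prefix, inner)
```

The inner URL could then be canonicalized as usual and the prefix re-attached to the resulting key, if that is the behaviour we want.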

Also: CDX indexer should convert metadata:// URIs to urn:embeds:
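
A minimal sketch of that conversion, assuming it is a plain prefix swap. The exact target form is not specified here, so the mapping, the function name, and the example URI below are all hypothetical.

```python
METADATA_PREFIX = "metadata://"
EMBEDS_PREFIX = "urn:embeds:"


def convert_metadata_uri(uri):
    """Rewrite a metadata:// URI to urn:embeds: (assumed to be a simple prefix swap)."""
    if uri.startswith(METADATA_PREFIX):
        return EMBEDS_PREFIX + uri[len(METADATA_PREFIX):]
    # Anything else is passed through unchanged.
    return uri


# Hypothetical input, not taken from a real WARC record.
print(convert_metadata_uri("metadata://example.org/page"))
```

If the urn:embeds: form is meant to carry the full page URL including its scheme, the swap would need to restore that too.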