ukwa / ukwa-services

Deployment configuration for all UKWA services stacks.
Apache License 2.0

Be more systematic about excluding web archives from crawl activity #117

Open anjackson opened 1 year ago

anjackson commented 1 year ago

Currently, we hard-code a few web archives that the crawler might hit, so that we don't bother re-crawling them. Can we do this in a more intelligent way? For example, we could use the Memento registry (http://timetravel.mementoweb.org/guide/api/#registry) to create a block list that integrates with the nevercrawl list in #36.
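
As a rough illustration of the idea, here is a minimal Python sketch of turning a list of archive endpoint URLs into a host-based block list. It assumes the endpoint URLs (e.g. TimeGate or TimeMap base URLs) have already been extracted from the registry; the registry's actual format is not modelled here. The function names `archive_hosts` and `is_blocked`, the sample URLs, and the idea of matching on hostname are all assumptions for illustration, not the existing nevercrawl format.

```python
from urllib.parse import urlsplit


def archive_hosts(endpoint_urls):
    """Return the sorted, de-duplicated hostnames of archive endpoint URLs."""
    hosts = set()
    for url in endpoint_urls:
        # urlsplit(...).hostname is lower-cased, or None if there is no host.
        host = urlsplit(url.strip()).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


def is_blocked(url, blocked_hosts):
    """True if the URL's host is a blocked host or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


# Hypothetical endpoint URLs standing in for entries taken from the registry.
endpoints = [
    "https://archive.example.org/timegate/",
    "https://ARCHIVE.example.org/timemap/link/",
    "http://wayback.example.net/web/",
]

blocklist = archive_hosts(endpoints)
for host in blocklist:
    print(host)
```

Matching on hostname rather than full URL prefix is a deliberate simplification: it is coarse, and would also block any non-archive content served from the same host, so a real list might need per-archive path prefixes instead.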