ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Always get prerequisites that are resolved via redirects? #43

Closed: anjackson closed this issue 5 years ago

anjackson commented 5 years ago

Currently, the org.archive.modules.deciderules.PrerequisiteAcceptDecideRule only matches:

hop path matches ^.*P$

but should this really be...

hop path matches ^.*PR+$
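To make the difference concrete, here is a minimal Java sketch that runs both hop-path patterns against a few sample hop paths. It uses only java.util.regex. The class name and the sample paths are made up for illustration. It assumes Heritrix's usual hop letters: L for a navigation link, P for a prerequisite, R for a redirect.

```java
import java.util.regex.Pattern;

public class HopPathDemo {
    // Pattern the issue says PrerequisiteAcceptDecideRule currently uses.
    static final Pattern CURRENT = Pattern.compile("^.*P$");
    // Pattern proposed in the issue: a prerequisite followed by one or more redirects.
    static final Pattern PROPOSED = Pattern.compile("^.*PR+$");

    public static void main(String[] args) {
        // Hypothetical hop paths, chosen only to show which pattern accepts what.
        String[] paths = {"P", "LP", "LPR", "LPRR", "LR", "L"};
        for (String p : paths) {
            System.out.printf("%-5s current=%-5b proposed=%b%n", p,
                    CURRENT.matcher(p).matches(),
                    PROPOSED.matcher(p).matches());
        }
    }
}
```

The current pattern accepts P and LP. The proposed one accepts LPR and LPRR but no longer accepts a path that ends in a bare P. Neither accepts LR. A rule meant to cover both direct and redirected prerequisites would need something like ^.*PR*$ instead.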

I'm seeing this a lot when robots.txt requests are redirected from http to https. Perhaps that's fine, though, as the robots.txt for http://server is not the same as the robots.txt for https://server.
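The scheme-specific point can be illustrated with a short Java sketch. This is not Heritrix code. The robotsFor helper and the example.com URIs are hypothetical. It derives the robots.txt location from a page's scheme and authority, so the http and https versions of the same host end up with different robots.txt URIs.

```java
import java.net.URI;

public class RobotsUrlDemo {
    // Hypothetical helper: robots.txt lives at the root of a given scheme + host + port,
    // so it is built from the page's scheme and authority only.
    static URI robotsFor(URI page) {
        return URI.create(page.getScheme() + "://" + page.getRawAuthority() + "/robots.txt");
    }

    public static void main(String[] args) {
        URI httpPage = URI.create("http://example.com/some/page");
        URI httpsPage = URI.create("https://example.com/some/page");
        System.out.println(robotsFor(httpPage));
        System.out.println(robotsFor(httpsPage));
        // Different schemes give different robots.txt URIs.
        System.out.println(robotsFor(httpPage).equals(robotsFor(httpsPage)));
    }
}
```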

anjackson commented 5 years ago

Hmm. For now, it's probably best to stick with the existing conception of pre-requisites.