ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Links not being extracted from site maps #41

Closed: anjackson closed this issue 5 years ago

anjackson commented 5 years ago

Links were never being extracted from sitemaps because no XML extractor was configured in the crawl. Adding an instance of ExtractorXML to the fetch chain should fix this.
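For reference, a rough sketch of what that change could look like in a Heritrix3 `crawler-beans.cxml`. The bean ids and the processor order below follow the stock Heritrix3 profile and are assumptions; the actual UKWA configuration may name or order things differently, and the `extractorXml` id is just an illustrative choice.

```xml
<!-- Assumed bean id; the class is the stock Heritrix3 XML extractor. -->
<bean id="extractorXml" class="org.archive.modules.extractor.ExtractorXML"/>

<!-- Fetch chain as in the default Heritrix3 profile (assumed here),
     with the XML extractor added alongside the other extractors. -->
<bean id="fetchProcessors" class="org.archive.modules.FetchChain">
  <property name="processors">
    <list>
      <ref bean="preselector"/>
      <ref bean="preconditions"/>
      <ref bean="fetchDns"/>
      <ref bean="fetchHttp"/>
      <ref bean="extractorHttp"/>
      <ref bean="extractorHtml"/>
      <ref bean="extractorCss"/>
      <ref bean="extractorJs"/>
      <ref bean="extractorSwf"/>
      <!-- New: lets the crawler pull URLs out of XML responses such as sitemaps. -->
      <ref bean="extractorXml"/>
    </list>
  </property>
</bean>
```

The extractor has to be both declared as a bean and referenced in the chain's processor list; declaring it alone would leave it unused.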

anjackson commented 5 years ago

Verified as working using GOV.UK.