machawk1 / wail

:whale2: Web Archiving Integration Layer: One-Click User Instigated Preservation
https://matkelly.com/wail
MIT License

Allow restriction of crawls to a single domain #445

Open machawk1 opened 5 years ago

machawk1 commented 5 years ago

This has been requested a few times, most recently by Beaudry Allen, Digital Archivist at Villanova, but there is currently no way to do it in the WAIL interface.

Q: What needs to be included in a Heritrix crawl job to restrict a crawl to a single domain?

Related: #350

machawk1 commented 5 years ago

Adding the following to a crawl configuration should accomplish this:

    <bean class="org.archive.modules.deciderules.surt.OnDomainsDecideRule">
        <property name="decision" value="ACCEPT"/>
    </bean>
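
For context, here is a rough sketch of where that bean might sit in a job's configuration. It assumes the job follows the stock Heritrix 3 `crawler-beans.cxml` layout, in which the crawl scope is a `DecideRuleSequence` bean with the id `scope`. The surrounding rules are illustrative assumptions and not necessarily what WAIL's job template contains, so this should be checked against the real template before use.

```xml
<!-- Sketch only: assumes a stock Heritrix 3 crawler-beans.cxml, where the
     crawl scope is a DecideRuleSequence. The rules around OnDomainsDecideRule
     are illustrative assumptions, not WAIL's actual job template. -->
<bean id="scope" class="org.archive.modules.deciderules.DecideRuleSequence">
  <property name="rules">
    <list>
      <!-- Start by rejecting everything. -->
      <bean class="org.archive.modules.deciderules.RejectDecideRule"/>
      <!-- Then accept URIs on the domains derived from the seeds. -->
      <bean class="org.archive.modules.deciderules.surt.OnDomainsDecideRule">
        <property name="decision" value="ACCEPT"/>
      </bean>
      <!-- Keep prerequisites (e.g. DNS, robots.txt) fetchable. -->
      <bean class="org.archive.modules.deciderules.PrerequisiteAcceptDecideRule"/>
    </list>
  </property>
</bean>
```

Rules in the sequence are evaluated in order and a later matching rule overrides an earlier one, so any broader ACCEPT rule placed after this one would widen the scope again.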