ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Reset caps when seeds appear #8

Closed: anjackson closed this issue 5 years ago

anjackson commented 6 years ago

When seeds are injected into the crawl, they should also clear any capping of the crawls. This means resetting the counters and waking any retired queues (e.g. via reconsiderRetiredQueues, as shown here).
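A minimal, self-contained sketch of the idea, assuming a toy frontier model rather than the real Heritrix3 classes. `ToyFrontier`, `ToyQueue`, `queueFor`, `recordFetch`, `injectSeed` and `budget` are hypothetical names; only the method name `reconsiderRetiredQueues` is borrowed from this thread. A queue is retired once its counter reaches its budget, and injecting a seed for that host zeroes the counter and then gives every retired queue another chance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a per-host queue; NOT the Heritrix3 WorkQueue class.
class ToyQueue {
    final String host;
    final int budget;        // max fetches before the queue is capped
    int fetched = 0;         // counter that drives the cap
    boolean retired = false; // true once the cap has been hit

    ToyQueue(String host, int budget) {
        this.host = host;
        this.budget = budget;
    }

    boolean overBudget() {
        return fetched >= budget;
    }
}

// Hypothetical toy frontier illustrating "reset caps when seeds appear".
class ToyFrontier {
    final Map<String, ToyQueue> queues = new LinkedHashMap<>();
    final List<ToyQueue> retired = new ArrayList<>();

    ToyQueue queueFor(String host, int budget) {
        return queues.computeIfAbsent(host, h -> new ToyQueue(h, budget));
    }

    // Record a fetch; retire the queue once its budget is exhausted.
    void recordFetch(String host) {
        ToyQueue q = queues.get(host);
        q.fetched++;
        if (q.overBudget() && !q.retired) {
            q.retired = true;
            retired.add(q);
        }
    }

    // Seed injection: clear the counter for the seed's host first,
    // then wake any retired queues whose budget now allows fetching.
    void injectSeed(String host) {
        ToyQueue q = queues.get(host);
        if (q != null) {
            q.fetched = 0;
        }
        reconsiderRetiredQueues();
    }

    // Toy analogue of the frontier method named in the issue:
    // un-retire every queue that is no longer over budget.
    void reconsiderRetiredQueues() {
        retired.removeIf(q -> {
            if (!q.overBudget()) {
                q.retired = false;
                return true;
            }
            return false;
        });
    }
}
```

The order inside `injectSeed` matters in this sketch: reconsidering retired queues before resetting the counter would find the queue still over budget and leave it retired.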

anjackson commented 6 years ago

Note that, I think, the quotas are cleared here.

However, no equivalent of

appCtx.getBean("frontier").reconsiderRetiredQueues()

has been added yet.

anjackson commented 5 years ago

Now implemented and well tested: enqueued URLs carrying a resetQuotas annotation cause the host quotas to be reset at the start of the fetch chain.
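A rough sketch of how an annotation-triggered reset could sit at the head of a fetch chain, assuming simplified stand-in types rather than the actual ukwa-heritrix code. `HostStats`, `ToyCrawlUri`, `QuotaResetProcessor` and `QuotaCheck` are hypothetical; only the annotation name `resetQuotas` comes from this thread. Because the reset processor runs first, any quota check later in the chain sees fresh counters for that host.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-host counters that a quota check would read.
class HostStats {
    long fetchSuccesses = 0;
    long totalBytes = 0;
}

// Hypothetical stand-in for an enqueued URL carrying annotations.
class ToyCrawlUri {
    final String host;
    final Set<String> annotations = new HashSet<>();

    ToyCrawlUri(String host) {
        this.host = host;
    }
}

// Hypothetical first processor in a fetch chain: if the URL carries the
// "resetQuotas" annotation, replace the host's counters with fresh ones.
class QuotaResetProcessor {
    static final String RESET_QUOTAS = "resetQuotas";
    final Map<String, HostStats> statsByHost = new HashMap<>();

    // Returns true if a reset was performed for this URL's host.
    boolean process(ToyCrawlUri uri) {
        if (!uri.annotations.contains(RESET_QUOTAS)) {
            return false;
        }
        statsByHost.put(uri.host, new HostStats());
        return true;
    }
}

// Hypothetical quota check that would run later in the same chain.
class QuotaCheck {
    static boolean overQuota(HostStats s, long maxSuccesses) {
        return s != null && s.fetchSuccesses >= maxSuccesses;
    }
}
```

Keeping the reset in its own processor at the start of the chain, rather than inside the quota check, keeps the check itself unchanged and makes the reset a per-URL decision driven purely by the annotation.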