ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Retire-and-awaken queues rather than emitting all over-quota URIs. #39

Closed: anjackson closed this issue 5 years ago

anjackson commented 5 years ago

Currently, the crawler logs every over-quota URI rather than retiring the queue, which makes the log files very large. We could instead retire over-quota queues and re-check (awaken) them whenever new messages come in.
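
To make the two behaviours concrete, here is a rough, language-agnostic sketch in Python. It is not Heritrix code and does not reflect how the crawler actually implements quotas: `Frontier`, `HostQueue`, `on_message`, `drain`, the `retire_over_quota` flag and the idea that a message may carry a `new_quota` are all hypothetical names and assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HostQueue:
    """Hypothetical per-host queue (not a Heritrix class)."""
    quota: int                                   # max URIs this queue may fetch
    fetched: int = 0                             # URIs fetched so far
    pending: deque = field(default_factory=deque)
    retired: bool = False


class Frontier:
    """Toy frontier showing 'emit' versus 'retire-and-awaken' modes."""

    def __init__(self, retire_over_quota=True):
        self.retire_over_quota = retire_over_quota
        self.queues = {}
        self.log = []

    def add_queue(self, host, quota):
        self.queues[host] = HostQueue(quota=quota)

    def on_message(self, host, uri, new_quota=None):
        """Queue an incoming URI, then re-check a retired queue."""
        q = self.queues[host]
        q.pending.append(uri)
        if new_quota is not None:                # assumption: messages may raise the quota
            q.quota = new_quota
        if q.retired and q.fetched < q.quota:
            q.retired = False                    # awaken: the queue has headroom again

    def drain(self, host):
        """Process pending URIs and return the ones that were fetched."""
        q = self.queues[host]
        fetched = []
        while q.pending and not q.retired:
            if q.fetched < q.quota:
                q.fetched += 1
                fetched.append(q.pending.popleft())
            elif self.retire_over_quota:
                q.retired = True                 # one log line for the whole queue
                self.log.append(f"RETIRED {host}")
            else:
                # emit mode: one log line per over-quota URI
                self.log.append(f"OVER-QUOTA {q.pending.popleft()}")
        return fetched
```

In this sketch the retire mode writes one log line per retirement and leaves the remaining URIs in the queue, whereas the emit mode writes one line per over-quota URI and discards it, which is where the log growth comes from.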

anjackson commented 5 years ago

I've extended the crawler to make this behaviour configurable at runtime (55ad5e7dde3a1bbe14004aa36c01f2c5e283adf8), but note that the new mode has not been fully tested yet.

anjackson commented 5 years ago

Still not tested at scale, but seems to work.