Extend URL Receiver to allow different event stores to be used

The KafkaUrlReceiver could be refactored to offer different storage options, e.g.

Kafka, which is extremely scalable but hard work to deploy (needs ZooKeeper, replication, etc.).
NATS Streaming, which is a streaming store+API that persists to disk, so it should scale quite well. The Java API looks nice too.
NSQ, which looks potentially useful, but it's not clear to me how to make sure the clients resume consumption cleanly, or how to rewind to go back and check the event stream.
Redis Streams, which is a widely-used stream store but one limited by RAM (c. 100MB/1e6 messages, so too small for domain crawls).
Log files, based on using Tailer. See also the source for Tailer.java and this related example. We'd need to implement the offset durability logic ourselves, hooked into the H3 checkpoint mechanism. We'd also need to understand log rotation, and possibly implement it in the consumer so it can be synchronised with the checkpoints.
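
To make the log-file option a bit more concrete, here is a minimal sketch of what a pluggable source with do-it-yourself offset tracking might look like. The names `UrlEventSource` and `LogFileUrlEventSource` are hypothetical and are not part of ukwa-heritrix. It uses plain `RandomAccessFile` rather than Tailer, assumes one event per `\n`-terminated line, and does not attempt to handle log rotation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction that each event store option could implement. */
interface UrlEventSource {
    /** Returns the events available after the current offset and advances it. */
    List<String> poll() throws IOException;

    /** The position a checkpoint would need to record in order to resume. */
    long offset();
}

/** Log-file-backed source: one event per line, offset is a byte position. */
class LogFileUrlEventSource implements UrlEventSource {
    private final Path log;
    private long offset;

    LogFileUrlEventSource(Path log, long startOffset) {
        this.log = log;
        this.offset = startOffset;
    }

    @Override
    public List<String> poll() throws IOException {
        List<String> events = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            long length = raf.length();
            if (length <= offset) {
                // Nothing new. A shorter file could also mean the log was
                // rotated, which this sketch does not handle.
                return events;
            }
            // Reads everything new in one go; a real consumer would bound this.
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);
            int start = 0;
            for (int i = 0; i < buf.length; i++) {
                if (buf[i] == '\n') {
                    events.add(new String(buf, start, i - start, StandardCharsets.UTF_8));
                    start = i + 1;
                }
            }
            // Only consume complete lines; a partially written final line
            // is left in place for the next poll.
            offset += start;
        }
        return events;
    }

    @Override
    public long offset() {
        return offset;
    }
}
```

The idea would be for the H3 checkpoint hook to store the value of `offset()` and pass it back as `startOffset` when the crawl resumes. Passing an earlier offset gives a simple way to rewind and re-check the event stream.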

This would allow the same continuous crawling behaviour to be used without requiring Kafka, which would make it easier for others to experiment with our crawl set-up. But it would significantly increase the integration testing needed, would have no log compression, and we may not use it.

ukwa / ukwa-heritrix

Extend URL Receiver to allow different event stores to be used #22