Quotas reset on restart

Because the higher quotas are set via launch events, they are lost when the crawler restarts. As a result, the crawler is currently dropping a lot of URLs with status -5003 (blocked by quota). These URLs would likely have been downloaded eventually otherwise.

This is an example of why crawl config should be handled differently. For example, the tocrawl topic could be compacted against a per-seed key and re-read in full on each restart, so that the configuration is always up to date.
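
To illustrate the idea, here is a minimal sketch of what re-reading a compacted topic amounts to: replay every record from the start and keep only the latest value per seed. It does not talk to Kafka or Heritrix; the record shape, the `quota_bytes` field, and the `rebuild_config` helper are all hypothetical names used for illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record shape: (seed key, launch config), where a value of
# None acts as a tombstone that removes the seed's configuration.
Record = Tuple[str, Optional[dict]]


def rebuild_config(records: Iterable[Record]) -> Dict[str, dict]:
    """Replay the topic from the beginning, keeping the latest value per seed."""
    config: Dict[str, dict] = {}
    for seed, value in records:
        if value is None:
            # Tombstone: the seed no longer has any special configuration.
            config.pop(seed, None)
        else:
            # Later records for the same seed overwrite earlier ones.
            config[seed] = value
    return config


# Stand-in for the contents of the topic (assumed data, not real crawl config).
topic = [
    ("https://example.org/", {"quota_bytes": 500_000_000}),
    ("https://example.com/", {"quota_bytes": 1_000_000_000}),
    ("https://example.org/", {"quota_bytes": 2_000_000_000}),
]

config = rebuild_config(topic)
```

Because the state is rebuilt from the topic on every start, a raised quota would survive a restart rather than depending on a launch event having been seen by the running process.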
