The fact that incoming launch events can also re-configure the crawl means we should really log a bit of that information somewhere so people can work out what's going on.
A lazy version would be to clone the whole launch message into the extra-info JSON blob. But the message could include things like cookies, and copying all of it is really overkill. The crucial properties are:
isSeed if it's a seed
forceFetch? Not clear, as this only forces the CrawlURI to be enqueued into the frontier (as a re-prioritisation method)
the list of sheets applied
the targetSheet spec? (this can go in the JSON blob)
launchTimestamp (already implemented)
refreshDepth
resetQuotas (already implemented)
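
As a rough illustration of the "log only the crucial properties" idea, here is a minimal Python sketch that copies a whitelist of fields from a launch message into a small summary suitable for the extra-info JSON blob. It is not the crawler's actual code: the function names, the `launch` wrapper key, the `sheets` key name, and the sample message are all assumptions made for the example. Only the property names from the list above come from this note.

```python
import json

# Assumed whitelist, based on the properties listed above.
# "sheets" is a hypothetical key name for "the list of sheets applied".
LOGGED_LAUNCH_PROPERTIES = (
    "isSeed",
    "forceFetch",
    "sheets",
    "targetSheet",
    "launchTimestamp",
    "refreshDepth",
    "resetQuotas",
)


def summarise_launch(launch_message: dict) -> dict:
    """Keep only the whitelisted properties that are present in the message."""
    return {
        key: launch_message[key]
        for key in LOGGED_LAUNCH_PROPERTIES
        if key in launch_message
    }


def extra_info_json(launch_message: dict) -> str:
    """Serialise the summary under a hypothetical 'launch' key."""
    return json.dumps({"launch": summarise_launch(launch_message)}, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical launch message; the cookies entry must not reach the log.
    message = {
        "url": "http://example.org/",
        "isSeed": True,
        "sheets": ["recrawl-1day"],
        "cookies": ["session=abc123"],
    }
    print(extra_info_json(message))
```

Using a whitelist rather than a blacklist means any new, possibly sensitive field added to the launch message later stays out of the log unless someone adds it on purpose.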