With the aggregation of the active_nodes table, the crawler dumps all the registered nodes into a new row every 12h. However, the first dump (snapshot) is only taken 12h after the crawler starts.
The idea behind this was to avoid recording 0 active nodes in the table while the crawler was still getting started.
The side effect of this decision is that whenever the crawler is stopped and restarted, there won't be a snapshot for (crawler_shutdown_time - time_between_last_snapshot) + 12h, which might not give enough resolution for tracking the distribution.
Possible solution
The simplest solution is to take a new snapshot right when the crawler starts, so that there is at least a record of the distribution at the time the crawler was restarted.
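To illustrate the difference, here is a minimal sketch of the snapshot schedule with and without a snapshot at startup. It is not the crawler's actual code: `snapshot_times`, `SNAPSHOT_INTERVAL` and the `snapshot_on_start` flag are hypothetical names used only for this example, and the only value taken from the issue is the 12h interval.

```python
from datetime import datetime, timedelta
from typing import List

# Interval between snapshots, as described in the issue.
SNAPSHOT_INTERVAL = timedelta(hours=12)


def snapshot_times(
    start: datetime,
    stop: datetime,
    interval: timedelta = SNAPSHOT_INTERVAL,
    snapshot_on_start: bool = False,
) -> List[datetime]:
    """Return the times at which snapshots are taken for one crawler run.

    start / stop are the (hypothetical) startup and shutdown times of the run.
    With snapshot_on_start=False the first snapshot happens one full interval
    after startup (current behaviour as described in the issue). With
    snapshot_on_start=True an extra snapshot is taken at startup (proposal).
    """
    times: List[datetime] = []
    if snapshot_on_start:
        times.append(start)
    next_tick = start + interval
    while next_tick <= stop:
        times.append(next_tick)
        next_tick += interval
    return times


# Example: a run that lasts 20h before the crawler is stopped.
run_start = datetime(2024, 1, 1, 0, 0)
run_stop = run_start + timedelta(hours=20)

current = snapshot_times(run_start, run_stop)
proposed = snapshot_times(run_start, run_stop, snapshot_on_start=True)
```

In this example the current schedule yields a single snapshot at 12h, while the proposed schedule also records one at startup. A run shorter than 12h would leave no snapshot at all under the current schedule, but one under the proposed schedule.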