grafana / carbon-relay-ng

Fast carbon relay+aggregator with admin interfaces for making changes online - production ready

Out of order data produced by aggregator #398

Open replay opened 4 years ago

replay commented 4 years ago

A user has shared detailed information with us showing that a carbon-relay-ng (crng) aggregator has produced out-of-order data for them. The issue seems hard to reproduce and only occurs from time to time (sounds like a race condition to me).

This is the aggregator config they used:

[[aggregation]]
function = 'sum'
regex = '^<regex with 3 capture groups>$'
format = '$1.$2.$3'
interval = 60
wait = 65
dropRaw = true
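
For readers unfamiliar with these settings, here is a minimal Python sketch of how `interval = 60` is commonly understood to group points: each input timestamp is floored to a multiple of the interval, and all points in the same bucket are summed. The name `bucket_start` is hypothetical, and the flooring rule is an assumption about the aggregator's behavior, not something taken from its source. The `wait` setting is not modeled here.

```python
def bucket_start(timestamp: int, interval: int = 60) -> int:
    """Floor a Unix timestamp to the start of its aggregation bucket.

    Assumption: buckets are aligned to multiples of `interval` seconds.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return timestamp - (timestamp % interval)


def sum_by_bucket(points, interval: int = 60) -> dict:
    """Sum (timestamp, value) pairs per bucket, mimicking function = 'sum'."""
    totals = {}
    for timestamp, value in points:
        start = bucket_start(timestamp, interval)
        totals[start] = totals.get(start, 0.0) + value
    return totals
```

Under this assumption, every emitted timestamp would be a multiple of 60, and buckets flushed in the order they were filled would appear with ascending timestamps.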

Then the user added a carbon route which looks like this:

[[route]]
key = 'carbon-default'
type = 'sendAllMatch'
destinations = [
  '127.0.0.1:12003 pickle=false'
]

They recorded the generated data with nc -l -p 12003 > <file>. In addition to that carbon route, they also had an active grafanaNet route, and all traffic was written to both routes.

The recorded nc output clearly shows that some of the generated data is not in order:

<aggregated metric> 7.0 1582838400
<aggregated metric> 5.0 1582838340
<aggregated metric> 1.0 1582838520
<aggregated metric> 5.0 1582838460
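
To check a recording like this automatically, something along the following lines could be used. It is a minimal sketch, assuming the file contains carbon plaintext lines of the form `<metric> <value> <timestamp>`. The function name `find_out_of_order` and the `SAMPLE` list are hypothetical and exist only for illustration. It reports every point whose timestamp is older than the newest timestamp already seen for the same metric.

```python
from typing import Dict, Iterable, List, Tuple

# The four lines quoted above, with the placeholder metric name kept as-is.
SAMPLE = [
    "<aggregated metric> 7.0 1582838400",
    "<aggregated metric> 5.0 1582838340",
    "<aggregated metric> 1.0 1582838520",
    "<aggregated metric> 5.0 1582838460",
]


def find_out_of_order(lines: Iterable[str]) -> List[Tuple[int, str, int, int]]:
    """Return (line_number, metric, newest_seen_ts, ts) for each late point."""
    newest: Dict[str, int] = {}
    late: List[Tuple[int, str, int, int]] = []
    for line_number, line in enumerate(lines, start=1):
        # Split from the right so a metric name containing spaces stays whole.
        fields = line.rsplit(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        metric, _value, ts_text = fields
        try:
            ts = int(ts_text)
        except ValueError:
            continue  # skip lines whose last field is not an integer timestamp
        previous = newest.get(metric)
        if previous is not None and ts < previous:
            late.append((line_number, metric, previous, ts))
        else:
            newest[metric] = ts
    return late


if __name__ == "__main__":
    for line_number, metric, previous, ts in find_out_of_order(SAMPLE):
        print(f"line {line_number}: {metric} ts {ts} arrived after {previous}")
```

On the four lines above, this flags the second and fourth points. To run it against a real capture, pass an open file object instead of `SAMPLE`.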

The user also recorded the data sent to carbon-relay-ng via tcpdump. That recording shows that the metrics going into the aggregator were in order.

robert-milan commented 4 years ago

Problem appears to be solved currently. Leaving this ticket open to re-investigate if it comes up again.

replay commented 4 years ago

Re-opening this issue, as the problem has already appeared again.

fkaleo commented 4 years ago

Could not figure out the cause yet.