influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

tsm1 storage engine reports strange spikes in values #4357

Closed (dswarbrick closed this issue 9 years ago)

dswarbrick commented 9 years ago

Using collectd to feed perpetually increasing, counter-type values (e.g. eth0 rx/tx packet counters) into InfluxDB with the tsm1 engine occasionally produces strange outliers:

SELECT "value" FROM "iolatency_read" WHERE "host" = 'foo.example.com' AND "instance" = 'md400' AND "type" = 'req_latency' AND "type_instance" = 'lt_8' AND time > now() - 10m

1444231716020334000 3.73705375e+08
1444231736022761000 3.73705896e+08
1444231756020388000 3.73706677e+08
1444231776016345000 3.73707002e+08
1444231796016225000 3.73708699e+08
1444231816019400000 3.73708855e+08
1444231836020839000 -8.551788012208318e-238
1444231856020667000 3.73709352e+08
1444231876022680000 3.737098e+08
1444231896014367000 3.7371048e+08
1444231916020391000 3.73711277e+08
1444231936020522000 3.73711405e+08

This in turn makes derivative(mean("value"), 1s) swing wildly and renders the Grafana graphs unusable.
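To see why a single bad point wrecks the graph: InfluxQL's derivative returns the change between consecutive values, scaled to the requested unit (here 1s). The Go sketch below applies that calculation directly to five of the raw samples above. It is not InfluxDB code. `point` and `ratesPerSecond` are hypothetical names, and the sketch ignores mean() and any GROUP BY time() bucketing.

```go
package main

import "fmt"

// point is one (timestamp, value) row from the query output above.
type point struct {
	tsNanos int64   // epoch nanoseconds
	value   float64 // counter value
}

// ratesPerSecond returns (v[i] - v[i-1]) / elapsed seconds for each pair of
// consecutive points: a simplified, hypothetical stand-in for derivative(..., 1s)
// applied to raw points without mean() or GROUP BY time().
func ratesPerSecond(pts []point) []float64 {
	var out []float64
	for i := 1; i < len(pts); i++ {
		dt := float64(pts[i].tsNanos-pts[i-1].tsNanos) / 1e9
		out = append(out, (pts[i].value-pts[i-1].value)/dt)
	}
	return out
}

func main() {
	pts := []point{
		{1444231796016225000, 3.73708699e+08},
		{1444231816019400000, 3.73708855e+08},
		{1444231836020839000, -8.551788012208318e-238}, // the corrupt sample
		{1444231856020667000, 3.73709352e+08},
		{1444231876022680000, 3.737098e+08},
	}
	for i, r := range ratesPerSecond(pts) {
		note := ""
		if r < 0 {
			note = " (a counter should never decrease)"
		}
		fmt.Printf("interval %d: %.4g per second%s\n", i, r, note)
	}
}
```

The healthy intervals come out at a few to a few dozen per second. The corrupt sample produces a drop of roughly 1.9e7 per second followed by a matching rise, and that is the spike seen in Grafana.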

Nothing has changed on the sending (collectd) side - only the storage engine in InfluxDB.

jwilder commented 9 years ago

Are you using the native collectd plugin?

jwilder commented 9 years ago

I'm able to reproduce some odd values using the native collectd plugin.

dswarbrick commented 9 years ago

@jwilder Yes, I'm using the native collectd plugin, just as before, when running 0.94.2.

jmcook commented 9 years ago

I'm seeing this even with Telegraf data, specifically in the mem_available_percent measurement.

jwilder commented 9 years ago

I've verified that there is a bug in the float encoding when similar values are recorded. Working on a fix.

jmcook commented 9 years ago

Sweet, thanks Jason.

johnl commented 9 years ago

I had a similar problem back in July: https://groups.google.com/forum/#!searchin/influxdb/collectd$20john$20leach/influxdb/ZUZj6m-29mk/YiZTnOiyCQAJ

dgryski commented 9 years ago

You're going to want to pull in https://github.com/dgryski/go-tsz/commit/918e888e1d33760f5ab18e41cab3857c0037abfe too

jwilder commented 9 years ago

@dgryski Thanks. Did not see that other change.

johnl commented 9 years ago

This looks fixed to me. I haven't seen a single strange value for interface_rx/if_packets since I deployed this code yesterday, whereas previously they showed up fairly regularly. Thanks!