senx / warp10-platform

The Most Advanced Time Series Platform
https://warp10.io/
Apache License 2.0

LTTB creates duplicate Timestamps #198

Closed simonfey closed 7 years ago

simonfey commented 7 years ago

Calling LTTB on some GTS can produce a duplicate datapoint at the beginning.

Steps to reproduce:

NEWGTS
0 1200
<% 'k' STORE $k 1000 * 500 + NaN NaN NaN $k 180.0 / DUP SIN SWAP 100 * COS + ADDVALUE %>
FOR
1000 LTTB

Result: {"c":"","l":{},"a":{},"v":[[500,1],[500,1],[3500,-0.0790576529419306] .........

Also note that, while the original GTS has 1200 datapoints, the result contains only 602 datapoints.

hbs commented 7 years ago

The duplicate tick is the first one because we include it unconditionally.

simonfey commented 7 years ago

In GTSHelper.lttb() you specify the bucketsize as an integer, while in LTTB the bucketsize should be a floating-point value.

int bucketsize = (int) Math.ceil((double) gts.values / (double) threshold);

Since you ceil the result, your buckets are way larger than they should be (and in my example they should overlap quite a lot). This would have the side effect that you process records in the wrong bucket, leaving the next bucket without values. This should not be an issue if your values.size is much bigger than your threshold. ... Just brain-debugging. Can't really prove it from here.
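
To make the arithmetic concrete, here is a minimal standalone sketch that compares the integer bucket size from the line quoted above with a fractional one, using the numbers from the report (1200 values, threshold 1000). The class and method names are hypothetical and are not part of Warp 10. Only the ceil expression mirrors the quoted line; the rest is an illustration, not the actual GTSHelper code.

```java
// Hypothetical illustration only; none of these names exist in Warp 10.
class BucketSizeSketch {

    // Integer bucket size, computed the same way as the line quoted from GTSHelper.lttb().
    public static int ceilBucketSize(int values, int threshold) {
        return (int) Math.ceil((double) values / (double) threshold);
    }

    // Fractional bucket size, as suggested in the comment above (assumption: no rounding at all).
    public static double fractionalBucketSize(int values, int threshold) {
        return (double) values / (double) threshold;
    }

    // Number of fixed-size integer buckets needed to cover all values.
    public static int bucketsCovered(int values, int bucketSize) {
        return (values + bucketSize - 1) / bucketSize;
    }

    public static void main(String[] args) {
        int values = 1200;    // datapoint count as stated in the report
        int threshold = 1000; // argument passed to LTTB in the report

        int intSize = ceilBucketSize(values, threshold);
        double fracSize = fractionalBucketSize(values, threshold);

        System.out.println("integer bucket size:    " + intSize);
        System.out.println("fractional bucket size: " + fracSize);
        System.out.println("buckets with integer size: " + bucketsCovered(values, intSize));
    }
}
```

With these inputs the integer bucket size is 2 while the fractional size is 1.2, so the integer version forms only 600 buckets instead of about 1000. That is in line with the roughly 600 datapoints seen in the result, though it does not by itself confirm where the remaining points or the duplicate come from.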