ytyou / ticktock

TickTockDB is an OpenTSDB-like time series database, with much better performance.

TT doesn't honor downsampling interval of milliseconds #49

Closed jens-ylja closed 1 year ago

jens-ylja commented 1 year ago

Hello, unfortunately I have to file another issue related to querying data from TickTock. I saw this right away in my first experiments with the provided TT & Grafana Docker bundle, but interpreted it as a stupid user error.

But now the same issue happens with my "production" setup, so I dug a little deeper.

The effect: when drilling down into a metric with Grafana - shortening the inspected interval more and more - everything works well down to a certain interval duration. When shortening the interval even further, the response data collapses to a single point (or a few points). How long this "magic" interval is depends on the screen width and resolution - on my 2560x1440 monitor it happens somewhere between a 10- and a 5-minute interval; on my mobile phone, 5 minutes still works.

The trigger for this effect is the maximum number of displayable data points and - derived from this - the downsampling interval used by Grafana. Everything works as long as the downsampling interval is one second or more; the data collapses once Grafana switches to a millisecond interval.
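
To illustrate (assuming Grafana's usual "interval ≈ time range / max data points" heuristic, which is not stated in this thread): a 5-minute window rendered across roughly 1200 points gives 300 s / 1200 = 0.25 s, so Grafana requests a sub-second downsampling interval such as 250ms, while the same window on a narrow phone panel still stays at 1 s or above.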

I verified this at the command line. The original data in the database have one point every 10 seconds.

For reference - the number of metrics and the total number of data points:

$ wget -O - --quiet 'http://localhost:6182/api/query?start=1681662000000&end=1681662600000&m=none:1s-avg:s10e.power{_field=Batterie}' | jq '.[].metric' | wc -l
1
$ wget -O - --quiet 'http://localhost:6182/api/query?start=1681662000000&end=1681662600000&m=none:10s-avg:s10e.power{_field=Batterie}' | jq '.[].dps' | wc -l
62
$ wget -O - --quiet 'http://localhost:6182/api/query?start=1681662000000&end=1681662600000&m=none:1s-avg:s10e.power{_field=Batterie}' | jq '.[].dps' | wc -l
62

This collapses to two points when switching to a 500-millisecond downsampling interval:

$ wget -O - --quiet 'http://localhost:6182/api/query?start=1681662000000&end=1681662600000&m=none:500ms-avg:s10e.power{_field=Batterie}' | jq '.[].dps'
{
  "1681662000": -334.68,
  "1681662500": -570
}

It collapses to a single point when writing 1s as 1000ms:

$ wget -O - --quiet 'http://localhost:6182/api/query?start=1681662000000&end=1681662600000&m=none:1000ms-avg:s10e.power{_field=Batterie}' | jq '.[].dps'
{
  "1681662000": -373.9
}

I've dug even deeper and the result is: NNNms is simply interpreted as NNNs.
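
To make the suspected behaviour concrete, here is a minimal C++ sketch - not TickTock's actual code; the function name and structure are my assumptions - of how a downsample-interval parser ends up turning 500ms into 500 s when it does not test the longer "ms" suffix before "s":

#include <cassert>
#include <cstdint>
#include <string>

// Sketch: parse the interval part of a downsample spec like "500ms" or "10s"
// into milliseconds. The point is to match the longer "ms" suffix before "s".
int64_t parse_interval_ms(const std::string& spec)
{
    size_t pos = 0;
    int64_t value = std::stoll(spec, &pos);   // numeric prefix, e.g. 500
    std::string unit = spec.substr(pos);      // remaining unit suffix, e.g. "ms"

    // Buggy variant: only checking whether the spec ends in 's' treats
    // "500ms" as 500 seconds - the behaviour observed above.
    if (unit == "ms") return value;                  // already milliseconds
    if (unit == "s")  return value * 1000;
    if (unit == "m")  return value * 60 * 1000;
    if (unit == "h")  return value * 60 * 60 * 1000;
    return value * 1000;                             // fallback: assume seconds
}

int main()
{
    assert(parse_interval_ms("500ms") == 500);
    assert(parse_interval_ms("10s") == 10000);
    return 0;
}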

Note: My TT instance is configured without a tsdb.timestamp.resolution and thus works with the default tsdb.timestamp.resolution = second. I didn't test whether the behaviour is the same with tsdb.timestamp.resolution = millisecond.

Proposed solution:

ylin30 commented 1 year ago

To support ms, TT has to be configured with TSDB.time.resolution = millisecond. The default is second. Unfortunately, 0.11.7 doesn't support both at the same time.
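
For reference, a minimal sketch of what that looks like in the config file - the key name is the tsdb.timestamp.resolution mentioned earlier in this thread, while the exact file syntax is my assumption:

# ticktock.conf - store and query timestamps with millisecond resolution
tsdb.timestamp.resolution = millisecond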

jens-ylja commented 1 year ago

To support ms, TT has to be configured with TSDB.time.resolution = millisecond. The default is second. Unfortunately, 0.11.7 doesn't support both at the same time.

Yes, that's true and that's OK. I don't think it is necessary to support TSDB.time.resolution = millisecond and TSDB.time.resolution = second at the same time within one database.

But a downsampling instruction of 500ms-avg must not be interpreted as 500s-avg regardless of how TSDB.time.resolution is defined.

ylin30 commented 1 year ago

To support ms, TT has to be configured with TSDB.time.resolution = millisecond. The default is second. Unfortunately, 0.11.7 doesn't support both at the same time.

Yes, that's true and that's OK. I don't think it is necessary to support TSDB.time.resolution = millisecond and TSDB.time.resolution = second at the same time within one database.

We have been hesitant to get rid of TSDB.timestamp.resolution since determining the resolution by the number of digits may lead to ambiguity. But it seems this is even worse: the config itself may lead to confusion. For example, I noticed that the timestamps in your query are 13 digits, implying ms, but you used second resolution in TT by default. We will rethink removing the config later.

But a downsampling instruction of 500ms-avg must not be interpreted as 500s-avg regardless of how TSDB.time.resolution is defined.

This can be fixed quickly. Let me confirm the bug; I didn't read your first post carefully just now.

jens-ylja commented 1 year ago

@ylin30 sorry for the confusion with the millisecond timestamps in my queries - I just copied them from the queries produced by Grafana.

Btw, I'm fine with the TSDB.time.resolution property. I think it's clear and understandable. If a TT instance is configured with TSDB.time.resolution = second and someone writes or queries data with timestamps in milliseconds, it's worth rounding (or better, truncating?) them to seconds.

Without such a configuration, all timestamps would have to be treated as (or converted to) milliseconds, because one could send a mix of millisecond and second timestamps when storing data. On the other hand, working in milliseconds for data like mine (with an original resolution of 10 s or even a minute) would be a waste.
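
As an aside, the digit-count heuristic mentioned above could look roughly like the following - a sketch under my own assumptions, not TickTock code - and it also shows the ambiguity: the guess only holds while timestamps stay in the expected ranges.

#include <cassert>
#include <cstdint>

// Sketch: treat values below 11 digits as seconds and larger values as
// milliseconds, normalizing everything to milliseconds internally.
int64_t to_milliseconds(int64_t ts)
{
    return (ts < 10000000000LL) ? ts * 1000 : ts;   // < 10^10 => seconds
}

int main()
{
    assert(to_milliseconds(1681662000LL)    == 1681662000000LL);  // seconds in
    assert(to_milliseconds(1681662000000LL) == 1681662000000LL);  // already ms
    return 0;
}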

ytyou commented 1 year ago

This is indeed an issue with TT: the 500ms interval was interpreted as 500s. We will fix it in the next release (v0.11.8), which should come out sometime this week. Thanks for reporting this.

ytyou commented 1 year ago

@jens-ylja v0.11.8 is released. Thanks.

jens-ylja commented 1 year ago

@ytyou I've just updated to v0.11.8 and re-tested. I can confirm the issue is solved. Thanks a lot for the fast responses all the time :)