influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
29.07k stars, 3.56k forks

Should be able to sample points #484

Closed (pauldix closed 8 years ago)

pauldix commented 10 years ago

Multiple users have asked for the ability to sample points from a series. They'd like to run something like:

select min(value) from temperatures sample by time(1d) where time > now() - 30d

The idea is that, instead of doing a group by and returning aggregated ticks whose timestamps are set to the group-by interval, the query returns the raw ticks that were sampled. So the query above would return all columns of the min data point for each day.

This wouldn't make sense for every aggregate function, but it would work for functions like first, last, min, and max. We could later add something like random, or other sampling methods I'm not thinking of.
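To make the proposed semantics concrete, here is a minimal Python sketch of what "sample by" could mean for first, last, min, and max. It is only an illustration, not InfluxDB code: the function name `sample_by`, the dict-based point layout, and the epoch-second `time` field are all assumptions made for the example.

```python
from collections import defaultdict

DAY = 86400  # seconds per day


def sample_by(points, bucket_seconds, selector="min", field="value"):
    """Return one raw point (all columns intact) per time bucket.

    Hypothetical helper: `points` is a list of dicts with an integer
    "time" key (epoch seconds) plus arbitrary other columns.
    """
    # Group points into fixed-width time buckets.
    buckets = defaultdict(list)
    for p in points:
        buckets[p["time"] // bucket_seconds].append(p)

    # Each selector picks an existing point rather than computing a new value.
    pick = {
        "first": lambda ps: min(ps, key=lambda p: p["time"]),
        "last": lambda ps: max(ps, key=lambda p: p["time"]),
        "min": lambda ps: min(ps, key=lambda p: p[field]),
        "max": lambda ps: max(ps, key=lambda p: p[field]),
    }[selector]

    # Emit the chosen raw point for each bucket, in time order.
    return [pick(buckets[b]) for b in sorted(buckets)]


points = [
    {"time": 100, "value": 21.5, "sensor": "a"},
    {"time": 5000, "value": 18.0, "sensor": "b"},
    {"time": DAY + 10, "value": 19.2, "sensor": "a"},
    {"time": DAY + 900, "value": 23.1, "sensor": "b"},
]

# One raw point per day: the one holding that day's minimum value.
sampled = sample_by(points, DAY, "min")
```

Unlike a group by, each returned point keeps its original timestamp and its other columns (here `sensor`), which is the difference the proposal is after.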

Thoughts and feedback on this are welcome. There are probably sampling scenarios where this approach won't make sense, so let us know!

monnand commented 10 years ago

This will be a useful feature.

For the random sampling part, it may be useful to offer several random sampling algorithms that users can choose from, e.g. uniform sampling using the reservoir algorithm, weighted sampling based on time, etc.
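As an illustration of the uniform option mentioned here, the following is a minimal Python sketch of reservoir sampling (the classic "Algorithm R"), which draws k items uniformly from a stream in a single pass without knowing its length in advance. The function name `reservoir_sample` and its parameters are hypothetical and not part of InfluxDB.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniformly sample up to k items from an iterable in one pass.

    Hypothetical helper: `rng` is an optional random.Random instance,
    accepted so that results can be made reproducible.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Draw 10 points uniformly from a series of 1000 values.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Memory use stays at k items regardless of series length, which is why this approach suits sampling over large time ranges. Time-weighted sampling would need a different scheme and is not covered by this sketch.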

denisbrondy commented 10 years ago

Hello,

Clearly, this would be awesome. From my point of view, this feature makes a lot of sense in all kinds of situations. Every trending tool with graphs or plots needs a downsampled dataset so that it can be displayed instantly. With this feature, continuous aggregated queries (at least first, min, and max over a given period) will provide a drastically lighter, ready-to-display dataset.

Perhaps the "group by" clause should remain as it is and a new one should be introduced. I don't think a standard, pertinent keyword exists for that. Maybe "over time(10m)" or "covering time(10m)"?

falzm commented 10 years ago

:thumbsup:, we definitely need this feature for https://github.com/facette/facette ;)

meteozond commented 10 years ago

+1

jbclements commented 9 years ago

+1, sigh

marcuswestin commented 9 years ago

+1

marcuswestin commented 9 years ago

Never mind: AVG(value) and GROUP BY time does what I need. Cheers :)

xiic commented 9 years ago

+1, including the proposed syntax (keep it the same as GROUP BY, just with a different keyword) and the RANDOM aggregate function. Are there any plans for this?

luxingxiao commented 8 years ago

+1