Open beckettsean opened 8 years ago
+1 I tried to do this today. It would be a great help.
Perhaps it's not the best issue to comment on the case but one of the things that I find stange is the way that tags are treated. Quote from the docs:
You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.
So if you think of tags
as fields
on steroids - indexed fields, then having normal, field
like query operators running on them seems natural. Creating a subset of SELECT
that works on tags and is called SHOW
is nowhere near a generic solution, it rather complicates stuff for the users. There will be cases when someone needs some sort of funcitonality that's already there but works on fields only so even more will be forced onto the SHOW
command mirroring SELECT
behaviour. I'm no pro to evaluate the difficultness of the task but from the client side I see a lot of sense in it.
From my point of view - the docs should state that tags are not ideal for storing commonly-queried metadata - the I ideal for using group by
to group real data that is stored in fields
. The way tags are used in InfluxQL makes them a second class data for me.
@groblus it is not valid to think of tags as related to fields. They are completely different concepts. All tag keys and values are stored only as strings, and only once, in the index, for each series. Fields are stored as float, int, bool, or string, and are individually stored with each point, and are unindexed.
SHOW queries operate on metadata like databases, measurements, and tags. SELECT queries operate on the metrics themselves, which are the fields. Tag values may also be SELECTed, but only to differentiate the field results.
I would like to see this as well. Here is my use case.
With telegraf and procstat plugin a want to monitor a set of processes that all have the same name (but different pid, obvisouly). Then I plot the data on grafana dashboards, using template features.
With grafana template features I can create a variable based on a query. For instance, I have this template variable to filter the set of processes by the process name I want:
$pname = SHOW TAG VALUES FROM "procstat" WITH KEY = "process_name"
So far so good. Now, I want to create a second template variable for pids that match pname. Unfortunately, I cannot do something like this:
$pid = SHOW TAG VALUES FROM "procstat" WITH KEY = "pid" WHERE process_name=$pname
Although, and strangely, the query is considered valid if I try it in the web interface of influxdb. But it returns nothing. (of course in the web interface of influxdb I replaced $pname with a string) Without the where clause I am left with the full list of pids for all monitored processes, and it is hard to know which ones are the ones corresponding to the particular set of processes I want to monitor.
Actually, same would happen if I wanted to pre-filter by the host
key. Fortunately I only have one machine to monitor at the moment...
EDIT: Actually it is even worst. If the template variable contains all pids, then the dashboard gets repeat even if there is no data. So, it's kind of grafana issue, but the most natural way to solve it would be that the pid variable gets populated only with values valid for the current filter. Which is possible only with a where clause for SHOW TAG VALUES.
+1. same issue here... The other solution would be to replicate all TAGS as VALUES. This is not efficient and will add LOTS of data inside series, almost for nothing. But doing it this way I could :
SELECT value_pid from procstat where process_name=$pname
OR, and this is also related to some other opened issues, be able to select tags directly in the result set :
SELECT tag_pid from procstat where process_name=$pname
I relaly think the later would be the better solution. Of course, a where clause of the SHOW TAG VALUES is also good...
+1
+1
+1
+1
+1 on being able to restrict tag values returned based on a time range.
This feature would be extremely useful when using Grafana templating variables. In some cases we can have 100s of values listed when only a few might pertain to time period being viewed.
+1. Really need this for Grafana templates. Can't deploy without it.
+1 Same use case here, grafana dashboards and plenty of customer demand
I've found a workaround that seems to work in my environment.
Create a short retention policy and then create a continuous query that groups by the tags you want to use in the grafana templating.
CREATE RETENTION POLICY "lookup" ON "prod" DURATION 2d REPLICATION 1
I have some extra info that i can group by and push it into a fake measurement called host_info.
CREATE CONTINUOUS QUERY "lookupquery" ON "prod" BEGIN SELECT mean(value) as value INTO "lookup"."host_info" FROM "cpuload" where time > now() - 1h GROUP BY time(1h), host, team, status, location END;
I dont care what the values are, its just about getting a distinct list of hosts that are up and pushing data in the last few days. I have this in my templates in grafana as something like this:
SHOW TAG VALUES FROM "lookup"."host_info" WITH KEY = "location"
I'm just going to reuse the templates in the users dashboards and a few variations of it for other host info like disk etc...
It works for me, hope others can find something similar!
+1
+1
+1
+1
+1 Need this for grafana templates as well @furnacification - Thanks a lot, that worked! Notice that you don't need the where clause for CQ.
+1 Main use case is for grafana template variables. Showing a drop down of 300 entries when the time period only has 3 unique values is no fun!!
I am running into the same issue when I try to use grafana template variables. Any update on having a WHERE time clause for SHOW TAG VALUES?
This restriction is definetly a big problem for us, especially where some of our tag values can increase over time, and managing the size with retention policies isn't always the answer.
One workaround we had thought about might be the idea of using output from SELECT value,tag WHERE $timeFilter and discarding everything other than the desired tags? Probably horribly expensive, especially seeing as it would be a disk operation instead of access from memory indexes, but a possibility if the selected time series wasn't too big?
I was shocked when I found out this wasn't a feature when using Grafana template variables.
For my usage we have a high number of hosts and a high turnover of hostnames. Even with a retention policy of 30 days, I have some query strings in Grafana using template variables with a filter which exceeds 250k characters. Instead only handful of hosts need to be selected for that period.
This is a feature with should be added.
@beckettsean I dont think its that expensive to implement for the lower bound case i.e where time > now() - current_time_range
. For each insert the metadata store is anyway needed to be checked if a new tag
value has been added. Instead it can update a last_updated
counter if the tag exist, otherwise add a new entry with the current timestamp
. This it self will be enough for almost all my use cases of initialization of the live graphs. Similarly a first_updated
can be used, which too will suffice for most cases as almost always a tagged resource send the metric without interruption (big gaps) .
I will be happy to give a pull request if agreed on this approach.
Here's an example of my workaround. Just wrote a script that updates the templated variables to be the ones that I wanted within a specified timeframe https://gist.github.com/zezuladp/9159a20b409a9f5f5a9fff585991626b
+1
+1
+1
+1
+1
+1
+1
+1
hi guys, any ETA on this?
+1
+1
+1
+1
+1
+1
It makes way more sense to be able to use a normal select on these tag values than this show command imo.
+1
+1
+1
+1
+1
In order to keep e-mail and other feed noise down, it's appreciated if people add a thumbs up reaction to the first post and subscribe to the issue instead of adding a +1 comment.
for Grafana, I have used the regex to filter out the noise from the fetched tag-values in Templating.
for example, to get "host" names, i used this query for template:
SHOW TAG VALUES FROM "netdata.system.cpu.system" WITH KEY = "host"
Now, for filtering, I used regex like /^operations/
to get only host-names with "operations". It's no way a replacement for OP's requested feature, rather just a silly workaround.
It may help someone.
After switching to tsi1 querying a lot of tags became very slow operation. I understand that is necessary consequence of storing tags on disk, however requesting just limited set of tags for certain amount of time would compensate for that.
+1.Any news? Really need this for Grafana templates.
+1
In order to keep e-mail and other feed noise down, it's appreciated if people add a thumbs up reaction to the first post and subscribe to the issue instead of adding a +1 comment.
Rather than update
DISTINCT()
to accept tags (https://github.com/influxdata/influxdb/issues/3880), we could accomplish much the same goal if theSHOW TAG VALUES
query respected aWHERE time
clause. SHOW TAG VALUES already returns the deduplicated set of values.It is not trivial, since the query can no longer be answered directly from the metastore, but it seems easier than making functions accept tags, which is a totally new code path.