The query that causes the panic should be in the log, just before the trace. Can we see it?
Sure (names changed within queries)
[run] 2015/11/25 18:26:41 Listening for signals
[query] 2015/11/25 18:26:45 SELECT String_Time AS "String_Time" FROM "dm"."default".csv_out WHERE location =~ /RL4PS$/ AND "EVENT::Flow_Rate:pump_on" > 0.000 AND time > now() - 384d AND time < now() - 9204h LIMIT 1
[query] 2015/11/25 18:26:45 SELECT mean(Flow_Rate) AS "Flow_Rate" FROM "dm"."default".csv_data WHERE location = 'RL4PS' AND time > 1415125605s AND time < 1415341605s AND time < now() - 384d GROUP BY time(1m)
[query] 2015/11/25 18:26:45 SELECT mean("24Hr_Flow_Total") AS "24Hr_Flow_Total" FROM "dm"."default".csv_data WHERE location = 'RL4PS' AND time > 1415125605s AND time < 1415341605s AND time < now() - 384d GROUP BY time(1m)
[query] 2015/11/25 18:26:45 SELECT mean(Flow_Total) AS "Flow_Total" FROM "dm"."default".csv_data WHERE location =~ /RL4PS$/ AND time > 1415125605s AND time < 1415341605s AND time < now() - 384d GROUP BY time(1m)
[query] 2015/11/25 18:26:45 SELECT mean(Flow_Rate) AS "Flow_Rate" FROM "dm"."default".csv_out WHERE location = 'RL4PS' AND time > 1415125605s AND time < 1415341605s AND time < now() - 9204h GROUP BY time(1m)
[http] 2015/11/25 18:26:45 52.19.168.90 - root [25/Nov/2015:18:26:45 +0000] GET /query?db=dm&epoch=ms&q=SELECT+%22String_Time%22+++AS+%22String_Time%22+FROM+%22csv_out%22+WHERE+%22location%22+%3D~+%2FRL4PS%24%2F+AND+%22EVENT%3A%3AFlow_Rate%3Apump_on%22+%3E+0+AND+time+%3E+now%28%29+-+9216h++AND+time+%3C+now%28%29+-+9204h+limit+1 HTTP/1.1 200 40 https://<domain>:3000/dashboard/db/dm Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 14f9c3ca-93a2-11e5-8001-000000000000 145.634853ms
[query] 2015/11/25 18:26:45 SELECT max("EVENT::Flow_Rate:pump_on") * 2000.000 AS "Pump On" FROM "dm"."default".csv_out WHERE location =~ /RL4PS$/ AND time > 1415125605s AND time < 1415341605s AND time < now() - 9204h GROUP BY time(1m)
[http] 2015/11/25 18:26:45 52.19.168.90 - root [25/Nov/2015:18:26:45 +0000] GET /query?db=dm&epoch=ms&q=SELECT+mean%28%22Flow_Total%22%29+AS+%22Flow_Total%22+FROM+%22csv_data%22+WHERE+%22location%22+%3D~+%2FRL4PS%24%2F+AND+time+%3E+1415125605s+and+time+%3C+1415341605s+AND+time+%3C+now%28%29+-+9216h+GROUP+BY+time%281m%29 HTTP/1.1 200 8137 https://<domain>:3000/dashboard/db/dm Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 14fc3c82-93a2-11e5-8003-000000000000 152.632562ms
[query] 2015/11/25 18:26:45 SELECT mean("24hr_Flow_Pr") AS "24hr_Flow_Pr" FROM "dm"."default".csv_out WHERE location = 'RL4PS' AND time > 1415125605s AND time < 1415341605s AND time < now() - 9204h GROUP BY time(1m)
panic: runtime error: slice bounds out of range
BTW, it worked for several days without problems and then suddenly started crashing...
Some updates on this one:
I am planning to remove this database unless I can somehow help in resolving this issue...
BTW, the system was updated to 0.9.5.1 - same result.
Meir Tseitlin
@cloud-rocket I notice that all of the queries from your log snippet have two upper bounds:
AND time < 1415341605s AND time < now() - 384d
AND time < 1415341605s AND time < now() - 9204h
I'm not quite sure what the effects of that are, but it doesn't seem like a good practice. Can you alter your queries so they have a single upper bound?
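For illustration, a hypothetical rewrite of one of the logged queries with a single upper bound (a sketch only - which bound to keep is an assumption here; keep whichever one the dashboard actually intends):

```python
# Hypothetical rewrite: keep the absolute upper bound and drop the stacked
# relative one (time < now() - 384d), so the WHERE clause has a single ceiling.
query = (
    'SELECT mean("Flow_Rate") AS "Flow_Rate" FROM "dm"."default".csv_data '
    "WHERE location = 'RL4PS' "
    "AND time > 1415125605s AND time < 1415341605s "
    "GROUP BY time(1m)"
)
print(query)
```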
@beckettsean - as I mentioned in my last comment, the problem persists with a simple "select * from X" query as well. It is not possible to query anything from this measurement.
Regarding upper bounds - the first one is set by Grafana's automatic date selection (which cannot be altered) and the second one is my "fine tuning".
It is not so important for this case, unless such queries might cause data corruption (which sounds unrealistic).
Is there anything else I can provide to help resolve this?
@cloud-rocket I'm still struggling to understand the full scope of the issue.
The title says InfluxDB crashes with a panic when accessed from Grafana.
The log you provided has queries that are potentially an issue, but you also state that any SELECT * FROM foo query crashes, but only for foo. Queries like SELECT * FROM bar and SELECT * FROM baz succeed.
If you query the measurement foo directly via curl, does it panic the database?
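As a minimal sketch of such a direct query in Python (requests against the same /query endpoint that curl would hit, as seen in the HTTP log above; host, port, credentials, and the foo placeholder all stand in for the real values):

```python
# Hypothetical direct query against the HTTP API, bypassing Grafana entirely.
# "foo" stands for the problem measurement; connection details are placeholders.
import requests

resp = requests.get(
    "http://localhost:8086/query",
    params={"db": "dm", "q": 'SELECT * FROM "foo" LIMIT 10'},
    auth=("root", "secret"),  # placeholder credentials
    timeout=10,
)
print(resp.status_code, resp.text)  # a dropped connection here would suggest a panic
```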
"show tag values" is not working (but not crashing the DB)
Define "not working". Please give the actual input and output.
Are the queries that you provide in https://github.com/influxdb/influxdb/issues/4907#issuecomment-159695726 against the foo measurement or one of the others?
It is not so important for this case, unless such queries might cause data corruption (which sounds unrealistic).
I don't think they would cause corruption, but a panic, perhaps. Did queries like that work before the foo measurement panicked on any query?
Sorry for the delay...
As the title says, the problem started from Grafana, but as I later discovered it is not related to Grafana specifically; for documentation reasons I left the title (and the original queries) as-is.
The problem is with a specific measurement; other measurements within the same database are working normally. For this specific measurement, a simple query like "select * from X" crashes the DB as well. I tried accessing it from the InfluxDB Admin panel.
All queries (including complicated queries from Grafana) worked before they suddenly started crashing. They also continue to work perfectly on other measurements within the same database (I replaced this measurement with another one containing the same data).
"show tag values" returns
"Server returned error: error parsing query: found EOF, expected WITH at line 1, char 17"
I will check curl from the command line later...
@cloud-rocket SHOW TAG VALUES requires a WITH clause. I've opened https://github.com/influxdb/influxdb.com/pull/552 to clarify the docs on that.
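For example, a minimal sketch with the influxdb-python client, assuming "location" (which appears in the WHERE clauses above) is one of the measurement's tag keys; connection details are placeholders:

```python
# Hypothetical example of the required WITH clause.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="dm")
result = client.query('SHOW TAG VALUES WITH KEY = "location"')
print(list(result.get_points()))
```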
If the issue is specific to a single measurement, there are two possibilities that I can see:
1. The actual data stored for that measurement is corrupted in some way.
2. Something about that measurement's schema triggers the bug.
So regarding #2, is the measurement with issues different from other measurements in your system? Does it have more or fewer tags? Does it have a long or short name? Can you share the actual schema from your system (e.g. SHOW TAG/FIELD KEYS, SHOW TAG VALUES)?
If it is #1, there is not much we can investigate without the actual data. Is it possible for you to upload a compressed tarball of the influxdb directory?
@beckettsean Thanks for trying to figure it out.
I will try to dig further into this issue, because it seems like it is not related to a specific measurement but to the way I am working with it (through Grafana). The first measurement is "dead" - you cannot query anything out of it. But I recreated it as another measurement (same database), and from time to time it crashes the DB as well (even though the crash is probably not corrupting it, because it works afterwards). I still cannot pinpoint what exactly causes it. (BTW, I have other databases on the same server which are working fine.)
Part of the information stored is sensitive; I will take a look at what could be done in terms of an upload. Maybe I can somehow provide access to the instance (EC2).
@cloud-rocket I'm looking into this issue and the panic seems to happen when a string value is read for a field. Are you writing a string value to any field?
Also, how are you writing the data into InfluxDB? Are you using the HTTP protocol or one of the service plugins (graphite, opentsdb, etc.)?
@jsternberg Yes, one of the fields is a string, and according to my feeling it actually might be related.
I am inserting a copy of the index as a string, because you cannot select the index itself with the current API (I think you should allow it)
This is the query
SELECT "String_Time" AS "String_Time" FROM "test" WHERE "EVENT::val0:failure_threshold" > 0 AND $timeFilter AND time > now() limit 1
The data is written with the pandas DataFrame client of the Python library.
Is it possible for the string that is being written to be greater than 65531 bytes long?
Can you also expand on what you mean by "inserting a copy of the index"? What index do you mean? Can you give me an example of what you think should be allowed by the API?
Sure,
I want to select (and display) the timestamp of the point, and since that is not possible with the current API, what I did was allocate another string value to store this timestamp (and select it later).
The string itself is a constant size and obviously very short.
"select time ...." is not possible
Is it possible that this query crashes the server when there are no results to return?
SELECT "String_Time" AS "String_Time" FROM "test" WHERE "EVENT::val0:failure_threshold" > 0 AND $timeFilter AND time > now() limit 1
(just my hypothesis)
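One way to test that hypothesis, as a sketch: run the same query shape against a window that is guaranteed to be empty and see whether the server survives. ($timeFilter is a Grafana template variable, so it is dropped for this standalone test; connection details are placeholders.)

```python
# Hypothetical repro attempt for the empty-result hypothesis.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="dm")
q = ('SELECT "String_Time" AS "String_Time" FROM "test" '
     'WHERE "EVENT::val0:failure_threshold" > 0 AND time > now() LIMIT 1')
try:
    print(list(client.query(q).get_points()))  # expect [] if the server survives
except Exception as exc:
    print("query failed:", exc)  # a dropped connection here suggests a panic
```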
@cloud-rocket -- I don't really understand what you are trying to do.
When a point comes back, its timestamp is part of the data returned. Why are you trying to select on a particular timestamp?
In any event, can't you just write time = <some particular time> in your query?
Adding the timestamp as a tag (which will result in terrible write performance) or as a field (which is redundant) doesn't make sense to me. Perhaps I am missing something?
@otoolep
A timestamp is written as a redundant field (not a tag).
If I were writing my dashboard from scratch, I'd be making the calls by the book (your book) and keeping things as simple as possible. Unfortunately I am working with Grafana, which covers 80% of what I need; for the remaining 20% I have to improvise.
Long story short - I need a textual display stating when the next event is going to occur (in the future). Grafana does not support displaying a timestamp value in its Singlestat panel (or at least not in any way I am aware of), which is why I am storing this timestamp as an additional field.
@cloud-rocket this bug shouldn't be an issue with TSM, as the code is completely different. As we are deprecating bz1, are you able to test the influx_tsm tool on your shards to see whether you get an error?
Can you make a backup of the bz1 shards and then try running with the converted shards in tsm1 using the 0.10 beta and see if this issue still exists?
Closing this since bz1 is deprecated and tsm is the default engine in 0.10.
There is a conversion tool to convert your bz1 shards to tsm, with instructions here: https://github.com/influxdata/influxdb/tree/master/cmd/influx_tsm
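As a rough sketch of that flow (the -backup flag and data-directory argument are assumptions based on the influx_tsm README linked above - verify them there before running; paths are placeholders):

```python
# Hypothetical wrapper around the influx_tsm CLI: back up the original bz1
# shards aside, then convert the data directory in place.
import subprocess

subprocess.run(
    ["influx_tsm", "-backup", "/tmp/influxdb_bz1_backup", "/var/lib/influxdb/data"],
    check=True,
)
```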
64-bit Ubuntu 14.04 (2 GB memory). Version 0.9.5 stable (pre-built package), upgraded from RC3 (which caused crashes as well).