Closed jens-ylja closed 1 year ago
I have some curious crashes with SIGSEGV (11) when TT executes a query issued via Grafana.
Thanks. We will try to repro in our side here.
- put another measurement using the OpenTSDB line protocol - note, this has one tag more:
put sensor.status 1681147870 0 type=s10e device=s10e _field=val
_field
is a reserved word for Influx line protocol. It is actually mapped to a metric in OpenTSDB protocol. We don't expect such a usage. This might be the cause. If so, we might consider blocking such a usage in the future.
You can execute this in any order of steps 2, 4 and 6. You can even leave step 4 out. If - on the other side - running 4 and 6 only, all works well -> you will see two metrics which are completely identical except the time stamps.
One can think - what a crazy idea to mix OpenTSDB and Influx line protocols. But I was unsure which one to use is the better idea. Using OpenTSDB would use consistent APIs for read and write. But when having a series of fields within a measurement, the Influx line protocol seems to be much more compact.
Yah, it is really a stress testing mixture of Opentsdb and influx line protocol. Influx line protocol is definitely more compact and the preferred way to write. TickTockDB also performs much better if fields in one line are always in the same order between different requests. For example, this is good.
cpu,host=rpi,id=1 usr=10,sys=20,idle=70 1465839830000
cpu,host=rpi,id=1 usr=10,sys=20,idle=70 1465839890000
This is bad:
cpu,host=rpi,id=1 usr=10,sys=20,idle=70 1465839830000
cpu,host=rpi,id=1 sys=20,idle=70,usr=10 1465839890000
Note: I ran this using TT version 0.11.4 on ARM v71. I'll try the behaviour with 0.11.6 within the next couple of days.
I suspect 0.11.6 might also crash in this scenario.
We will get back to you once we confirm the repro. Thanks!
Regarding the usage of _field
in OpenTSDB protocol uses. This was intended to be able to use the same queries (e.g. in Grafana) regardless of the line protocol used for writing the data. Otherwise the queries and/or evaluation of the results must differ depending on which line protocol was used when writing the data.
Regarding the usage of
_field
in OpenTSDB protocol uses. This was intended to be able to use the same queries (e.g. in Grafana) regardless of the line protocol used for writing the data. Otherwise the queries and/or evaluation of the results must differ depending on which line protocol was used when writing the data.
Fair. You are right.
_field is a reserved word for Influx line protocol. It is actually mapped to a metric in OpenTSDB protocol.
I was wrong in this comment above. It is actually considered a new time series but still a tag.
So far I tried 0.11.5 in RPI-zero-w (ARM V6 - 32bit) and it doesn't crash although the query results look fishy to me. m=first:sensor.status
It seems the first
keyword is not processed properly. I am resetting to 0.11.4 and see if TT crash.
@jens-ylja I can repro the crash in 0.11.4 in RPI-zero-w (ArmV6 32bit) now. You can try 0.11.5 or latest version 0.11.6 with no crash. Nevertheless, there are still a couple of problems we need to resolve. Mainly:
_field
(such as put sensor.status 1681147870 0 type=s10e device=s10e _field=val
) will be blocked. It will cause mismatch in meta data. Reads with _field
are still allowed.m=first:sensor.status
.Thanks for your efforts. It really helps us improve TickTockDB.
@ytyou - thanks for reply. Regarding your points:
- This is indeed a bug. We have made a fix in the 'dev' branch. It needs further testing before it can be released;
Good to hear I could help to find and fix a bug.
- 'first' and 'last' can not be used for aggregation; it only make sense when used to perform downsampling;
For my queries none of the aggregations make sense at all. I have a couple of these sensor.status
series, each with a different type
and device
tag set - the values are discrete (enum values).
I want to query them all together but do not want them to be aggregated. Unfortunately with Grafana I'm required to provide an aggregation. But none of these aggregations makes sense at all for enumeration values. The only ones working - providing all sensor.status
series as separate metrics - are first
and last
. A none
aggregation would be nice but I cannot configure it to Grafana.
- Using '_field' in OpenTSDB style put request is still allowed, but we are considering to ban it. Maybe there's a good reason to support it...
My reason was the query compatibility between metrics written with OpenTSDB line protocol and these written with the Influx one. But as @ylin30's suggestion was to use the Influx protocol - this compatibility isn't a reason any longer (for me).
We will add a "none" aggregator. Thanks for the suggestion.
- 'first' and 'last' can not be used for aggregation; it only make sense when used to perform downsampling;
For my queries none of the aggregations make sense at all. I have a couple of these
sensor.status
series, each with a differenttype
anddevice
tag set - the values are discrete (enum values). I want to query them all together but do not want them to be aggregated. Unfortunately with Grafana I'm required to provide an aggregation. But none of these aggregations makes sense at all for enumeration values. The only ones working - providing allsensor.status
series as separate metrics - arefirst
andlast
. Anone
aggregation would be nice but I cannot configure it to Grafana.
@jens-ylja Are you looking for a way to retrieve all original individual data points in related time series? If so, some other users raised the same question before. A none
aggregator would be a good fit though OpenTSDB doesn't provide it. So far our workaround is to explicitly add all tags in query, such as type=* device=*
.
@jens-ylja Are you looking for a way to retrieve all original individual data points in related time series? If so, some other users raised the same question before. A
none
aggregator would be a good fit though OpenTSDB doesn't provide it. So far our workaround is to explicitly add all tags in query, such astype=* device=*
.
@ylin30 According to http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html OpenTSDB seems to have a none
aggregator since version 2.3 - Am I wrong with this?
I checked the type=* device=*
notation - it does the job :)
Thanks Jens
@jens-ylja We have just released v0.11.7, which should have all the fixes mentioned in this ticket. A "none" aggregator is added, although you may need to refresh the page a few times for Grafana to ditch its cache and pull from TickTockDB again.
@ytyou I switched to 0.11.7 now and can confirm the none
aggregator to work -> removed the type=* device=*
query tags.
I additionally can confirm the first
and last
aggregators are not accepted any longer (except for down sampling).
I didn't verified the "crash test" - but as I understand, the tag name _field
shouldn't be allowed any longer.
I'll give feedback after trying this too.
I just re-tested the origin scenario with TT version 0.11.7:
put sensor.status 1681147860 0 type=s10e device=s10e
curl 'http://localhost:5000/api/query?start=0&m=none:sensor.status'
-> this returned a single metric -> correctput sensor.status 1681147870 0 type=s10e device=s10e _field=val
curl 'http://localhost:5000/api/query?start=0&m=none:sensor.status'
-> this returned two metrics -> as expectedsensor.status,device=s10e,type=s10e val=0 1681147680
curl 'http://localhost:5000/api/query?start=0&m=none:sensor.status'
-> this returned two metrics (one with two values) -> as expectedSeems this issue is fixed. I close it.
I have some curious crashes with SIGSEGV (11) when TT executes a query issued via Grafana.
After several hours of struggling around, I drilled it down to the following simple procedure (reproducible with just
telnet
andcurl
):put sensor.status 1681147860 0 type=s10e device=s10e
curl 'http://localhost:6182/api/query?start=0&m=first:sensor.status'
-> you should see one metricput sensor.status 1681147870 0 type=s10e device=s10e _field=val
sensor.status,device=s10e,type=s10e val=0 1681147680
You can execute this in any order of steps 2, 4 and 6. You can even leave step 4 out. If - on the other side - running 4 and 6 only, all works well -> you will see two metrics which are completely identical except the time stamps.
One can think - what a crazy idea to mix OpenTSDB and Influx line protocols. But I was unsure which one to use is the better idea. Using OpenTSDB would use consistent APIs for read and write. But when having a series of fields within a measurement, the Influx line protocol seems to be much more compact.
Note: I ran this using TT version 0.11.4 on ARM v71. I'll try the behaviour with 0.11.6 within the next couple of days.