ytyou / ticktock

TickTockDB is an OpenTSDB-like time series database, with much better performance.

very weird value reading #31

Closed mzy2240 closed 1 year ago

mzy2240 commented 1 year ago

I have spent the last two hours trying to figure out why the data inside TickTock does not match the values inside Home Assistant. Two differences:

  1. Somehow some readings are automatically rounded to integers instead of floats, even though the payload is something like put cpu_temperature 1674275610.127971 120.9. At the beginning I thought it was a bug in Home Assistant, but after checking the log I am sure the payload contains a float.
  2. Some float metrics seem to be constant for a very long time, even though I was using a random generator to produce a different value for each payload. I also verified in the log that the random values are really "random", not constant.
mzy2240 commented 1 year ago

I am reading the DB data using Grafana, by the way. Both the chart and the table show the same values. No downsampling.

mzy2240 commented 1 year ago

Screen Shot 2023-01-20 at 10 43 02 PM

mzy2240 commented 1 year ago

Screen Shot 2023-01-20 at 10 44 30 PM

mzy2240 commented 1 year ago

In Home Assistant, the action that saves a value to the DB is set to trigger only when the value (like CPU temperature or memory usage) changes, but you can easily see that most of the data is constant. That does not make much sense to me.

mzy2240 commented 1 year ago

Both the shell command and the RESTful command have the same issue. Is it caused by the float timestamp?

mzy2240 commented 1 year ago

Both the shell command and the RESTful command have the same issue. Is it caused by the float timestamp?

Nope. Using the timestamp as an integer does not help. Adding a tag does not help either.

ylin30 commented 1 year ago

I have never used Home Assistant before. Is it an IoT device management system?

  1. Somehow some readings are automatically rounded to integers instead of floats, even though the payload is something like put cpu_temperature 1674275610.127971 120.9. At the beginning I thought it was a bug in Home Assistant, but after checking the log I am sure the payload contains a float.

Based on the graphs you posted, at least some data points have float values (e.g., memory_usage: 16.4). So TT does support floats.

  1. Some float metrics seem to be constant for a very long time, even though I was using a random generator to produce a different value for each payload. I also verified in the log that the random values are really "random", not constant.

I think the best way to debug is to increase the TT log level to HTTP by calling './admin/log_level.sh HTTP'. It will log HTTP message bodies in the TT logs. Then you can check whether there are data points with a floating-point value in the input message that are stored as a long/int. If so, please copy the input message to me so I can repro it.
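For example, the whole check could look roughly like this (reusing the sample payload quoted above; the log file location depends on your setup, so treat the paths as assumptions):

./admin/log_level.sh HTTP
curl -XPOST 'http://localhost:6182/api/put' -d 'put cpu_temperature 1674275610.127971 120.9'
grep 'put cpu_temperature' ticktock.log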

BTW, Happy Chinese New Year!

ylin30 commented 1 year ago

I am not able to repro it using the payload above: put cpu_temperature 1674275610.127971 120.9

ylin30@yi-IdeaPad:~$ curl -v -XPOST 'http://localhost:6182/api/put' -d 'put cpu_temperature 1674275610.127971 120.9'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:6182...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 6182 (#0)
> POST /api/put HTTP/1.1
> Host: localhost:6182
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Length: 43
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 43 out of 43 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
ylin30@yi-IdeaPad:~$ curl 'http://localhost:6182/api/query?start=1674275510.127971&m=avg:cpu_temperature'
[{"metric":"cpu_temperature","tags":{},"aggregateTags":[],"dps":{"1674275610":120.9000000000000057}}]
mzy2240 commented 1 year ago

Screen Shot 2023-01-21 at 10 49 36 AM

The DB indeed receives random float values (as can be seen from the gg2 metric log), but Grafana shows a flat curve.

mzy2240 commented 1 year ago

BTW, Happy Chinese New Year!

Happy Lunar New Year!

ylin30 commented 1 year ago

Would you mind sharing the whole ticktock.log with me? The screenshot doesn't show the whole history. For example, put gg2 1674319650 0.1412 was at 8:47 AM, but the graph shows gg2 during 10:20-10:45.

ylin30@yi-IdeaPad:~$ date -d @1674319650
Sat 21 Jan 2023 08:47:30 AM PST
mzy2240 commented 1 year ago

Here it is: ticktock.log

mzy2240 commented 1 year ago

Grafana shows cpu_temperature as integers only, and gg2 as a constant float (0.903, the first value it received, I guess).

mzy2240 commented 1 year ago

put gg2 1674319650 0.1412 was at 8:47 AM, but the graph shows gg2 during 10:20-10:45

That is because the local time is CST. The log and the dashboard are showing the same data.

ylin30 commented 1 year ago

Let me repro it. I will get back to you ASAP.

ylin30 commented 1 year ago

I can repro it now. Let me debug it.

ylin30 commented 1 year ago

@mzy2240 There is a bug in the V3 compressor. What you can do right now is use the V2 compressor by updating tt.conf:

tsdb.compressor.version = 2

Its default value is 3. You will have to remove the previous data (if possible) and restart TT. I just tested this and verified the data are consistent.
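As a quick sanity check after switching to V2 and restarting (commands adapted from earlier in this thread), you can write a float and read it back:

curl -XPOST 'http://localhost:6182/api/put' -d 'put cpu_temperature 1674275610.127971 120.9'
curl 'http://localhost:6182/api/query?start=1674275510.127971&m=avg:cpu_temperature'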

image

Sorry for the inconvenience. And thanks for the bug report.

mzy2240 commented 1 year ago

Thanks! Good to know! By the way, is there an option to bind the HTTP address to 0.0.0.0 instead of localhost only? I tried to connect over the LAN last night but couldn't; I'm not sure whether the IP address binding is the issue.

ylin30 commented 1 year ago

Please check your /etc/hosts setup. 0.0.0.0 should also work.

ylin30@yi-IdeaPad:~$ curl -XPOST 'http://localhost:6182/api/admin?cmd=ping'
pongylin30@yi-IdeaPad:~$
ylin30@yi-IdeaPad:~$
ylin30@yi-IdeaPad:~$ curl -XPOST 'http://0.0.0.0:6182/api/admin?cmd=ping'
pong
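If LAN access still fails, a generic (non-TT-specific) check is to confirm which address the port is actually listening on, and then repeat the ping against the machine's LAN IP (the address below is just a placeholder):

ss -tln | grep 6182
curl -XPOST 'http://<server-lan-ip>:6182/api/admin?cmd=ping'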
mzy2240 commented 1 year ago

I will leave it here for you to close. The V2 compressor works just fine.

Soren-klars commented 1 year ago

I have the same issue now with version 0.10.2: it shows wrong values (always returning the same value...) for 2 of the 4 tags I have. The values come via MQTT, and I can see that what is written into the DB is not what Grafana is showing; see the screenshot. The only difference I can see is the name of the tag where the issue occurs, i.e. it happens in 'kwh-last-10-min' but not in 'current-power'.

Grafana-chart-showing-wrong-values

ylin30 commented 1 year ago

@Soren-klars Thanks for reporting. Did you use tsdb.compressor.version = 2 or the default tsdb.compressor.version = 3? The V3 compressor has a bug in 0.10.2 and we are testing a hotfix right now. The problem is in the uncompress path during reads. You can use the V2 compressor in 0.10.2. We manually tested V2 and it still looks good.

If it is 0.10.2 with the V2 compressor, would you please do the following to help us repro?

  1. Enable the HTTP log level by calling ./admin/log_level.sh HTTP (or just update tt.conf).
  2. Send us ticktock.log.
Soren-klars commented 1 year ago

Hi @ylin30, yes, I did use the tsdb.compressor.version = 2 setting from the beginning, and I definitely restarted the DB to apply the setting. I have attached the requested HTTP-level log file: ticktock.log

ylin30 commented 1 year ago

@Soren-klars I looked at your log. The query with tags (direction=produced,type=kwh-last-10-min) did return just two values for many data points. But I couldn't find the original input payloads to verify what the values were during writes. I only saw 4 writes in the log, and only 2 of them are (direction=produced,type=kwh-last-10-min). That is not enough.

[Yi-MBP Downloads]$ grep "put ener" ticktock_kwh-last-10min.log
put energy.kitchen 1674423411 0 direction=produced type=current-power
put energy.kitchen 1674423412 0 direction=produced type=kwh-last-10-min
put energy.kitchen 1674423442 587.0 direction=consumed type=current-power
put energy.kitchen 1674423443 0.1 direction=consumed type=kwh-last-10-min

So I created a list of data points with random values (see attached file). I inserted the data points into TT (with the V2 compressor).

ylin30@yi-IdeaPad:~$ curl -v -XPOST 'http://localhost:6182/api/put' --data-binary @/tmp/energy.kitchen.txt
....
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
ylin30@yi-IdeaPad:~$

Then I queried the data points with different tag combinations. The query results look consistent with the input file.

ylin30@yi-IdeaPad:~$ curl -v 'http://localhost:6182/api/query?start=1674345600&m=avg:1m-avg:energy.kitchen\{direction=produced,type=kwh-last-10-min\}'
...
* Connected to localhost (127.0.0.1) port 6182 (#0)
> GET /api/query?start=1674345600&m=avg:1m-avg:energy.kitchen{direction=produced,type=kwh-last-10-min} HTTP/1.1
...
* Connection #0 to host localhost left intact
[{"metric":"energy.kitchen","tags":{"direction":"produced","type":"kwh-last-10-min"},"aggregateTags":[],"dps":{"1674423360":0.0,"1674423420":0.25,"1674423480":0.42,"1674423540":0.503,"1674423600":0.55,"1674423660":0.49,"1674423720":0.343,"1674423780":0.5145000000000001}}]
ylin30@yi-IdeaPad:~$ curl -v 'http://localhost:6182/api/query?start=1674345600&m=:energy.kitchen'
....
> GET /api/query?start=1674345600&m=:energy.kitchen HTTP/1.1
...
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 1143
<
[{"metric":"energy.kitchen","tags":{"direction":"produced","type":"kwh-last-10-min"},"aggregateTags":[],"dps":{"1674423412":0.0,"1674423472":0.25,"1674423512":0.42,"1674423572":0.503,"1674423612":0.55,"1674423672":0.19,"1674423712":0.79,"1674423772":0.343,"1674423812":0.343}},{"metric":"energy.kitchen","tags":{"direction":"consumed","type":"current-power"},"aggregateTags":[],"dps":{"1674423442":587.0,"1674423472":594.0,"1674423542":527.0,"1674423572":489.0,"1674423642":532.0,"1674423672":556.0,"1674423742":612.0,"1674423772":597.0,"1674423842":582.0,"1674423872":621.0}},{"metric":"energy.kitchen","tags":{"direction":"produced","type":"current-power"},"aggregateTags":[],"dps":{"1674423411":0.0,"1674423471":0.34,"1674423511":0.8,"1674423571":0.207,"1674423611":0.302,"1674423671":0.28,"1674423711":0.5,"1674423771":0.79,"1674423811":0.82}},{"metric":"energy.kitchen","tags":{"direction":"consumed","type":"kwh-last-10-min"},"aggregateTags":[],"dps":{"1674423443":0.1,"1674423473":0.37,"1674423543":0.25,"1674423573":* Connection #0 to host localhost left intact
0.607,"1674423643":0.17,"1674423673":0.08,"1674423743":0.18,"1674423773":0.885,"1674423843":0.431,"1674423873":0.197}}]ylin30@yi-IdeaPad:~$

Could you please try my steps (put and then query) to verify? Or you can let your program run for some time (with quite a few input commands in the log) and send a query to repro the problem. Then send me the log.

Thanks. energy.kitchen.TXT

Soren-klars commented 1 year ago

Hi @ylin30, no, I could not reproduce it today; Grafana is showing the correct values. My guess is that the first time I started up TT yesterday must have been without the V2 compressor setting, and I only set it on the second start. So I assume the data files were already created in the wrong way. Could this be the case? If so, this issue could be closed for now.

And another little question: is there already a bulk import feature implemented so that I can more easily import data from older versions? I created JSON backup files, which I could loop through and make a single request for each value, but it would be great if...

ylin30 commented 1 year ago

Hi @ylin30, no, I could not reproduce it today; Grafana is showing the correct values. My guess is that the first time I started up TT yesterday must have been without the V2 compressor setting, and I only set it on the second start. So I assume the data files were already created in the wrong way. Could this be the case? If so, this issue could be closed for now.

Very likely it was mistakenly using V3, as the symptom looks familiar.

And another little question: is there already a bulk import feature implemented so that I can more easily import data from older versions? I created JSON backup files, which I could loop through and make a single request for each value, but it would be great if...

Oh my goodness, that is exactly what we have been busy with in the last week. We are implementing the InfluxDB line protocol for bulk importing, so in the future you can use Telegraf (InfluxDB's default collector) to bulk import data. It is almost done and being tested right now. Please keep watching; it may be ready this week.
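For reference, a single InfluxDB line protocol record (what Telegraf sends) for the metric discussed above would look roughly like the line below. The exact TT ingestion endpoint was still being finalized at the time, so treat this purely as an illustration of the format (measurement, tags, field, optional nanosecond timestamp):

energy.kitchen,direction=produced,type=kwh-last-10-min value=0.25 1674423420000000000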

ylin30 commented 1 year ago

The V3 compressor has been fixed in version 0.10.4. The InfluxDB line protocol has been supported since 0.11.0 (but please use 0.11.1 instead, since there is a bug in 0.11.0).

I closed this issue. Version 0.11.1 is the latest version to use.