Closed LeoDemon closed 5 months ago
The bugs in earlier versions of 3.0 have been fixed, but 3.0.2.5 May not be compatible with the current version (3.3.0.0), so it is not possible to upgrade directly. Therefore, it is recommended to redeploy the latest 3.3.0.0, the subsequent upgrade is very convenient, directly install the latest version of the software.
Range of open source version of the operating system support: https://docs.taosdata.com/reference/support-platform/ The bugs from earlier versions of 3.0 have been fixed, but 3.0.2.5 May not be compatible with the current version (3.3.0.0), so it is not possible to upgrade directly. Therefore, it is recommended to redeploy the latest 3.3.0.0, the subsequent upgrade is very convenient, directly install the latest version of the software.
Range of open source version of the operating system support: https://docs.taosdata.com/reference/support-platform/
Bug Description
Prometheus remote write error after insert 1 million records and high machine usage(cpu/memory) of taosd/taosadapter process.
To Reproduce
storage.tsdb.retention.time=1d
.remote_read:
Server is Community Edition.
taos> show databases; name |
information_schema | performance_schema | demo_dev | prometheus_data | Query OK, 4 rows in database (0.001345s)
taos> use prometheus_data; Database changed.
taos> show stables; stable_name |
metrics | Query OK, 1 rows in database (0.001225s)
taos> select count() from metrics; count() |
Query OK, 1 rows in database (0.081566s)
taos>
... ... level=info ts=2022-09-22T08:24:31.912Z caller=main.go:818 msg="TSDB started" level=info ts=2022-09-22T08:24:31.912Z caller=main.go:944 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml ts=2022-09-22T08:24:31.913Z caller=dedupe.go:112 component=remote level=info remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Starting WAL watcher" queue=ad36f9 ts=2022-09-22T08:24:31.913Z caller=dedupe.go:112 component=remote level=info remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Starting scraped metadata watcher" ts=2022-09-22T08:24:31.913Z caller=dedupe.go:112 component=remote level=info remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Replaying WAL" queue=ad36f9 level=info ts=2022-09-22T08:24:31.924Z caller=main.go:975 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=12.841996ms remote_storage=553.029µs web_handler=440ns query_engine=744ns scrape=259.402µs scrape_sd=125.461µs notify=24.508µs notify_sd=14.147µs rules=10.946317ms level=info ts=2022-09-22T08:24:31.924Z caller=main.go:767 msg="Server is ready to receive web requests." ts=2022-09-22T08:24:36.997Z caller=dedupe.go:112 component=remote level=info remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Done replaying WAL" duration=5.083726968s level=info ts=2022-09-22T09:00:01.069Z caller=compact.go:507 component=tsdb msg="write block" mint=1663826400000 maxt=1663833600000 ulid=01GDJ6SRGFZB9Y769JBP371XXN duration=158.068357ms level=info ts=2022-09-22T09:00:01.076Z caller=db.go:1230 component=tsdb msg="Deleting obsolete block" block=01GDFV8SV6K0YX9AW4GGKR0CBQ level=info ts=2022-09-22T09:00:01.084Z caller=head.go:880 component=tsdb msg="Head GC completed" duration=8.251081ms level=info ts=2022-09-22T09:00:01.097Z caller=checkpoint.go:95 component=tsdb msg="Creating checkpoint" from_segment=282 to_segment=286 mint=1663833600000 level=info ts=2022-09-22T09:00:01.201Z caller=head.go:977 component=tsdb msg="WAL checkpoint complete" first=282 last=286 duration=103.818061ms ts=2022-09-22T09:06:23.119Z caller=dedupe.go:112 component=remote level=warn remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Failed to send batch, retrying" err="Post \"http://hostA:6041/prometheus/v1/remote_write/prometheus_data\": context deadline exceeded" ts=2022-09-22T09:07:23.282Z caller=dedupe.go:112 component=remote level=warn remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Failed to send batch, retrying" err="Post \"http://hostA:6041/prometheus/v1/remote_write/prometheus_data\": context deadline exceeded" ts=2022-09-22T09:08:23.484Z caller=dedupe.go:112 component=remote level=warn remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Failed to send batch, retrying" err="Post \"http://hostA:6041/prometheus/v1/remote_write/prometheus_data\": context deadline exceeded" ts=2022-09-22T09:09:23.686Z caller=dedupe.go:112 component=remote level=warn remote_name=ad36f9 url=http://hostA:6041/prometheus/v1/remote_write/prometheus_data msg="Failed to send batch, retrying" err="Post \"http://hostA:6041/prometheus/v1/remote_write/prometheus_data\": context deadline exceeded" .... .... ....
taosadapter error logs:
.... .... 09/22 17:05:20.514168 00032544 TAOS_ADAPTER info "| 202 | 294.24148ms | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " sessionID=16988 model=web 09/22 17:05:20.801576 00032544 TAOS_ADAPTER info "| 202 | 286.039756ms | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=16989 09/22 17:05:32.068881 00032544 TAOS_ADAPTER error "
^[[31m2022/09/22 17:05:32 write tcp 172.16.0.217:6041->172.16.0.171:32802: write: broken pipe POST /prometheus/v1/remote_read/prometheus_data HTTP/1.1^M Host: hostA:6041^M Accept-Encoding: snappy^M Authorization: ^M Content-Encoding: snappy^M Content-Length: 165^M Content-Type: application/x-protobuf^M User-Agent: Prometheus/2.26.0^M X-Prometheus-Remote-Read-Version: 0.1.0^M ^M ^[[0m " model=web id=16990 09/22 17:05:32.070565 00032544 TAOS_ADAPTER info "| 202 | 11.26762798s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=16990 09/22 17:05:32.071968 00032544 TAOS_ADAPTER info "| 202 | 410.972µs | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=16996 09/22 17:05:38.092067 00032544 TAOS_ADAPTER info "| 202 | 16.381650639s | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=16991 ... ... 09/22 17:35:31.917105 00032544 TAOS_ADAPTER info "| 202 | 302.912µs | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=18421 09/22 17:36:31.917862 00032544 TAOS_ADAPTER info "| 202 | 337.44µs | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=18466 09/22 17:37:31.916853 00032544 TAOS_ADAPTER info "| 202 | 238.442µs | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=18512 09/22 17:38:31.108564 00032544 TAOS_ADAPTER error "select metrics.,tbname from metrics where ts >= '2022-09-22T08:56:01.175Z' and ts <= '2022-09-22T09:06:01.175Z' and labels->'instance' match '$server:.*' and labels->'name' = 'node_load1' and labels->'cluster' = 'executor-server-test'" model=prometheus error=[0x2501] scheduler status error 09/22 17:38:31.917356 00032544 TAOS_ADAPTER info "| 202 | 555.607µs | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=18557 09/22 17:38:37.282855 00032544 TAOS_ADAPTER error "
^[[31m2022/09/22 17:38:37 write tcp 172.16.0.217:6041->172.16.0.171:34974: write: broken pipe POST /prometheus/v1/remote_read/prometheus_data HTTP/1.1^M Host: hostA:6041^M Accept-Encoding: snappy^M Authorization: *^M Content-Encoding: snappy^M Content-Length: 143^M Content-Type: application/x-protobuf^M User-Agent: Prometheus/2.26.0^M X-Prometheus-Remote-Read-Version: 0.1.0^M ^M ^[[0m " model=web id=17067 09/22 17:38:37.282896 00032544 TAOS_ADAPTER info "| 202 | 32m44.712724231s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=17067 09/22 17:38:38.075700 00032544 TAOS_ADAPTER info "| 202 | 31m44.892738996s | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=17117 09/22 17:38:38.076164 00032544 TAOS_ADAPTER error "
^[[31m2022/09/22 17:38:38 write tcp 172.16.0.217:6041->172.16.0.171:34964: write: broken pipe POST /prometheus/v1/remote_read/prometheus_data HTTP/1.1^M Host: hostA:6041^M Accept-Encoding: snappy^M Authorization: ^M Content-Encoding: snappy^M Content-Length: 142^M Content-Type: application/x-protobuf^M User-Agent: Prometheus/2.26.0^M X-Prometheus-Remote-Read-Version: 0.1.0^M ^M ... ... 09/22 17:40:59.072895 00032544 TAOS_ADAPTER info "| 202 | 25.977438038s | 172.16.0.171 | POST | /prometheus/v1/remote_write/prometheus_data " model=web sessionID=18653 09/22 17:42:51.083694 00032544 TAOS_ADAPTER error "select metrics.,tbname from metrics where ts >= '2022-09-22T03:07:48.518Z' and ts <= '2022-09-22T09:07:48.518Z' and labels->'fstype' != '' and labels->'job' = 'node-exporter' and labels->'name' = 'node_filesystem_readonly' and labels->'cluster' = 'executor-server-test'" model=prometheus error=[0x2501] scheduler status error 09/22 17:42:51.083788 00032544 TAOS_ADAPTER info "| 400 | 30m17.933627094s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=17375 09/22 17:42:51.084049 00032544 TAOS_ADAPTER error "select metrics.,tbname from metrics where ts >= '2022-09-22T09:03:18.808Z' and ts <= '2022-09-22T09:08:18.808Z' and labels->'name' != '' and labels->'name' = 'container_spec_memory_limit_bytes' and labels->'cluster' = 'executor-server-test'" model=prometheus error=[0x2501] scheduler status error 09/22 17:42:52.064444 00032544 TAOS_ADAPTER info "| 400 | 33m49.474018172s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=17213 09/22 17:58:07.105048 00032544 TAOS_ADAPTER error "select metrics.,tbname from metrics where ts >= '2022-09-22T09:02:54.146Z' and ts <= '2022-09-22T09:07:54.146Z' and labels->'name' = 'up' and labels->'cluster' = 'executor-server-test'" model=prometheus error=[0x2501] scheduler status error 09/22 17:58:07.105135 00032544 TAOS_ADAPTER info "| 400 | 50m2.939550977s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=17170 09/22 17:58:07.105340 00032544 TAOS_ADAPTER error "select metrics.*,tbname from metrics where ts >= '2022-09-22T09:07:18.808Z' and ts <= '2022-09-22T09:12:18.808Z' and labels->'name' != '' and labels->'name' = 'container_cpu_cfs_throttled_periods_total' and labels->'cluster' = 'executor-server-test'" model=prometheus error=[0x2501] scheduler status error 09/22 17:58:08.067337 00032544 TAOS_ADAPTER info "| 400 | 44m55.448291031s | 172.16.0.171 | POST | /prometheus/v1/remote_read/prometheus_data " model=web sessionID=17403 ... ... ...
Additional Context 2022-09-22 logs of taosd not found, I don't know why.