Closed: chenxu1990 closed this issue 4 years ago.
Please share the logs of the Prometheus server. Anything relevant in the InfluxDB logs?
@simonpasquier Hi Simon, there are no obvious errors in the logs; the Grafana chart is below. As the picture shows, Prometheus cannot get the data after 20:00. If I restart it, the picture is OK again.
Try running with --log.level=debug. You can also take a look at the net_conntrack*{dialer_name="remote_storage"} metrics.
@simonpasquier logs:

pro_prometheus-front.1.2tw5pr40pqrz@rokid-ops-1.hz.rokid.com | level=info ts=2018-10-15T08:26:15.762344509Z caller=main.go:523 msg="Server is ready to receive web requests."
pro_prometheus-front.1.2tw5pr40pqrz@rokid-ops-1.hz.rokid.com | level=debug ts=2018-10-15T08:26:15.762769295Z caller=manager.go:183 component="discovery manager notify" msg="discoverer exited" provider=string/0
pro_prometheus-front.1.2tw5pr40pqrz@rokid-ops-1.hz.rokid.com | level=info ts=2018-10-15T11:27:14.608373524Z caller=compact.go:398 component=tsdb msg="write block" mint=1539590400000 maxt=1539597600000 ulid=01CSVQNS4D6G07DPAJNW4VCJE4
pro_prometheus-front.1.2tw5pr40pqrz@rokid-ops-1.hz.rokid.com | level=info ts=2018-10-15T11:27:14.613941143Z caller=head.go:446 component=tsdb msg="head GC completed" duration=1.73509ms
net_conntrack*{dialer_name="remote_storage"} returns no data.
Try {__name__=~"net_conntrack.+",dialer_name="remote_storage"} instead.
@simonpasquier Hi Simon, I checked the InfluxDB logs:
172.18.0.12,172.16.68.221 - - [15/Oct/2018:21:30:52 +0800] "POST /query?db=prometheus&epoch=ms¶ms=%7B%7D&q=SELECT+value+FROM+%22autogen%22.%2F%5Enet_conntrack.%2B%24%2F+WHERE+%22dialer_name%22+%3D+%27remote_storage%27+AND+time+%3E%3D+1539566700000ms+AND+time+%3C%3D+1539604800000ms+GROUP+BY+%2A HTTP/1.1" 200 4285 "-" "InfluxDBClient" 8950769b-d07e-11e8-ba06-000000000000 22427
I queried the data at 21:30:52, but Prometheus cut the query range off before 20:00 (1539604800000ms); there are other similar log lines. The last query time always stops at 20:00... @simonpasquier
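As a side note on those timestamps: the millisecond epoch bounds in the logged query can be decoded to wall-clock time with a quick sketch using only Python's standard library (1539604800000 is the upper bound from the log line above; the helper name is mine, not from the thread):

```python
from datetime import datetime, timezone, timedelta

def ms_epoch_to_str(ms, tz=timezone.utc):
    """Render a millisecond epoch (as seen in the InfluxDB query log) as a readable time."""
    return datetime.fromtimestamp(ms / 1000, tz).strftime("%Y-%m-%d %H:%M:%S %Z")

# Upper bound of the logged query: 1539604800000ms
print(ms_epoch_to_str(1539604800000))                                # 12:00 in UTC
print(ms_epoch_to_str(1539604800000, timezone(timedelta(hours=8))))  # 20:00 in UTC+8
```

So the "20:00" cutoff seen in Grafana corresponds to 12:00 UTC, which is consistent with the UTC vs. UTC+8 discussion later in this thread.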
You may need to tweak the read_recent flag: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read
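For reference, a minimal remote_read block with read_recent enabled might look like this (the host is a placeholder, not taken from this thread; the db=prometheus parameter matches the one used elsewhere in the thread):

```yaml
remote_read:
  - url: "http://influxdb.example:8086/api/v1/prom/read?db=prometheus"
    # When true, queries whose time range overlaps local storage are
    # also forwarded to the remote endpoint instead of being clamped.
    read_recent: true
```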
Does the parameter mean that data is queried from remote storage on every request? But I don't have local storage... What is the impact of this parameter? I am a bit confused. Thanks :)
Which flags do you use to start Prometheus?
@simonpasquier
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-lifecycle'
- '--log.level=debug'
but I don't have local storage
There's always local storage.
You mean Prometheus gets the data and caches it in memory? But when I refresh Grafana, I always see a read request to InfluxDB, and the time range used to fetch the data is incorrect, just like the log above. @simonpasquier
Can you confirm that you use the native InfluxDB remote read endpoint?
Can you check that all clocks are synchronized?
Have you tried setting read_recent to true?
When I say that there's always local storage, it means that Prometheus will always write the samples to its local storage even when remote write/read is used.
I set read_recent to true and the problem has been solved, thank you. But I am wondering why this problem does not occur when remote_read and remote_write are on the same machine? @simonpasquier
Can you check that all clocks are synchronized?
All clocks are synchronized, but they are in different time zones. The writing Prometheus is in the UTC time zone and the others are in UTC+8. @simonpasquier
It shouldn't matter for Prometheus as all times are converted to UTC. I can't say for InfluxDB.
It shouldn't matter for InfluxDB either, because writing to InfluxDB is totally OK. The error is that the time range for fetching data is incorrect when remote_read and remote_write are assigned to different machines.
I have the same problem.
Seeing this problem with Prometheus v2.12.0 and v2.11.2 with InfluxDB 1.6.6. There's nothing in the logs when this bad state occurs, even with debug logging enabled.
I've set the following on the latest attempt to "fix it". Should I increase the retention? It seems to fail right after the 6 hours are up:
"--storage.tsdb.path=/prometheus",
"--web.console.libraries=/usr/share/prometheus/console_libraries",
"--web.console.templates=/usr/share/prometheus/consoles",
"--storage.tsdb.allow-overlapping-blocks",
"--storage.tsdb.retention.time=6h",
"--storage.tsdb.no-lockfile",
"--storage.tsdb.retention.size=5GB",
"--query.max-samples=50000000",
"--query.max-concurrency=20",
"--query.timeout=2m",
"--query.lookback-delta=5m",
"--storage.remote.read-concurrent-limit=10",
"--storage.remote.read-sample-limit=5e7",
"--storage.remote.flush-deadline=5s",
"--web.max-connections=512",
"--web.read-timeout=5m",
"--log.level=debug"
Edit: I have not set read_recent=true. That does not seem like an elegant solution; it sounds like this will cause writing to disk.
We've looked at this as part of our bug scrub, and this appears to be a support request that doesn't indicate any particular bug in Prometheus.
If you've further questions they'd be best asked on the prometheus-users mailing list
@brian-brazil InfluxDB 1.7.x is particularly sensitive to Prometheus versions. I've found that rolling all the way back to Prometheus 2.4.3 is the most stable combination for Influx/Prom, and the Prometheus libraries vendored in InfluxDB reflect that as well: they haven't been updated in the 1.x line in some time. Use 2.4.3 or below.
@tehlers320 I'm unable to fetch the measurements from InfluxDB into Prometheus.
Prometheus config file:
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  scrape_timeout: 15s      # scrape_timeout is set to the global default (10s).
remote_read:
Can someone help me here or point out the mistake I'm making?
Add read_recent: true and append &rp=autogen (or whatever your retention policy is) to the end of the URL. Influx at some point made this required on the API and didn't document it.
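Putting both suggestions together, the remote_read entry would look roughly like this (the host is a placeholder; autogen is InfluxDB's default retention policy and may differ in your setup):

```yaml
remote_read:
  - url: "http://influxdb.example:8086/api/v1/prom/read?db=prometheus&rp=autogen"
    read_recent: true
```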
@tehlers320 I updated the configuration:

global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  scrape_timeout: 15s      # scrape_timeout is set to the global default (10s).
remote_read:
Still it doesn't seem to work.
Dear @simonpasquier @brian-brazil @tehlers320 @chenxu1990,
Should I use some additional Prometheus exporter along with these configurations? Has anyone recently imported measurements from an InfluxDB database into Prometheus using remote_read or some other means? This info would be really helpful to me. Thanks in advance.
@ctmuthu were you able to find the resolution to this issue? i'm having the same problem. Thanks.
Dear @aqua-terra,
Use this configuration for InfluxDB:
bind-address = ":8088"

[meta]
  dir = "/var/lib/influxdb/meta"
  retention-autocreate = true
  logging-enabled = true

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"
  cache-max-memory-size = "4g"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = false
  https-enabled = false
  max-row-limit = 10000
  realm = "InfluxDB"

[retention]
  enabled = true
  check-interval = "30m"

[subscriber]
  enabled = true
  http-timeout = "30s"

[continuous_queries]
  log-enabled = true
  enabled = true
It should work fine.
@ctmuthu Still doesn't work for me. I'm seeing the read request coming through in InfluxDB, but nothing is returned to Prometheus. I'm using Grafana to read from the Prometheus data source. Can you share your remote read configuration if it has changed since the last time you posted it? Thanks.
[httpd] 10.3.98.160 - admin [05/Jun/2020:16:25:01 +0000] "POST /api/v1/prom/read?db=prometheus HTTP/1.1" 200 4 "-" "Prometheus/2.18.1" 1ad1911f-a749-11ea-84ce-2265cdc67e1e 148
here's my influxdb config:
reporting-disabled = false
bind-address = ":8088"

[meta]
  dir = "/var/lib/influxdb/meta"
  retention-autocreate = true
  logging-enabled = true

[data]
  dir = "/var/lib/influxdb/data"
  index-version = "inmem"
  wal-dir = "/var/lib/influxdb/wal"
  wal-fsync-delay = "0s"
  validate-keys = false
  query-log-enabled = true
  cache-max-memory-size = 1073741824
  cache-snapshot-memory-size = 26214400
  cache-snapshot-write-cold-duration = "10m0s"
  compact-full-write-cold-duration = "4h0m0s"
  compact-throughput = 50331648
  compact-throughput-burst = 50331648
  max-series-per-database = 1000000
  max-values-per-tag = 100000
  max-concurrent-compactions = 0
  max-index-log-file-size = 1048576
  series-id-set-cache-size = 100
  trace-logging-enabled = false
  tsm-use-madv-willneed = false

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"
  log-queries-after = "0s"
  max-select-point = 0
  max-select-series = 0
  max-select-buckets = 0

[retention]
  enabled = true
  check-interval = "30m0s"

[shard-precreation]
  enabled = true
  check-interval = "10m0s"
  advance-period = "30m0s"

[monitor]
  store-enabled = true
  store-database = "_internal"
  store-interval = "10s"

[subscriber]
  enabled = true
  http-timeout = "30s"
  insecure-skip-verify = false
  ca-certs = ""
  write-concurrency = 40
  write-buffer-size = 1000

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = true
  suppress-write-log = false
  write-tracing = false
  flux-enabled = false
  flux-log-enabled = false
  pprof-enabled = false
  pprof-auth-enabled = false
  debug-pprof-enabled = false
  ping-auth-enabled = false
  https-enabled = false
  https-certificate = "/etc/ssl/influxdb.pem"
  https-private-key = ""
  max-row-limit = 10000
  max-connection-limit = 0
  shared-secret = ""
  realm = "InfluxDB"
  unix-socket-enabled = false
  unix-socket-permissions = "0777"
  bind-socket = "/var/run/influxdb.sock"
  max-body-size = 25000000
  access-log-path = ""
  max-concurrent-write-limit = 0
  max-enqueued-write-limit = 0
  enqueued-write-timeout = 30000000000

[logging]
  format = "auto"
  level = "info"
  suppress-logo = false

[[graphite]]
  enabled = false
  bind-address = ":2003"
  database = "graphite"
  retention-policy = ""
  protocol = "tcp"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "1s"
  consistency-level = "one"
  separator = "."
  udp-read-buffer = 0

[[collectd]]
  enabled = false
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "10s"
  read-buffer = 0
  typesdb = "/usr/share/collectd/types.db"
  security-level = "none"
  auth-file = "/etc/collectd/auth_file"
  parse-multivalue-plugin = "split"

[[opentsdb]]
  enabled = false
  bind-address = ":4242"
  database = "opentsdb"
  retention-policy = ""
  consistency-level = "one"
  tls-enabled = false
  certificate = "/etc/ssl/influxdb.pem"
  batch-size = 1000
  batch-pending = 5
  batch-timeout = "1s"
  log-point-errors = true

[[udp]]
  enabled = false
  bind-address = ":8089"
  database = "udp"
  retention-policy = ""
  batch-size = 5000
  batch-pending = 10
  read-buffer = 0
  batch-timeout = "1s"
  precision = ""

[continuous_queries]
  log-enabled = true
  enabled = true
  query-stats-enabled = false
  run-interval = "1s"

[tls]
  min-version = ""
  max-version = ""
Seeing some of the same issues here - how do you specify the specific measurement table to utilize within the database you supply for prometheus to read?
@Kampe I gave up on this issue and ended up just using InfluxDB data source directly in Grafana instead of going through prometheus remote read.
Dear All,
Prometheus Config:
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  scrape_timeout: 15s      # scrape_timeout is set to the global default (10s).

remote_read:
  - url: http://xxx.xxx.xxx.xxx:8086/api/v1/prom/read?db=prometheus
    remote_timeout: 1m

(Please use proper indentation.)
Influxdb Configuration:
bind-address = ":8088"

[meta]
  dir = "/var/lib/influxdb/meta"
  retention-autocreate = true
  logging-enabled = true

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"
  cache-max-memory-size = "4g"
  max-series-per-database = 0

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = false
  https-enabled = false
  max-row-limit = 10000
  realm = "InfluxDB"

[retention]
  enabled = true
  check-interval = "30m"

[subscriber]
  enabled = true
  http-timeout = "30s"

[continuous_queries]
  log-enabled = true
  enabled = true
I've been using this config for the last couple of months and have had no trouble. Try this config and report back here.
Hello all, if nothing works, I can debug together with you over the weekend.
Bug Report

What did you do?
Two Prometheus servers write data to InfluxDB. Another Prometheus reads data from InfluxDB via InfluxDB's API, and Grafana generates charts.

What did you expect to see?
Prometheus can get data from InfluxDB.

What did you see instead? Under which circumstances?
It worked properly, and after some hours it could not get newly added data. If you restart the reading Prometheus, it is OK again.
Environment: CentOS 7

System information (uname -srm): Linux 3.10.0-693.2.2.el7.x86_64 x86_64

Prometheus version: 2.4.3

Alertmanager version: (not relevant to the issue)

Prometheus configuration file:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Override the global default and scrape targets from this job every 30 seconds.
  - job_name: 'pushgateway'
    scrape_interval: 30s
    static_configs:
      - targets: ['172.16.68.221:9091']
        labels:
          group: 'pushgateway'

  - job_name: 'ecs_group'
    scrape_interval: 30s
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - ./conf.d/*.json

  - job_name: 'speech_gw'
    metrics_path: '/debug/metrics'
    scrape_interval: 1m
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - ./conf.d/sc-speech-gw.yml

remote_write:
  - url: "http://influxdb_adapter:9201/write"

remote_read:
Alertmanager configuration file:
Logs: