Open winhvu opened 4 months ago
When I limit concurrency to 1 by setting query.max-concurrency=1, the TLS error messages stop.
There is also no issue at all if we send queries in parallel directly to the reverse proxy, bypassing the promxy pod.
I have tried some other setups to narrow down what could cause the problem:
1) Use a static server group rather than the dynamic one, to check whether the issue is caused by target discovery.
2) Use one target server rather than two in the static group, to check whether there is a race condition when Promxy deals with multiple targets.
3) Increase timeout and dial_timeout, to check whether the defaults are so short that Promxy terminates the connection before the TLS handshake is done.
But the TLS error messages still show up in the logs.
@jacksontj Do you have any feedback or comments on this issue? Do you think there is a race condition in Promxy?
First off, thanks for reaching out!
I did some initial digging, but your configuration seems incomplete (maybe just not included in the issue?). Specifically, it's missing the scheme configuration, which would make all the requests downstream from promxy use http instead of https.
So in my local testing I have promxy -> nginx (with TLS) -> demo.robustperception.io:9090
And I was able to get data working correctly and use a variation on your curl to test parallel usage:
seq 1 200 | xargs -n1 -P10 curl -k "https://localhost:8082/api/v1/query?query=up"
I have used promxy in front of HTTPs downstreams before without issue; so I don't expect you'll run into issues (other than the config; which is a bit odd because the prometheus scrape_config is a bit odd).
Hopefully that helps?
Thanks @jacksontj for the reply.
Yes, we do have scheme in the promxy configuration:
- job_name: 'prometheus-pods'
  # anti-affinity for merging values in timeseries between hosts in the server_group
  anti_affinity: 15s
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - testing-ns
  # configures the protocol scheme used for requests. Defaults to http
  scheme: https
  # options for promxy's HTTP client when talking to hosts in server_groups
  http_client:
    # dial_timeout controls how long promxy will wait for a connection to the downstream
    dial_timeout: 10s
    tls_config:
      ca_file: /run/secrets/trusted-root-cert/ca.crt
      cert_file: /run/secrets/prometheus-client-cert/tls.crt
      key_file: /run/secrets/prometheus-client-cert/tls.key
      insecure_skip_verify: false
  relabel_configs: []
The scheme http displayed in the log is misleading. I have enabled promxy logging at trace level to see which scheme and Prometheus endpoints promxy communicates with, and they are correct.
Regarding "I have used promxy in front of HTTPs downstreams before without issue": the issue does not always show up when traffic towards promxy is low. It happens more frequently when we add more traffic, for example by running the same curl command above from multiple terminals (I ran it in 3 terminals in parallel).
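To raise the load without juggling several terminals, the same traffic can be driven from a single script. The following is only a minimal sketch: the URL and certificate file names are assumptions copied from the curl command in this thread, and `make_context`, `query_once`, and `run_parallel` are hypothetical helper names, not part of promxy.

```python
import ssl
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint, taken from the curl command in this thread; adjust as needed.
URL = "https://promxy-endpoint:9091/api/v1/query?query=up"


def make_context(cafile="ca.crt", certfile="tls.crt", keyfile="tls.key"):
    """Build a TLS context that verifies the server and presents a client cert."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def query_once(url, ctx, timeout=10):
    """Send one query on a fresh connection; return the status or the error name."""
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return str(resp.status)
    except Exception as exc:  # record failures instead of aborting the whole run
        return type(exc).__name__


def run_parallel(fetch, total=200, workers=10):
    """Call fetch() `total` times across `workers` threads and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch) for _ in range(total)]
        return Counter(f.result() for f in futures)


# Example (needs the real endpoint and certificates):
#   ctx = make_context()
#   print(run_parallel(lambda: query_once(URL, ctx), total=600, workers=30))
```

Like the curl loop, each call to `query_once` opens a new connection, so raising `workers` increases the number of simultaneous requests arriving at promxy. Whether this reproduces the handshake errors still has to be checked in the reverse proxy logs.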
Hi @jacksontj
Have you had a chance to reproduce the issue using the approach I mentioned above?
We have a deployment as below:
and here is the http_client configuration we pass to Promxy:
When I perform multiple PromQL queries in parallel towards Promxy via curl command like this:
seq 1 200 | xargs -n1 -P10 curl --cert tls.crt --key tls.key --cacert ca.crt "https://promxy-endpoint:9091/api/v1/query?query=up"
I get lots of TLS error messages like
http: TLS handshake error from <promxy IP>:port: EOF
in our reverse proxy. It does not happen when the queries are sent in sequence. I have decoded the certificates on both sides, client and server, and they are all valid.
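To compare runs (sequential versus parallel, or different query.max-concurrency values), it can help to count these messages rather than eyeball them. A minimal sketch, assuming the reverse proxy writes plain-text log lines in the format quoted above; `count_handshake_errors` is a hypothetical helper and the log file name in the usage comment is a placeholder.

```python
import re
from collections import Counter

# Matches lines such as: "http: TLS handshake error from 10.0.0.5:41234: EOF"
# The host part is matched lazily so that IPv6 addresses like [::1] also work.
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<host>\S+?):(?P<port>\d+): (?P<reason>.*)$"
)


def count_handshake_errors(lines):
    """Tally TLS handshake errors by (remote host, reason)."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            host = match["host"].strip("[]")  # drop IPv6 brackets if present
            counts[(host, match["reason"].strip())] += 1
    return counts


# Example (the file name is a placeholder):
#   with open("reverse-proxy.log") as fh:
#       for (host, reason), n in count_handshake_errors(fh).most_common():
#           print(host, reason, n)
```

Grouping by remote host shows whether all errors come from the promxy pod IP, and grouping by reason separates plain EOF from other handshake failures.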
Checking the Promxy logs, there are no error messages; the logs show the queries returning a successful status code (200).
I would like to know if Promxy supports queries in parallel?
I have tested with query.max-concurrency=20 and with the default, query.max-concurrency=-1; neither helps.