ItielOlenick opened this issue 1 month ago
Pinging code owners:
receiver/prometheus: @Aneurysm9 @dashpole
I honestly can't see how this can be an issue with the prometheus receiver. The use of confighttp in #31452 is incredibly basic, and there's even a test verifying that the TLS config does in fact get read. So it's either a bug in confighttp, or the test methodology is wrong somehow. @ItielOlenick are you using the standalone collector for your tests? Or the Operator?
@swiatekm-sumo I'm using both.
The TLS cert is in fact read from the config. When testing with a cert that doesn't exist, it complains about the missing file, so the setting is parsed but then somehow ignored. Could this be because the setting is meant for servers, while in this case the TA is the server and the receiver is the client?
I've tested with a standalone TA, an Operator-managed one, and a mock server serving the same endpoints. In all cases I used the same certs (both server and client), and in all cases I am able to get a response using curl or openssl, both from within the cluster and via port forwarding.
Tried everything I can think of.
It looks like it's loaded and added to the transport here: https://github.com/open-telemetry/opentelemetry-collector/blob/15faf6206bedc9d5a188f131fc5891904e29a34b/config/confighttp/confighttp.go#L122. Not sure what the problem could be. I'll try to do a minimal reproduction tomorrow using the otlp exporter.
It seems like the first call to /scrape_configs is OK:
info prometheusreceiver@v0.101.0/metrics_receiver.go:253 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "serviceMonitor/default/nginx-servicemonitor/0"}
but then the call to the /jobs endpoint is the issue:
error refresh/refresh.go:90 Unable to refresh target groups {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "http", "config": "serviceMonitor/default/nginx-servicemonitor/0", "err": "Get \"https://tls-debug-service:443/jobs/serviceMonitor%2Fdefault%2Fnginx-servicemonitor%2F0/targets?collector_id=collector-with-ta-collector-0\": remote error: tls: certificate required"}
github.com/prometheus/prometheus/discovery/refresh.(*Discovery).Run
github.com/prometheus/prometheus@v0.51.2-0.20240405174432-b4a973753c6e/discovery/refresh/refresh.go:90
That makes sense, the client config only applies to calls to /scrape_configs. The calls to /jobs are done by Prometheus' HTTP Service Discovery, and there we don't configure TLS.
In that case we have two simple solutions: we can either get targets over plain http even when TLS is enabled, or we can configure HTTP SD config to use TLS as well. The latter seems a bit hacky in that we probably should make an effort to configure it the same way as the confighttp client is configured, which may not be so obvious.
The first one sounds easier, but the second sounds more complete. Implementing the first suggestion would mean also serving the regular non-secure endpoint, I assume. At any rate, now that we know the source of the issue I'll move forward with https://github.com/open-telemetry/opentelemetry-operator/issues/1669
@swiatekm-sumo looking over this again, couldn't we add the TA client tls config we provided to the httpSD?
httpSD.HTTPClientConfig.TLSConfig has a TLS config similar to what was implemented in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31449
Tested with
httpSD.HTTPClientConfig.TLSConfig.CAFile = allocConf.TLSSetting.CAFile
httpSD.HTTPClientConfig.TLSConfig.CertFile = allocConf.TLSSetting.CertFile
httpSD.HTTPClientConfig.TLSConfig.KeyFile = allocConf.TLSSetting.KeyFile
Works.
Yeah, that does work. My point was more that if we start applying settings from confighttp to httpSD, then we should try to do it as comprehensively as possible, or at least document that only the TLS parts carry over.
I'd like to work on this so we can move forward with https://github.com/open-telemetry/opentelemetry-operator/pull/3015
@dashpole @Aneurysm9 are you ok with this approach?
Can you just configure http_sd_config's TLS settings from the operator?
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
@dashpole As in httpSD.HTTPClientConfig.TLSConfig = allocConf.TLSSetting?
If not, I'm not sure how you intend for the collector to read and apply this setting.
Sorry, nevermind. I agree with https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33370#issuecomment-2149299108 that we should ideally translate as much of httpsd as possible.
Ok, makes sense. I'll draft something up.
Just to clarify, we are talking about using allocConf's HTTP-related config in httpSD.HTTPClientConfig?
yes.
It seems like the only relevant fields to translate from allocConf's config into httpSD are in fact TLSSetting and maybe ProxyURL. Am I missing something?
Component(s)
receiver/prometheus
What happened?
Description
After https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31449 was merged, the client is expected to use the TLS settings to communicate with the TA.
When TLS config is added to the config.yaml file, the client does not send the TLS certs in its requests to the Target Allocator.
In practice, the TA server (or a mock of it) complains that the client does not provide certificates in its requests.
Steps to Reproduce
Set up a TA (or mock) with mTLS, forcing the client to provide certificates.
Set up the collector with the tls key in the config:
collector.yaml
Expected Result
TLS certificates added to the request
Actual Result
Server complains that the client didn't provide a certificate
From the TA (or mock):
From the mock using openssl:
Collector version
0.101.0
Environment information
Environment
Kubernetes
OpenTelemetry Collector configuration
Log output
Additional context
No response