Closed: jinxin-fu closed this issue 2 years ago
Maybe I should change "request-metrics-backend-destination" to "metrics.request-metrics-backend-destination" and use
kubectl patch -n knative-serving cm config-observability --type merge --patch '{"data":{"metrics.backend-destination":"opencensus","metrics.request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'
to configure config-observability, but I still get nothing from the opentelemetry-collector.
The queue-proxy prints logs as follows:
{"level":"info","ts":1629273188.5869856,"logger":"fallback","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter."}
{"level":"info","ts":1629273188.590455,"logger":"fallback","caller":"metrics/opencensus_exporter.go:56","msg":"Created OpenCensus exporter with config:","config":{}}
{"level":"info","ts":1629273188.5905085,"logger":"fallback","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/internal/serving revision opencensus 60000000000 <nil> <nil> otel-collector.metrics:55678 false 0 false { false}}"}
/assign @evankanderson
Hi, I am a software developer using Knative to build a serverless platform. I have installed Knative v0.23.0 successfully, and now I need to use Prometheus or Thanos to monitor the whole system as well as the user applications (i.e. the metrics from queue-proxy). I wonder whether the key in the patch for config-observability should be "request-metrics-backend-destination" or "metrics.request-metrics-backend-destination". I used metrics.request-metrics-backend-destination to patch the ConfigMap, and it works when I look up the env in the queue-proxy pods. Another problem: I followed the guide on the official website, which makes it look easy to use the opentelemetry-collector to collect metrics from every component in Knative Serving. Following the steps provided online, I deployed the otel collector successfully but got nothing from :8889/metrics. By the way, the cluster version is Kubernetes 1.21.0.
Looking forward to your guidance.
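A sketch of how the effective setting can be double-checked on a queue-proxy container (the Knative Service name in the label selector is an illustrative assumption; this needs a running cluster):

```shell
# Sketch: dump the metrics-related env of a queue-proxy container.
# "helloworld-go" is an illustrative Knative Service name, not from this thread.
POD=$(kubectl get pods -l serving.knative.dev/service=helloworld-go \
  -o jsonpath='{.items[0].metadata.name}')
kubectl get pod "$POD" \
  -o jsonpath='{.spec.containers[?(@.name=="queue-proxy")].env}' \
  | tr ',' '\n' | grep -i metric
```

If the patched key took effect, the backend and collector address should show up in this output.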
@jinxin-fu I tried the latest config on minikube and can verify that I see the metrics (using Knative v0.25, following the instructions at https://knative.dev/docs/admin/install/serving/install-serving-with-yaml with Kourier):
$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.25.0/serving-crds.yaml
$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.25.0/serving-core.yaml
$ kubectl apply -f https://github.com/knative/net-kourier/releases/download/v0.25.0/kourier.yaml
$ kubectl patch configmap/config-network --namespace knative-serving --type merge --patch '{"data":{"ingress.class":"kourier.ingress.networking.knative.dev"}}'
$ kubectl patch --namespace knative-serving configmap/config-observability --type merge --patch '{"data":{"metrics.backend-destination":"opencensus","request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'
$ cat service.yaml
apiVersion: serving.knative.dev/v1 # Current version of Knative
kind: Service
metadata:
  name: helloworld-go # The name of the app
  namespace: default # The namespace the app will use
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go # Reference to the image of the app
          env:
            - name: TARGET # The environment variable printed out by the sample app
              value: "Go Sample v1"
$ cat coll.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: metrics
data:
  collector.yaml: |
    receivers:
      opencensus:
        endpoint: "0.0.0.0:55678"
    exporters:
      logging:
      prometheus:
        endpoint: "0.0.0.0:8889"
    extensions:
      health_check:
      pprof:
      zpages:
    service:
      extensions: [health_check, pprof, zpages]
      pipelines:
        metrics:
          receivers: [opencensus]
          processors: []
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: metrics
  labels:
    app: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  replicas: 1 # This can be increased for a larger system.
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          args:
            - --config=/conf/collector.yaml
          image: otel/opentelemetry-collector:latest
          resources:
            requests: # Note: these are suitable for a small instance, but may need to be increased for a large instance.
              memory: 100Mi
              cpu: 50m
          ports:
            - name: otel
              containerPort: 55678
            - name: prom-export
              containerPort: 8889
            - name: zpages # A /debug page
              containerPort: 55679
          volumeMounts:
            - mountPath: /conf
              name: config
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
            items:
              - key: collector.yaml
                path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: metrics
spec:
  selector:
    app: "otel-collector"
  ports:
    - port: 55678
      name: otel
---
apiVersion: v1
kind: Service
metadata:
  name: otel-export
  namespace: metrics
  labels:
    app: otel-export
spec:
  selector:
    app: otel-collector
  ports:
    - port: 8889
      name: prom-export
Triggered the service with:
$ minikube service list
...
| kourier-system | kourier | http2/80 | http://192.168.39.218:19711 |
$ curl -H "Host: helloworld-go.default.example.com" http://192.168.39.218:19711
$ kubectl port-forward --namespace metrics deployment/otel-collector 8889
Forwarding from 127.0.0.1:8889 -> 8889
Forwarding from [::1]:8889 -> 8889
http://localhost:8889/metrics then shows the collected metrics.
Note there are known issues: https://github.com/skonto/test-otel#known-issues
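With the port-forward in place, the exporter can also be checked from the command line (a sketch; filtering on "revision" assumes the usual Knative request-metric names and needs a running cluster):

```shell
# Sketch: scrape the collector's Prometheus exporter directly through
# the port-forward. The "revision" filter is an assumption about the
# metric names Knative Serving exports.
curl -s http://localhost:8889/metrics | grep -i revision | head -n 5
```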
First, I upgraded Knative from v0.23 to the latest version, v0.25. I deleted Istio, which I had used in the cluster before, and installed the Kourier gateway as used in your example. The YAMLs for the otel-collector are the same as yours except for a different namespace, and I have configured the config-network ConfigMap. The only difference is that I use Kubernetes v1.21.0 installed with kubeadm, not minikube. I changed the kourier Service from LoadBalancer to NodePort and used the following commands to trigger the service:
[root@serverless-master opentelemetry]# kubectl get svc -n kourier-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kourier NodePort 10.10.176.171 <none> 80:31420/TCP,443:31022/TCP 40m
kourier-internal ClusterIP 10.10.47.211 <none> 80/TCP 40m
[root@serverless-master opentelemetry]# kubectl get pod -n kourier-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
3scale-kourier-gateway-6d8f6b8549-bdnd8 1/1 Running 0 19m 20.10.1.253 serverless-node1 <none> <none>
[root@serverless-master opentelemetry]# kubectl get ksvc
NAME URL LATESTCREATED LATESTREADY READY REASON
jinxin http://jinxin.default.example.com jinxin-00001 jinxin-00001 True
[root@serverless-master opentelemetry]# curl -H "Host: jinxin.default.example.com" http://192.168.2.62:31420
Hello Go Sample v2beta!
ConfigMaps:
# config-observability
metrics.backend-destination: opencensus
metrics.opencensus-address: otel-collector.metrics:55678
request-metrics-backend-destination: opencensus
# config-network
ingress.class: kourier.ingress.networking.knative.dev
# otel-yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: opentelemetry
data:
  collector.yaml: |
    receivers:
      opencensus:
        endpoint: "0.0.0.0:55678"
    exporters:
      logging:
      prometheus:
        endpoint: "0.0.0.0:8889"
    extensions:
      health_check:
      pprof:
      zpages:
    service:
      extensions: [health_check, pprof, zpages]
      pipelines:
        metrics:
          receivers: [opencensus]
          processors: []
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: opentelemetry
  labels:
    app: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  replicas: 1 # This can be increased for a larger system.
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          args:
            - --config=/conf/collector.yaml
          image: otel/opentelemetry-collector:latest
          resources:
            requests: # Note: these are suitable for a small instance, but may need to be increased for a large instance.
              memory: 100Mi
              cpu: 50m
          ports:
            - name: otel
              containerPort: 55678
            - name: prom-export
              containerPort: 8889
            - name: zpages # A /debug page
              containerPort: 55679
          volumeMounts:
            - mountPath: /conf
              name: config
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
            items:
              - key: collector.yaml
                path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: opentelemetry
spec:
  selector:
    app: "otel-collector"
  ports:
    - port: 55678
      name: otel
---
apiVersion: v1
kind: Service
metadata:
  name: otel-export
  namespace: opentelemetry
  labels:
    app: otel-export
spec:
  selector:
    app: otel-collector
  ports:
    - port: 8889
      name: prom-export
# k8s version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Lastly, I still can't get any metrics data.
Could you paste the collector logs? Are there any restrictions on namespace communication in your cluster, e.g. network policies?
I used kubeadm to install Kubernetes 1.21.0 with the default config; I think there are no network policies restricting communication between namespaces.
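For what it's worth, these illustrative checks (cluster commands, so only a sketch here) would confirm that; the addresses come from the config-observability patch and the collector YAML pasted above:

```shell
# List any NetworkPolicies that could block cross-namespace traffic.
kubectl get networkpolicy --all-namespaces
# Probe the address configured in config-observability (namespace "metrics"):
kubectl run netcheck --rm -i --restart=Never --image=busybox -- \
  nc -zv otel-collector.metrics 55678
# Probe the collector's actual Service (namespace "opentelemetry" per the YAML):
kubectl run netcheck2 --rm -i --restart=Never --image=busybox -- \
  nc -zv otel-collector.opentelemetry 55678
```

If the first probe fails while the second succeeds, the configured opencensus-address points at a Service that does not exist in that namespace.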
otel-collector log:
2021-09-16T02:07:01.907Z info service/collector.go:303 Starting otelcol... {"Version": "v0.33.0", "NumCPU": 2}
2021-09-16T02:07:01.909Z info service/collector.go:242 Loading configuration...
2021-09-16T02:07:01.910Z info service/collector.go:258 Applying configuration...
2021-09-16T02:07:01.910Z info builder/exporters_builder.go:226 Ignoring exporter as it is not used by any pipeline {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.910Z info builder/exporters_builder.go:264 Exporter was built. {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.910Z info builder/pipelines_builder.go:214 Pipeline was built. {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.910Z info builder/receivers_builder.go:227 Receiver was built. {"kind": "receiver", "name": "opencensus", "datatype": "metrics"}
2021-09-16T02:07:01.910Z info service/service.go:143 Starting extensions...
2021-09-16T02:07:01.910Z info builder/extensions_builder.go:54 Extension is starting... {"kind": "extension", "name": "health_check"}
2021-09-16T02:07:01.910Z info healthcheckextension/healthcheckextension.go:41 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"}}}
2021-09-16T02:07:01.911Z info builder/extensions_builder.go:60 Extension started. {"kind": "extension", "name": "health_check"}
2021-09-16T02:07:01.911Z info builder/extensions_builder.go:54 Extension is starting... {"kind": "extension", "name": "pprof"}
2021-09-16T02:07:01.913Z info pprofextension/pprofextension.go:79 Starting net/http/pprof server {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2021-09-16T02:07:01.913Z info builder/extensions_builder.go:60 Extension started. {"kind": "extension", "name": "pprof"}
2021-09-16T02:07:01.913Z info builder/extensions_builder.go:54 Extension is starting... {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z info zpagesextension/zpagesextension.go:40 Register Host's zPages {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z info zpagesextension/zpagesextension.go:53 Starting zPages extension {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2021-09-16T02:07:01.913Z info builder/extensions_builder.go:60 Extension started. {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z info service/service.go:188 Starting exporters...
2021-09-16T02:07:01.913Z info builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.913Z info builder/exporters_builder.go:98 Exporter started. {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.913Z info builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.913Z info builder/exporters_builder.go:98 Exporter started. {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.913Z info service/service.go:193 Starting processors...
2021-09-16T02:07:01.913Z info builder/pipelines_builder.go:52 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.913Z info builder/pipelines_builder.go:63 Pipeline is started. {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.913Z info service/service.go:198 Starting receivers...
2021-09-16T02:07:01.913Z info builder/receivers_builder.go:71 Receiver is starting... {"kind": "receiver", "name": "opencensus"}
2021-09-16T02:07:01.913Z info builder/receivers_builder.go:76 Receiver started. {"kind": "receiver", "name": "opencensus"}
2021-09-16T02:07:01.913Z info healthcheck/handler.go:129 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2021-09-16T02:07:01.913Z info service/collector.go:206 Setting up own telemetry...
2021-09-16T02:07:01.914Z info service/telemetry.go:99 Serving Prometheus metrics {"address": ":8888", "level": 0, "service.instance.id": "71704046-4ed9-4f8e-8a8d-8b1cfd9d6020"}
2021-09-16T02:07:01.914Z info service/collector.go:218 Everything is ready. Begin running and processing data.
@jinxin-fu is this still an issue?
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
I followed the doc "Collecting Metrics with OpenTelemetry" and deployed the otel Deployment with the relevant Service. I also configured config-observability.
I use an external Service in NodePort mode to expose the collector, so I can reach it from outside the cluster. But when I visit otel-export:8889/metrics in a browser, I get an empty page. I wonder if something is being overlooked.
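One thing worth trying (a sketch based on the collector config pasted earlier in this thread): the collector log above shows the logging exporter being ignored because no pipeline uses it. Wiring it into the metrics pipeline makes the collector print whatever it receives to its own stdout, which tells you whether anything is arriving at all:

```yaml
# Sketch: in collector.yaml, add the logging exporter to the metrics pipeline
# so received metrics are echoed to the collector's stdout.
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [opencensus]
      processors: []
      exporters: [prometheus, logging]
```

If `kubectl logs deployment/otel-collector -n opentelemetry` stays silent after curling the Knative Service, the problem is upstream of the collector (queue-proxy export or networking), not in the Prometheus exporter.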