knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

In Knative 1.24.0, how can I use the OpenTelemetry collector for Knative components such as the autoscaler/activator #11822

Closed jinxin-fu closed 2 years ago

jinxin-fu commented 3 years ago

Ask your question here:

I followed the doc "Collecting Metrics with OpenTelemetry" and deployed the otel Deployment with the relevant Service. I also configured config-observability:

kubectl patch --namespace knative-serving configmap/config-observability \
  --type merge \
  --patch '{"data":{"metrics.backend-destination":"opencensus","request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'

I use an external Service of type NodePort to expose the collector so that I can reach it from outside the cluster. But when I visit otel-export:8889/metrics in a browser, I get an empty page. I wonder if something is being overlooked.
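One quick sanity check (a sketch; names are taken from the patch above) is to confirm the keys actually landed in the ConfigMap, since a JSON merge patch silently accepts unknown keys:

```shell
# Confirm the patched keys are present in the ConfigMap data.
# Dots inside key names must be escaped in jsonpath expressions.
kubectl get configmap config-observability -n knative-serving \
  -o jsonpath='{.data.metrics\.backend-destination}'
kubectl get configmap config-observability -n knative-serving \
  -o jsonpath='{.data.metrics\.opencensus-address}'
```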

jinxin-fu commented 3 years ago

Maybe I should change "request-metrics-backend-destination" to "metrics.request-metrics-backend-destination":

kubectl patch -n knative-serving cm config-observability --type merge --patch '{"data":{"metrics.backend-destination":"opencensus","metrics.request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'

to configure config-observability, but I still get nothing from the opentelemetry-collector.

The queue-proxy prints logs as follows:

{"level":"info","ts":1629273188.5869856,"logger":"fallback","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter."}
{"level":"info","ts":1629273188.590455,"logger":"fallback","caller":"metrics/opencensus_exporter.go:56","msg":"Created OpenCensus exporter with config:","config":{}}
{"level":"info","ts":1629273188.5905085,"logger":"fallback","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/internal/serving revision opencensus 60000000000 <nil> <nil> otel-collector.metrics:55678 false 0  false   {   false}}"}

jinxin-fu commented 3 years ago

/assign @evankanderson Hi, I am a software developer using Knative to build a serverless platform. I have installed Knative 1.23.0 successfully, and now I need to use Prometheus or Thanos to monitor the whole system as well as the user applications (metrics from queue-proxy). I wonder whether the key in the patch for config-observability should be "request-metrics-backend-destination" or "metrics.request-metrics-backend-destination". I used metrics.request-metrics-backend-destination to patch the ConfigMap, and it works when I look up the env in the queue-proxy pods. Another problem: following the guide on the official website, it looks easy to use the opentelemetry-collector to collect metrics from every component in Knative Serving, but although I followed the steps and deployed the otel collector successfully, I get nothing from :8889/metrics. By the way, the cluster version is Kubernetes 1.21.0.

Looking forward to your guidance.
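One way to see which key the queue-proxy actually picked up (a sketch; `<app-pod>` is a placeholder for one of your application pods) is to inspect the env of the injected queue-proxy container directly:

```shell
# Dump the env of the queue-proxy sidecar to see which metrics
# settings were injected from config-observability.
kubectl get pod <app-pod> \
  -o jsonpath='{.spec.containers[?(@.name=="queue-proxy")].env}'
```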

skonto commented 3 years ago

@jinxin-fu I tried the latest config on minikube and can verify that I see the metrics (using Knative v0.25, following the instructions at https://knative.dev/docs/admin/install/serving/install-serving-with-yaml with Kourier):

$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.25.0/serving-crds.yaml
$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.25.0/serving-core.yaml
$ kubectl apply -f https://github.com/knative/net-kourier/releases/download/v0.25.0/kourier.yaml

$ kubectl patch configmap/config-network   --namespace knative-serving   --type merge   --patch '{"data":{"ingress.class":"kourier.ingress.networking.knative.dev"}}'

$ kubectl patch --namespace knative-serving configmap/config-observability   --type merge   --patch '{"data":{"metrics.backend-destination":"opencensus","request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'
$ cat service.yaml 
apiVersion: serving.knative.dev/v1 # Current version of Knative
kind: Service
metadata:
  name: helloworld-go # The name of the app
  namespace: default # The namespace the app will use
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go # Reference to the image of the app
          env:
            - name: TARGET # The environment variable printed out by the sample app
              value: "Go Sample v1"

$ cat coll.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: metrics
data:
  collector.yaml: |
    receivers:
      opencensus:
        endpoint: "0.0.0.0:55678"

    exporters:
      logging:
      prometheus:
        endpoint: "0.0.0.0:8889"

    extensions:
      health_check:
      pprof:
      zpages:

    service:
      extensions: [health_check, pprof, zpages]
      pipelines:
        metrics:
          receivers: [opencensus]
          processors: []
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: metrics
  labels:
    app: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  replicas: 1  # This can be increased for a larger system.
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: collector
        args:
        - --config=/conf/collector.yaml
        image: otel/opentelemetry-collector:latest
        resources:
          requests:  # Note: these are suitable for a small instance, but may need to be increased for a large instance.
            memory: 100Mi
            cpu: 50m
        ports:
        - name: otel
          containerPort: 55678
        - name: prom-export
          containerPort: 8889
        - name: zpages  # A /debug page
          containerPort: 55679
        volumeMounts:
          - mountPath: /conf
            name: config
      volumes:
      - name: config
        configMap:
          name: otel-collector-config
          items:
            - key: collector.yaml
              path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: metrics
spec:
  selector:
    app: "otel-collector"
  ports:
  - port: 55678
    name: otel
---
apiVersion: v1
kind: Service
metadata:
  name: otel-export
  namespace: metrics
  labels:
    app: otel-export
spec:
  selector:
    app: otel-collector
  ports:
  - port: 8889
    name: prom-export

Triggered the service with:

$ minikube service list
...
| kourier-system  | kourier                     | http2/80     | http://192.168.39.218:19711 |

curl -H "Host: helloworld-go.default.example.com" http://192.168.39.218:19711

$ kubectl port-forward --namespace metrics deployment/otel-collector 8889
Forwarding from 127.0.0.1:8889 -> 8889
Forwarding from [::1]:8889 -> 8889

http://localhost:8889/metrics then shows, for example:

(screenshot: Prometheus-format metrics served by the collector)

Note there are known issues: https://github.com/skonto/test-otel#known-issues

jinxin-fu commented 3 years ago

First, I upgraded Knative from v0.23 to the latest version, v0.25. I deleted Istio, which I used in the cluster before, and installed the Kourier gateway as in your example. The YAMLs for the otel-collector are the same as yours except for a different namespace, and I have configured the config-network ConfigMap. The only difference is that I use Kubernetes v1.24.0, not minikube. I changed the Kourier Service from LoadBalancer to NodePort and used the following commands to trigger the service:

[root@serverless-master opentelemetry]# kubectl get svc -n kourier-system
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
kourier            NodePort    10.10.176.171   <none>        80:31420/TCP,443:31022/TCP   40m
kourier-internal   ClusterIP   10.10.47.211    <none>        80/TCP                       40m

[root@serverless-master opentelemetry]# kubectl get pod -n kourier-system -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP            NODE               NOMINATED NODE   READINESS GATES
3scale-kourier-gateway-6d8f6b8549-bdnd8   1/1     Running   0          19m   20.10.1.253   serverless-node1   <none>           <none>

[root@serverless-master opentelemetry]# kubectl get ksvc
NAME     URL                                 LATESTCREATED   LATESTREADY    READY   REASON
jinxin   http://jinxin.default.example.com   jinxin-00001    jinxin-00001   True

[root@serverless-master opentelemetry]# curl -H "Host: jinxin.default.example.com" http://192.168.2.62:31420
Hello Go Sample v2beta!
# configmap: config-observability
  metrics.backend-destination: opencensus
  metrics.opencensus-address: otel-collector.metrics:55678
  request-metrics-backend-destination: opencensus
# config-network 
ingress.class: kourier.ingress.networking.knative.dev
# otel-yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: opentelemetry
data:
  collector.yaml: |
    receivers:
      opencensus:
        endpoint: "0.0.0.0:55678"
    exporters:
      logging:
      prometheus:
        endpoint: "0.0.0.0:8889"
    extensions:
      health_check:
      pprof:
      zpages:
    service:
      extensions: [health_check, pprof, zpages]
      pipelines:
        metrics:
          receivers: [opencensus]
          processors: []
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: opentelemetry
  labels:
    app: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  replicas: 1  # This can be increased for a larger system.
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: collector
        args:
        - --config=/conf/collector.yaml
        image: otel/opentelemetry-collector:latest
        resources:
          requests:  # Note: these are suitable for a small instance, but may need to be increased for a large instance.
            memory: 100Mi
            cpu: 50m
        ports:
        - name: otel
          containerPort: 55678
        - name: prom-export
          containerPort: 8889
        - name: zpages  # A /debug page
          containerPort: 55679
        volumeMounts:
          - mountPath: /conf
            name: config
      volumes:
      - name: config
        configMap:
          name: otel-collector-config
          items:
            - key: collector.yaml
              path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: opentelemetry
spec:
  selector:
    app: "otel-collector"
  ports:
  - port: 55678
    name: otel
---
apiVersion: v1
kind: Service
metadata:
  name: otel-export
  namespace: opentelemetry
  labels:
    app: otel-export
spec:
  selector:
    app: otel-collector
  ports:
  - port: 8889
    name: prom-export
# k8s version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}

Last, I still can't get any metrics data.

(screenshot: empty metrics page)

skonto commented 3 years ago

Could you paste the collector logs? Are there any restrictions for namespace communication on your cluster eg network policies?
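Two checks worth trying here (a sketch; the service names and namespace are taken from the configs above — adjust them if your collector lives in a different namespace): verify the collector is resolvable and reachable across namespaces, and temporarily wire the already-declared `logging` exporter into the metrics pipeline so the collector prints whatever it receives:

```shell
# 1) From a throwaway pod: does DNS resolve, and is the Prometheus
#    export port reachable across namespaces?
kubectl run nettest --rm -it --image=busybox --restart=Never -- \
  sh -c 'nslookup otel-collector.metrics && wget -qO- -T 3 http://otel-export.metrics:8889/metrics'

# 2) In the collector config, change the pipeline to
#      exporters: [logging, prometheus]
#    so incoming metric batches are logged, then restart and watch:
kubectl -n metrics edit configmap otel-collector-config
kubectl -n metrics rollout restart deployment/otel-collector
kubectl -n metrics logs -f deployment/otel-collector
```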

jinxin-fu commented 3 years ago

I used kubeadm to install Kubernetes 1.21.0 with the default config; I think there are no network policies restricting communication between namespaces.

otel-collector log:

2021-09-16T02:07:01.907Z        info    service/collector.go:303        Starting otelcol...     {"Version": "v0.33.0", "NumCPU": 2}
2021-09-16T02:07:01.909Z        info    service/collector.go:242        Loading configuration...
2021-09-16T02:07:01.910Z        info    service/collector.go:258        Applying configuration...
2021-09-16T02:07:01.910Z        info    builder/exporters_builder.go:226        Ignoring exporter as it is not used by any pipeline     {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.910Z        info    builder/exporters_builder.go:264        Exporter was built.     {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.910Z        info    builder/pipelines_builder.go:214        Pipeline was built.     {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.910Z        info    builder/receivers_builder.go:227        Receiver was built.     {"kind": "receiver", "name": "opencensus", "datatype": "metrics"}
2021-09-16T02:07:01.910Z        info    service/service.go:143  Starting extensions...
2021-09-16T02:07:01.910Z        info    builder/extensions_builder.go:54        Extension is starting...        {"kind": "extension", "name": "health_check"}
2021-09-16T02:07:01.910Z        info    healthcheckextension/healthcheckextension.go:41 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"}}}
2021-09-16T02:07:01.911Z        info    builder/extensions_builder.go:60        Extension started.      {"kind": "extension", "name": "health_check"}
2021-09-16T02:07:01.911Z        info    builder/extensions_builder.go:54        Extension is starting...        {"kind": "extension", "name": "pprof"}
2021-09-16T02:07:01.913Z        info    pprofextension/pprofextension.go:79     Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2021-09-16T02:07:01.913Z        info    builder/extensions_builder.go:60        Extension started.      {"kind": "extension", "name": "pprof"}
2021-09-16T02:07:01.913Z        info    builder/extensions_builder.go:54        Extension is starting...        {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z        info    zpagesextension/zpagesextension.go:40   Register Host's zPages  {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z        info    zpagesextension/zpagesextension.go:53   Starting zPages extension       {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2021-09-16T02:07:01.913Z        info    builder/extensions_builder.go:60        Extension started.      {"kind": "extension", "name": "zpages"}
2021-09-16T02:07:01.913Z        info    service/service.go:188  Starting exporters...
2021-09-16T02:07:01.913Z        info    builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.913Z        info    builder/exporters_builder.go:98 Exporter started.       {"kind": "exporter", "name": "logging"}
2021-09-16T02:07:01.913Z        info    builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.913Z        info    builder/exporters_builder.go:98 Exporter started.       {"kind": "exporter", "name": "prometheus"}
2021-09-16T02:07:01.913Z        info    service/service.go:193  Starting processors...
2021-09-16T02:07:01.913Z        info    builder/pipelines_builder.go:52 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.913Z        info    builder/pipelines_builder.go:63 Pipeline is started.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-09-16T02:07:01.913Z        info    service/service.go:198  Starting receivers...
2021-09-16T02:07:01.913Z        info    builder/receivers_builder.go:71 Receiver is starting... {"kind": "receiver", "name": "opencensus"}
2021-09-16T02:07:01.913Z        info    builder/receivers_builder.go:76 Receiver started.       {"kind": "receiver", "name": "opencensus"}
2021-09-16T02:07:01.913Z        info    healthcheck/handler.go:129      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2021-09-16T02:07:01.913Z        info    service/collector.go:206        Setting up own telemetry...
2021-09-16T02:07:01.914Z        info    service/telemetry.go:99 Serving Prometheus metrics      {"address": ":8888", "level": 0, "service.instance.id": "71704046-4ed9-4f8e-8a8d-8b1cfd9d6020"}
2021-09-16T02:07:01.914Z        info    service/collector.go:218        Everything is ready. Begin running and processing data.

skonto commented 2 years ago

@jinxin-fu is this still an issue?

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.