open-telemetry / opentelemetry-helm-charts

OpenTelemetry Helm Charts
https://opentelemetry.io
Apache License 2.0

desc = "error reading server preface: http2: frame too large" #646

Open haseeb-aziz opened 1 year ago

haseeb-aziz commented 1 year ago

Hello, I'm getting this error when I use this configuration in an EKS Kubernetes cluster:

Err: connection error: desc = "error reading server preface: http2: frame too large" {"grpc_log": true}

The same configuration successfully exports Prometheus metrics to Uptrace when the collector runs outside the EKS cluster.

Configuration:

    prometheus_simple:
      collection_interval: 10s
      endpoint: '10.XXX.XX.XXX:9090'
      metrics_path: '/metrics'
      use_service_account: false
      tls_enabled: false

    exporters:
      otlp:
        endpoint: 10.XXX.X.XX:14317
        headers: { 'uptrace-dsn': 'http://project2_secret_token@10.XXX.XX.XX:14317/2' }
        tls:
          insecure: true

TylerHelmuth commented 1 year ago

@haseeb-aziz That error message normally hides a deeper issue. I've seen it when I wasn't able to communicate properly with the gRPC endpoint I was trying to send data to.

Can you update your issue with your values.yaml? Which helm chart version are you using?

TylerHelmuth commented 1 year ago

@haseeb-aziz Please format your post as yaml

haseeb-aziz commented 1 year ago

These are the logs of the OpenTelemetry pod; I'm getting this error:

info exporterhelper/queued_retry.go:433 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"", "interval": "38.375612262s"}

Please advise

haseeb-aziz commented 1 year ago

@TylerHelmuth Thanks. Helm chart version: 0.48.1

values.yaml file of the OpenTelemetry collector chart:

nameOverride: ""
fullnameOverride: ""

mode: "deployment"

presets:
  logsCollection:
    enabled: false
    includeCollectorLogs: false
    storeCheckpoints: false

  hostMetrics:
    enabled: false

  kubernetesAttributes:
    enabled: false

  clusterMetrics:
    enabled: false

  kubeletMetrics:
    enabled: false

configMap:
  create: true
config:
  exporters:
    logging: {}
  extensions:
    health_check: {}
    memory_ballast: {}
  processors:
    batch: {}
    memory_limiter: null
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${MY_POD_IP}:4317
        http:
          endpoint: ${MY_POD_IP}:4318
    prometheus_simple:
      collection_interval: 10s
      endpoint: 'XXX.XXX.XXX.147:9090'
      metrics_path: '/metrics'
      use_service_account: false
      tls_enabled: false   
    zipkin:
      endpoint: ${MY_POD_IP}:9411
  exporters:
    otlp:
      endpoint: XX.XXX.1.XXX:14317
      headers: { 'uptrace-dsn': 'http://project2_secret_token@10.XXX.XX.XXX:14317/2' }
      tls:
        insecure: true 

  service:
    telemetry:
      metrics:
        address: ${MY_POD_IP}:9090
    extensions:
      - health_check
      - memory_ballast
    pipelines:
      metrics:
        exporters:
          - otlp
        receivers:
          - prometheus_simple

image:
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  tag: ""
  digest: ""
imagePullSecrets: []
command:
  name: otelcol-contrib
  extraArgs: []

serviceAccount:
  create: true
  annotations: {}
  name: ""

clusterRole:
  create: false
  annotations: {}
  name: ""
  rules: []
  clusterRoleBinding:
    annotations: {}
    name: ""

podSecurityContext: {}
securityContext: {}

nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: {}

priorityClassName: ""

extraEnvs: []
extraVolumes: []
extraVolumeMounts: []

ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    hostPort: 4317
    protocol: TCP
    # nodePort: 30317
    appProtocol: grpc
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    hostPort: 4318
    protocol: TCP
  jaeger-compact:
    enabled: true
    containerPort: 6831
    servicePort: 6831
    hostPort: 6831
    protocol: UDP
  jaeger-thrift:
    enabled: true
    containerPort: 14268
    servicePort: 14268
    hostPort: 14268
    protocol: TCP
  jaeger-grpc:
    enabled: true
    containerPort: 14250
    servicePort: 14250
    hostPort: 14250
    protocol: TCP
  zipkin:
    enabled: true
    containerPort: 9411
    servicePort: 9411
    hostPort: 9411
    protocol: TCP
  metrics:
    enabled: false
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

resources:
  limits:
    cpu: 256m
    memory: 512Mi

podAnnotations: {}

podLabels: {}

hostNetwork: false

dnsPolicy: ""

replicaCount: 1

revisionHistoryLimit: 10

annotations: {}

extraContainers: []

initContainers: []

lifecycleHooks: {}

service:
  type: ClusterIP
  annotations: {}

ingress:
  enabled: false
  additionalIngresses: []

podMonitor:
  enabled: false
  metricsEndpoints:
    - port: metrics
  extraLabels: {}

serviceMonitor:
  enabled: false
  metricsEndpoints:
    - port: metrics
  extraLabels: {}

podDisruptionBudget:
  enabled: false

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

rollout:
  rollingUpdate: {}
  strategy: RollingUpdate

prometheusRule:
  enabled: false
  groups: []
  defaultRules:
    enabled: false
  extraLabels: {}

statefulset:
  volumeClaimTemplates: []
  podManagementPolicy: "Parallel"

haseeb-aziz commented 1 year ago

@TylerHelmuth
I also checked the Uptrace pod using tcpdump, but Uptrace is not receiving any traffic from OpenTelemetry on that port. The same configuration works when Uptrace and the OTel collector are deployed on the same host using Docker Compose.

povilasv commented 1 year ago

This looks like an exporter misconfiguration: the server either isn't speaking gRPC or is behaving oddly. You can debug this by adding the environment variable GODEBUG=http2debug=2, which should dump the HTTP/2 frames for the gRPC requests it is trying to send.

Reference: https://stackoverflow.com/a/44482155
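
With this chart, one way to set that variable is via the chart's extraEnvs list (a minimal sketch, assuming the standard name/value env entries that the values.yaml above already exposes; the variable only affects debug logging):

    extraEnvs:
      - name: GODEBUG
        value: "http2debug=2"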

qdongxu commented 1 year ago

This may be caused by the gRPC client sending plaintext HTTP/2 to an HTTPS server.

I hit a similar issue when sending gRPC requests to an nginx gRPC proxy.

I got the 'http2: frame too large' error when dialing as below (I had intended to ignore SSL verification):

    conn, err := grpc.Dial(*addr, grpc.WithTransportCredentials(insecure.NewCredentials()))

As observed with tcpdump, the client sends HTTP/2 frames without TLS encryption, and the server side sends back a 404 BadRequest with error messages in the HTTP body, unencrypted, over HTTP/1.1. The client then reports "error reading server preface: http2: frame too large".

But it succeeded this way (loading the server certificate):

    // Build a cert pool containing the proxy's CA and use it as RootCAs for the dial.
    proxyCA := "/var/tmp/fullchain.pem" // CA cert that signed the proxy
    f, err := os.ReadFile(proxyCA)      // error handling omitted for brevity

    p := x509.NewCertPool()
    p.AppendCertsFromPEM(f)
    tlsConfig := &tls.Config{
        RootCAs: p,
    }
    conn, err := grpc.Dial(*addr, grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)))

It may not be the same cause; just for reference, since I was looking for the root cause and came across this issue.
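
For the collector's otlp exporter, the rough equivalent of loading the proxy's CA would be a tls.ca_file setting instead of insecure: true (a sketch only; the endpoint and certificate path are placeholders):

    exporters:
      otlp:
        endpoint: my-grpc-proxy.example.com:443
        tls:
          insecure: false
          ca_file: /var/tmp/fullchain.pem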

basch255 commented 1 year ago

Are there any updates on this topic? I ran into a similar error while trying to connect two collectors via gRPC over nginx (secured by OIDC).

client-collector:

    exporters:
      otlp:
        endpoint: server-collector.example.de:443
        auth:
          authenticator: oauth2client
        tls:
          insecure_skip_verify: true

server-collector:

    receivers:
      otlp:
        protocols:
          grpc:
            auth:
              authenticator: oidc

ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
  name: server-collector
spec:
  ingressClassName: nginx
  rules:
    - host: server-collector.example.de
      http:
        paths:
          - backend:
              service:
                name: server-collector
                port:
                  number: 4317
            path: /
            pathType: ImplementationSpecific

haseeb-aziz commented 1 year ago

Use HTTP rather than gRPC. My problem was resolved using HTTP.
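
In collector terms this means replacing the otlp (gRPC) exporter with otlphttp and referencing it in the pipeline, along the lines of the sketch below (the Uptrace OTLP/HTTP port shown is a placeholder and depends on your Uptrace setup):

    exporters:
      otlphttp:
        endpoint: http://10.XXX.X.XX:14318
        headers: { 'uptrace-dsn': 'http://project2_secret_token@10.XXX.XX.XX:14317/2' }
    service:
      pipelines:
        metrics:
          receivers:
            - prometheus_simple
          exporters:
            - otlphttp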

veyselsahin commented 1 year ago

@haseeb-aziz were you able to send metrics over HTTP instead of gRPC?

haseeb-aziz commented 12 months ago

Yes, using HTTP it's working fine.


xiaoqinglee commented 12 months ago

"http2: frame too large" will happen when you try to initialize an http2 connection (using grpc, for example) to a target port which is expecting http1.1 connections.

shiw2021 commented 11 months ago

Same error in the v2rayN client. After editing the config, I deleted the fingerprint input (which used to be "chrome") and the error was gone.

raniellyferreira commented 8 months ago

You are trying to connect to an HTTP/2 (TLS) endpoint over plain HTTP; try again with a TLS connection, with insecure_skip_verify = true.
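
In the collector's exporter config that suggestion maps to enabling TLS while skipping certificate verification, roughly like this (placeholder endpoint; skipping verification is only advisable for testing):

    exporters:
      otlp:
        endpoint: my-collector.example.com:4317
        tls:
          insecure: false
          insecure_skip_verify: true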

pedrogneri commented 7 months ago

I have the same issue. My gRPC server is behind an nginx proxy that has TLS, so I just had to add TLS credentials to my Dial:

    grpcClientConn, err := grpc.Dial(os.Getenv("GRPC_SERVER_ADDR"), grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))

instead of

    conn, err := grpc.Dial(*addr, grpc.WithTransportCredentials(insecure.NewCredentials()))

izavadynskyi commented 6 months ago

Hi guys, I have also faced the same issue. Has anybody configured an external tempo-distributor endpoint to send gRPC traces (port 4317) via ingress-nginx? We have a dedicated shared Tempo cluster in a separate EKS environment and use the OTLP HTTP port via the Tempo gateway to send traces from OTel agents in applications hosted on other EKS clusters, and that solution works for us. But some services need to use gRPC only, so I tried to create a direct ingress endpoint to the tempo-distributor port 4317 and ran into the following exceptions:

2023-12-22T10:25:07.675Z    warn    zapgrpc/zapgrpc.go:195  [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "tempo-distributor.dev.observability.internal:80",
  "ServerName": "tempo-distributor.dev.observability.internal:80",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "error reading server preface: http2: frame too large" {"grpc_log": true}

Ingress configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-internal
    meta.helm.sh/release-name: tempo
    meta.helm.sh/release-namespace: tempo
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/grpc-backend: "true"
  labels:
    app.kubernetes.io/component: distributor
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: tempo
    app.kubernetes.io/version: 2.0.1
    helm.sh/chart: tempo-distributed-1.2.7
  name: tempo-distributor
  namespace: tempo
spec:
  rules:
  - host: tempo-distributor.dev.observability.internal
    http:
      paths:
      - backend:
          service:
            name: tempo-distributor
            port:
              number: 4317
        path: /
        pathType: Prefix

Opentelemetry collector config below:

kind: OpenTelemetryCollector
metadata:
  name: cloud
spec:
  config: |
    receivers:
      otlp:
        protocols:
          http:
          grpc:
            endpoint: 0.0.0.0:5555
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
    exporters:
      logging:
        loglevel: info
      otlp:
        endpoint: tempo-distributor.dev.observability.internal:80
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp] 

I verified this configuration against the local tempo-distributor service (sending traces directly from the OpenTelemetry collector to the tempo-distributor service on port 4317, without the ingress) and everything works properly.

I would appreciate any help if somebody has used such an approach.

izavadynskyi commented 6 months ago

Resolved it on the OpenTelemetry collector side:

  config: |
    receivers:
      otlp:
        protocols:
          http:
          grpc:
            endpoint: 0.0.0.0:5555
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
    exporters:
      logging:
        loglevel: info
      otlphttp:
        endpoint: http://tempo-gateway.dev.observability.internal:80/otlp
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp]

i.e., by receiving with receivers: [otlp] and exporting with exporters: [otlphttp] to the ingress tempo-gateway.

OndrejValenta commented 4 months ago

OK, so in my case it was a certificate issue, or rather Loki going through the proxy even though it shouldn't; it does not respect NO_PROXY.

Once I put the CA certificate into the trusted cert store and stopped skipping certificate verification, all is good:

    http_config:
      insecure_skip_verify: false

fkamaliada commented 2 months ago

Same here (AWS EKS environment). Problem solved after changing the otel collector config map from:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:55680
exporters:
  otlp/data-prepper:
    endpoint: data-prepper.opentelemetry:21890
    tls:
      insecure: true
  otlp/data-prepper-metrics:
    endpoint: data-prepper-metrics.opentelemetry:4900
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
    metrics:
      receivers: [otlp]
      exporters: [otlp/data-prepper-metrics]

to:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:55680
exporters:
  otlphttp/data-prepper:
    endpoint: data-prepper.opentelemetry:21890
    tls:
      insecure: true
  otlphttp/data-prepper-metrics:
    endpoint: data-prepper-metrics.opentelemetry:4900
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/data-prepper]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/data-prepper-metrics]