jaegertracing / jaeger-operator

Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
https://www.jaegertracing.io/docs/latest/operator/
Apache License 2.0
1.03k stars 345 forks source link

Expose collector via ingress on a kubernetes cluster #1048

Open jecnua opened 4 years ago

jecnua commented 4 years ago

HI all,

This is a feature request that was already proposed in https://github.com/jaegertracing/jaeger-operator/issues/420 and closed for luck of additional requests/interest. However I would like to propose again to add the ability in the operator to create an ingress resource in kubernetes to allow traffic external from the cluster to be managed and ingested by the collectors.

Use case: I have a kubernetes cluster that I use to run management tools (like tracing aggregation for example) and I have external applications running in EC2/Lambda/somewhere-else that are interested on implementing tracing without re-doing the collector deployments.

I understand we can manually create an ingress, however this is sort of a pain when you are managing the jaeger operator lifecycle via Helm charts: https://github.com/jaegertracing/helm-charts/tree/master/charts/jaeger-operator

Not having this functionality will force me to have a second chart, that I manage, with a single resource inside (the ingress) that I need to "point" to the collector.

I don't think security should be addressed in this request, since securing the ingress endpoint you are using (like an E/ALB in k8s) should be up to the user and not the operator.

Please feel free to close this request immediately if you think this was addressed and there is no intention to go forward. Otherwise we could keep the request around just to aggregate interest via +1 and see where it goes.

jpkrohling commented 4 years ago

I can feel your pain, and I'm almost convinced we could have something in the operator for this. Let's just see if other people have the same needs you have, with the same problems in applying a workaround.

To be clear: the currently proposed workaround is to create the ingress/route externally, using whatever you are using to manage your Operator CR. With Helm, it can be an extra pain, but with other solutions, it's not that big of a problem.

ilhaan commented 4 years ago

Being able to expose collector using an ingress would be super helpful for me as well.

Seems like newer versions (>0.30) of nginx-ingress-controller have support for gRPC backends as per this.

OferPRTZ commented 4 years ago

Here as well! super important for us, its just big pain to use different charts just for that tiny ingress.

Thanks!

vfarafontov commented 4 years ago

Leaving a ping here as well; exact same use-case. K8s Cluster is running the Jaeger Operator, there are some services outside of the cluster that need to talk to the collector.

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

jecnua commented 4 years ago

Yes bot, it's still relevant like 95% of the issue on github marked by stale bots. Please automatically remove the label.

jpkrohling commented 4 years ago

My bad about the stale bot: I tuned it down to only consider items with the needs-info label, but I ended up adding this label to this issue here, which isn't appropriate. I just removed the label, so, this should not be automatically closed.

jpkrohling commented 4 years ago

To get this issue moving: I would have this as a opt-in behavior. My question is how it should be added: perhaps it would be good to have JaegerIngressSpec as a new child of the JaegerCollectorSpec? This would make it inconsistent with how it is specified for query at the moment, but I think we can solve it somehow.

https://github.com/jaegertracing/jaeger-operator/blob/519bf244a9e44f886acaa5a2ce7d2cd7afe59ddc/pkg/apis/jaegertracing/v1/jaeger_types.go#L228-L255

jecnua commented 4 years ago

Hi @jpkrohling, sorry for the delay of this answer the message got lost in all the github noise. Thanks for removing the stale label and sorry for my reply above. I have a personal hate towards any type of "stale bots" but is nothing I have against this repo or project specificallly :)

I would have this as a opt-in behavior

Absolutely.

My question is how it should be added: perhaps it would be good to have JaegerIngressSpec as a new child of the JaegerCollectorSpec? This would make it inconsistent with how it is specified for query at the moment, but I think we can solve it somehow.

I wish I could help more but my go is really very young. What I can tell you is that I agree it is the right place to put it since it's part of that specific component.

js8080 commented 4 years ago

I am also interested in this behavior for the same reasons mentioned above. I am running the Jaeger Collector inside a Kubernetes cluster but need to allow Agents from outside the cluster to communicate with it as not all applications we're tracing are running in Kubernetes.

jpkrohling commented 3 years ago

Anyone interested in sending a PR with a proposal for this one?

jlynch93 commented 3 years ago

Was there any progress on this? I am in the same boat as @js8080

js8080 commented 3 years ago

Was there any progress on this? I am in the same boat as @js8080

I ended up defining my own Ingress (I use NGINX as an Ingress Controller). Hopefully this helps you. I use gRPC secured via mTLS to communicate between my Jaeger Agent and the Collector, hence the gRPC references below. Also, I used a self-signed cert for the client cert so the jaeger-agent-certs reference below just points to a secret containing that self-signed cert (public key only). The jaeger-collector-tls-secret is another self-signed cert I used for the server certificate but that also contains a private key.

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: simple-prod-collector
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # Enable client certificate authentication
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    # Create the secret containing the trusted ca certificates
    nginx.ingress.kubernetes.io/auth-tls-secret: "monitoring/jaeger-agent-certs"
    # Specify the verification depth in the client certificates chain
    # nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    # Specify an error page to be redirected to verification errors
    # nginx.ingress.kubernetes.io/auth-tls-error-page: "http://www.mysite.com/error-cert.html"
    # Specify if certificates are passed to upstream server
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
spec:
  rules:
  - host: jaeger-collector.mycontainer-dev.myhostname.com
    http:
      paths:
      - backend:
          serviceName: simple-prod-collector
          servicePort: 14250
  tls:
  - secretName: jaeger-collector-tls-secret
    hosts:
      - jaeger-collector.mycontainer-dev.myhostname.com
EOF
ilhaan commented 3 years ago

@js8080 I set up the same ingress as you did and my clients are unable to connect to the collector. This is the error I see:

{"level":"error","ts":1620695096.1566715,"caller":"grpc/reporter.go:74","msg":"Could not send spans over gRPC","error":"rpc error: code = Unavailable desc = last connection error: connection closed","stacktrace":"github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).send\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:74\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).EmitBatch\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:53\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter.(*MetricsReporter).EmitBatch\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/metrics.go:83\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter.(*ClientMetricsReporter).EmitBatch\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/client_metrics.go:120\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process\n\tgithub.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:138\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process\n\tgithub.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:112\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:121\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.NewThriftProcessor.func2\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:86"}
js8080 commented 3 years ago

@ilhaan I am not sure what the issue is. Are you sending from the Jaeger Agent to the Collector or is your traced client attempting to send directly to the Collector (bypassing the Agent)? I am using the Jaeger Agent in my case.

ilhaan commented 3 years ago

@js8080 I'm using Jaeger Agent as well. How do you have --reporter.grpc.host-port configured?

js8080 commented 3 years ago

@ilhaan I use a docker-compose file to run my agent:

version: "3"

services:

  jaeger-agent:
    image: jaegertracing/jaeger-agent
    hostname: jaeger-agent
    command: [
      "--reporter.grpc.host-port=jaeger-collector.mycontainer-dev.myhostname.com:443",
      "--reporter.grpc.tls.enabled=true",                                   # Enable TLS when talking to the remote server(s)
      "--reporter.grpc.tls.ca=/usr/share/jaeger/jaeger-collector.tls.crt",  # Path to a TLS CA (Certification Authority) file used to verify the remote server(s) (by default will use the system truststore)
      "--reporter.grpc.tls.cert=/usr/share/jaeger/jaeger-agent.tls.crt",    # Path to a TLS Certificate file, used to identify this client to the remote server(s) via mTLS
      "--reporter.grpc.tls.key=/usr/share/jaeger/jaeger-agent.tls.key",     # Path to a TLS Private Key file, used to identify this client to the remote server(s) via mTLS
      "--log-level=debug"
    ]
    # Other commands we may want to specify:
    # --reporter.grpc.tls.server-name | Override the TLS server name we expect in the certificate of the remote server(s)
    volumes:
      - type: bind
        source: ./jaeger-collector.tls.crt
        target: /usr/share/jaeger/jaeger-collector.tls.crt
      - type: bind
        source: ./jaeger-agent-selfsigned.tls.crt
        target: /usr/share/jaeger/jaeger-agent.tls.crt
      - type: bind
        source: ./jaeger-agent-selfsigned.tls.key
        target: /usr/share/jaeger/jaeger-agent.tls.key
    ports:
      - "6831:6831/udp"  # UDP | accept jaeger.thrift in compact Thrift protocol used by most current Jaeger clients
      - "5778:5778"      # HTTP | serve configs, sampling strategies
      - "14271:14271"    # HTTP | admin port: health check at / and metrics at /metrics
      # - "5775:5775/udp"  # UDP | accept zipkin.thrift in compact Thrift protocol (deprecated; only used by very old Jaeger clients, circa 2016)
      # - "6832:6832/udp"  # UDP | accept jaeger.thrift in binary Thrift protocol used by Node.js Jaeger client (because thriftrw npm package does not support compact protocol)
    restart: on-failure
ilhaan commented 3 years ago

@js8080 Thanks!

In case other people stumble upon this, I had to switch Jaeger collector to use HAProxy Ingress Controller instead of Nginx Ingress controller. Following are the annotations I used for the collector's ingress:

kubernetes.io/ingress.class: haproxy
haproxy.org/server-proto: h2

After the ingress is created, I patched it to use port 14250 instead of 14268

nikfot commented 3 years ago

Hello, is this issue moving forward? I am also interested. If no PR is not submited about it yet, @jpkrohling I would like to take some time and come back with a try.

jpkrohling commented 3 years ago

@nikfot, sure! The first step is to send in a proposal, and I suggest showing how you think the configuration should look like. Once we all agree on that, implementing a solution should be relatively easy.

nikfot commented 3 years ago

@nikfot, sure! The first step is to send in a proposal, and I suggest showing how you think the configuration should look like. Once we all agree on that, implementing a solution should be relatively easy.

Great! I will come back with a proposal on this thread asap.

nikfot commented 3 years ago

@jpkrohling my basic idea would be to abstract the JaegerIngressSpec and decouple it from the query. So we could have a JaegerIngresses struct containing a query and a collector objects -and maybe more in the future(?)- of type JaegerIngressSpec each.

In that case the configuration would looke like this (borrowing from the jaeger sample ingress config):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: ingress-with-hosts
spec:
  ingress:
    query:
        enabled: true
        hosts:
          - query-mesh-jaeger.xxx.com #your domain name.
   collector:
        enabled: true
        hosts:
          - collector-mesh-jaeger.xxx.com #your domain name.       

That is what I think would cause less disruption in the rest of the code. I am not really experienced in contributing to opensource and I am a fresh user of jaeger, so I would be happy to read your opinion or suggestion and maybe I can get down to work.

jpkrohling commented 3 years ago

We try to keep the CR as domain-specific as possible, meaning that the high-level items are about items that have a meaning for Jaeger, like collector, query, ingester, storage, and so on. We do have precedents of having non-domain items here, evidenced by the JaegerCommonSpec (Volumes/VolumeMounts, and related). Can you think of another proposal with that concept in mind? I don't dislike your proposal, but I wonder if we can have a better one.

nikfot commented 3 years ago

We try to keep the CR as domain-specific as possible, meaning that the high-level items are about items that have a meaning for Jaeger, like collector, query, ingester, storage, and so on. We do have precedents of having non-domain items here, evidenced by the JaegerCommonSpec (Volumes/VolumeMounts, and related). Can you think of another proposal with that concept in mind? I don't dislike your proposal, but I wonder if we can have a better one.

Of course I can come up with another one. I am really ok with what you say since , as I mentioned before, I have little experience in such procedures, so for me even commenting on my proposal is a chance to learn more.

I'll get back.

nikfot commented 3 years ago

@jpkrohling I took a closer look at the structs and it looks like the Ingress is standing at the same level as thecollector, query, ingester and storage. https://github.com/jaegertracing/jaeger-operator/blob/519bf244a9e44f886acaa5a2ce7d2cd7afe59ddc/pkg/apis/jaegertracing/v1/jaeger_types.go#L86-L120

I could be an idea to move the JaegerIngressSpec item inside the query and collector structs or maybe even the rest if this is adding value. In that case the config would look somethin like:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: ingress-with-hosts
spec:
    query:
        ingress:
              enabled: true
              hosts:
                - query-mesh-jaeger.xxx.com #your domain name.
   collector:
       ingress:
            enabled: true
            hosts:
              - collector-mesh-jaeger.xxx.com #your domain name.       

If for some reason we don't want to break for no reason old configuration, we might as well just add the JaegerIngressSpec item in the collector only and keep it at the JaegerSpec as well, which will refer to the query.

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: ingress-with-hosts
spec:
   ingress:
       enabled: true
          hosts:
            - query-mesh-jaeger.xxx.com #your domain name.
   collector:
       ingress:
            enabled: true
            hosts:
              - collector-mesh-jaeger.xxx.com #your domain name.       

Let me know if you are ok with any of these @jpkrohling or if you have some other idea.

jpkrohling commented 3 years ago

Sorry, I missed this one somehow. I like the first proposal, but for backward compatibility purposes, I think we have to go with the second. Perhaps we can deprecate the main ingress in favor of a new one under the query node? We can then copy the .Spec.Ingress to .Spec.Query.Ingress under certain conditions (like, when .Spec.Query.Ingress is empty).

nikfot commented 3 years ago

That sounds nice! So we agree to (1) add an Ingress field to the collector struct .Spec.Collector.Ingress , (2) add an Ingress field to the query struct .Spec.Query.Ingress and (3) keep the .Spec.Ingress for backwards compatibility as a deprecated feature. Do I procceed on creating a Pull Request as soon as those are ready?

jpkrohling commented 3 years ago

Sounds good!

gecube commented 2 years ago

Good day! Any progress with the issue? My employer also wants to publish collector via ingress and it will be very nice to achieve it with proper configuration of kind: "Jaeger"

nikfot commented 2 years ago

Hi! In fact I wen along with implementing some code on loca branch, but afterwards priorities changed so I abandonded it. I have intentions on finishing but I am not sure when. If you'd like to take it feel free.

pranoyk commented 1 year ago

@jpkrohling If no one is working on this currently, I would like to give it a try. You can assign this to me.

iblancasa commented 1 year ago

Done @pranoyk

gaeljw commented 11 months ago

Not sure where to put this info but I'm facing an issue that the manually added Ingress is getting deleted each time the Jaeger CR is recreated/synced for some reason. Not sure why the operator is removing the Ingress.

Kinda related to https://github.com/jaegertracing/jaeger-operator/issues/408 but it's not the Ingress that should be managed by the Operator.

Should I open an issue for that or maybe my Ingress definition is wrong? How does the operator work? It controls all resources of the namespace?

gaeljw commented 11 months ago

Not sure where to put this info but I'm facing an issue that the manually added Ingress is getting deleted each time the Jaeger CR is recreated/synced for some reason. Not sure why the operator is removing the Ingress.

Kinda related to #408 but it's not the Ingress that should be managed by the Operator.

Should I open an issue for that or maybe my Ingress definition is wrong? How does the operator work? It controls all resources of the namespace?

Nevermind.. silly me! I had used the labels managed-by: jaeger-operator. I removed all labels and now the Ingress stays!