kubernetes-retired / contrib

[EOL] This is a place for various components in the Kubernetes ecosystem that aren't part of the Kubernetes core.

Ingress Nginx / GLBC conflict #1657

Closed pdoreau closed 6 years ago

pdoreau commented 8 years ago

Hello. I'm trying the basic HTTP example for configuring an Nginx Ingress Controller.

It looks like both the Nginx RC and GLBC are enabled. Here is the output of my describe ingress command:

Name:           echomap
Namespace:      default
Address:        XXX.XXX.XX.XXX,YYY.YYY.YYY.YYY
Default backend:    default-http-backend:80 (XX.X.X.X:8080)
Rules:
  Host      Path    Backends
  ----      ----    --------
  foo.bar.com   
            /foo    echoheaders-x:80 (<none>)
  bar.baz.com   
            /bar    echoheaders-y:80 (<none>)
            /foo    echoheaders-x:80 (<none>)
Annotations:
  forwarding-rule:  k8s-fw-default-echomap--...
  target-proxy:     k8s-tp-default-echomap--...
  url-map:      k8s-um-default-echomap--...
  backends:     {"k8s-be-...":"Unknown"}
Events:
  FirstSeen LastSeen    Count   From                SubobjectPath   Type        Reason  Message
  --------- --------    -----   ----                -------------   --------    ------  -------
  24m       24m     1   {nginx-ingress-controller }         Normal      CREATE  default/echomap
  24m       23m     2   {nginx-ingress-controller }         Normal      CREATE  ip: XXX.XXX.XX.XXX
  23m       23m     1   {loadbalancer-controller }          Normal      CREATE  ip: YYY.YYY.YYY.YYY
  23m       23m     1   {loadbalancer-controller }          Warning     Status  Operation cannot be fulfilled on ingresses.extensions "echomap": the object has been modified; please apply your changes to the latest version and try again
  24m       23m     4   {nginx-ingress-controller }         Normal      UPDATE  default/echomap

I've added the annotation to disable GLBC, set to "nginx". I got a 502 response from the first IP (GLBC, I guess) and no response from the second.
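
For reference, the annotated ingress looks roughly like this (trimmed to a single rule; the kubernetes.io/ingress.class annotation is the relevant part):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echomap
  annotations:
    kubernetes.io/ingress.class: "nginx"   # tells GLBC to ignore this ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: echoheaders-x
          servicePort: 80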

Is there something else I need to do to disable GLBC / enable Nginx?

bprashanth commented 8 years ago

Which version of Kubernetes are you running? The annotation only works post-1.3. Are you running on GCE or GKE? Did you create the ingress with the annotation, or modify it afterwards? Can you exec into your nginx ingress controller and curl localhost to check that it serves the correct output? "Disabling" GLBC just means it will ignore your ingress; it will not tear down the old existing resources. The IP you want from the list of IPs is the one that matches the external IP of the node the nginx controller pod is running on. If that doesn't work, you might need to open up a firewall rule for the right ports on that node/cluster.
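
For example, something along these lines (pod name is a placeholder, and this assumes curl is available in the controller image):

# find the controller pod, then curl it from the inside with the Host header from the example
kubectl get pods | grep nginx-ingress
kubectl exec <nginx-ingress-controller-pod> -- curl -s -H "Host: foo.bar.com" http://localhost/foo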

pdoreau commented 8 years ago

Kubernetes v1.3.5 on GKE. The first time, I created the ingress without the annotation. I then deleted and recreated it, but nothing changed.

I've just tried setting the cluster size to 0, then back to 2, to recreate everything. I also added a forwarding rule. Both the exec curl and external access to echoheaders are OK now.

About the README:

GLBC automatically adds a forwarding rule and the Nginx RC doesn't. Is that right?

bprashanth commented 8 years ago

Should we add something about the required forwarding rule?

I don't understand why you need a forwarding rule: nginx is a pod, it doesn't even understand cloud providers. Are you trying to run the nginx controller in isolation, or GLBC -> nginx -> origin server? Maybe you meant firewall rule? Nginx does need a firewall rule, just like any other pod or process running on a VM in your Kubernetes cluster.
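
On GCE/GKE that rule is something roughly like this (rule name and node tag are placeholders; adjust the ports to whatever the controller binds):

gcloud compute firewall-rules create allow-nginx-ingress \
    --allow tcp:80,tcp:443 \
    --source-ranges 0.0.0.0/0 \
    --target-tags <your-node-tag>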

Should we add something about the ingress class annotation?

It should be in the READMEs of the relevant ingress controllers. We could also add it to some top-level doc, but there are so many ingress controllers out there that don't implement it.

GLBC automatically adds a forwarding rule and the Nginx RC doesn't. Is that right?

It would be great if we made nginx smart about the cloud provider it's running on and auto-created the firewall rule. The challenges there are: 1. there are so many cloud providers, and 2. you don't always want the firewall rule, especially if you're going GLBC -> nginx (which we could solve via a boolean annotation).

We should surface the need for the firewall rule in the nginx controller docs if we don't already; I thought we did somewhere.

We have an e2e test that uses the annotation and passes continuously; maybe you can diff your setup with that one? It uses this rc: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/testing-manifests/ingress/nginx/rc.yaml and only augments it with the annotation, as here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/ingress_utils.go#L569

pdoreau commented 8 years ago

Sure, I meant firewall rule. I tried to run the basic use case described in the nginx RC README, without GLBC.

Adding something about the annotation to the nginx RC README would keep people from unintentionally using both GLBC and the nginx RC (which is just what happened to me). Wouldn't it?

We should surface the need for the firewall rule in the nginx controller docs if we don't already; I thought we did somewhere.

I think so

We have an e2e that uses the annotation and it passes continuously, maybe you can diff your setup with that one?

I deleted the ingress + nginx RC and recreated them with the annotation. Everything's OK now.
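
Concretely, roughly this (the manifest file names are placeholders):

kubectl delete ingress echomap
kubectl delete rc nginx-ingress-controller
kubectl create -f nginx-ingress-rc.yaml     # the controller RC
kubectl create -f echomap-ingress.yaml      # the ingress, now carrying the kubernetes.io/ingress.class annotation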

bprashanth commented 8 years ago

Please open a PR if you have time; the suggested clarifications make sense.

The easiest way to detect that the pod is running on GCP is to perform an nslookup of metadata.google.internal, but I think there's a better way to get this working across cloud providers:

  1. The Nginx controller creates a service.type=lb for itself using the Kubernetes client. This automatically creates the right cloud resources on all supported cloud providers (Azure, GCE, AWS).
  2. The controller then takes the public IP of this service and puts it in the ingress IP field.

We could also teach the Nginx controller to autodetect the cloud provider itself and create just the firewall rule, but that feels more fickle and less useful IMO.
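
For reference, the Service from step 1 would look roughly like this if created by hand (name and selector are illustrative; the controller would create the equivalent through the API):

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb            # name is illustrative
spec:
  type: LoadBalancer
  selector:
    k8s-app: nginx-ingress-lb       # must match the labels on the nginx controller pods; assumed here
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443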

pdoreau commented 8 years ago

I created #1672 for the clarifications. About the Service of type LoadBalancer: is the purpose only to create the firewall rule automatically? Maybe it should be optional, since it would add an extra cost for each ingress.

bprashanth commented 8 years ago

Actually, the point is to leverage the cloud provider detection logic already in the master to do the right thing cross-platform, but I agree about the additional cost. I think the tradeoff here is to document how to do the easy, cheap thing by hand (create the firewall rule) and make the more useful cross-platform abstraction automatic (service.type=lb). The second is going to be useful in production deployments anyway.

pdoreau commented 8 years ago

Do you mean that the service.type LB is a more reliable and/or efficient way?

It could also solve an associated problem I'm currently facing. I defined multiple environments through namespaces and used the --watch-namespace flag to get better isolation between them. However, I have only 3 nodes in my cluster, and beyond 3 nginx RCs, pod creation fails:

pod (nginx-ingress-controller-pvu9w) failed to fit in any node 
fit failure on node (...): PodFitsHostPorts 
fit failure on node (...): PodFitsHostPorts 
fit failure on node (...): PodFitsHostPorts

What can I do? Can creating a service.type LB resolve this issue?

bprashanth commented 8 years ago

Do you mean that the service.type LB is a more reliable and/or efficient way?

No, I was talking about using both a service.type LB AND an ingress controller in a pipeline.

What can I do? Can creating a service.type LB resolve this issue?

You can only run one of them per node, because there's only one port 80/443 per node. You can run multiple if you don't mind the controller listening on some other port (you need to create another RC with hostPort set to something like 8080). You can run any number of service.type=lb because those are provisioned by the cloud provider.
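
The relevant fragment of such a second RC's container spec would be roughly this (everything else unchanged from the stock rc.yaml):

    ports:
    - containerPort: 80
      hostPort: 8080    # avoids colliding with the first controller's hostPort 80
    - containerPort: 443
      hostPort: 8443    # likewise for TLS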

pdoreau commented 8 years ago

No, I was talking about using both a service.type LB AND an ingress controller in a pipeline.

OK, but I'm not sure about the idea: why would it be useful in production deployments?

bprashanth commented 8 years ago

Because you get a lot for free just by publishing a public IP from a cloud provider (basic DDoS protection, regional load balancing because they have global PoPs).

bprashanth commented 8 years ago

At the same time, the cloud provider LB is less flexible (no redirects, adding a new svc takes ~10m).

pdoreau commented 8 years ago

I see, good things indeed. With that configuration, the hostPort for the nginx RC would no longer be necessary (a service of type LB/NodePort would be used instead)?

I also noticed the replicas count is 1 in the nginx RC config. For high availability, would it be useful to increase this value (to avoid a single point of failure)?

frekw commented 8 years ago

@pdoreau I'm also trying to get the example to work on GKE and just wanted to ask how you exposed the ingress controller and opened up the firewall.

I exposed it as a NodePort service and then added a firewall rule like this:

source: 0.0.0.0
target tag: gke-my-pool-[randomhash]-node
port: tcp:30xxx

where 30xxx is the port the NodePort actually maps to.
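
In gcloud terms, that's roughly the following (service name is a placeholder; the port is whatever NodePort the Service was assigned):

kubectl get svc <nginx-ingress-svc> -o jsonpath='{.spec.ports[0].nodePort}'
gcloud compute firewall-rules create allow-nginx-nodeport \
    --allow tcp:30xxx \
    --source-ranges 0.0.0.0/0 \
    --target-tags gke-my-pool-[randomhash]-node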

Was that how you went about exposing the ingress controller to get everything to work, or is there anything else I need to open up in the firewall?

I've also tried opening tcp:80 for the same target tag (since the RC binds to hostPort: 80), without any success.

frekw commented 8 years ago

Never mind, I got it to work by exposing tcp:80 for the cluster. It just took 30 minutes or so for the changes to propagate.

bprashanth commented 8 years ago

Both hostPort 80 and NodePort (as long as you actually create a NodePort Service) should work. 30m sounds too long; when the firewall-rules create call completes, the port should be open. I assume you're running just the raw nginx controller rather than the nginx controller behind the GCE ingress controller; in the latter case you'll have a delay of ~15m until health checks pass.

fejta-bot commented 6 years ago

Issues go stale after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.

/lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.

/lifecycle rotten
/remove-lifecycle stale

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.

/close