Support EndpointSlice addressType "FQDN"

ChristianAnke commented 1 year ago

When configuring an EndpointSlice with addressType "FQDN" it will be correctly configured. https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/

Currently the following configuration is accepted, but not working when accessing the Ingress endpoint:

apiVersion: v1
kind: Service
metadata:
  name: reverse-proxy
spec:
  ports:
    - name: https
      port: 443
      targetPort: 443
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: reverse-proxy-1
  labels:
    kubernetes.io/service-name: reverse-proxy
addressType: FQDN
ports:
  - name: https
    appProtocol: https
    protocol: TCP
    port: 443
endpoints:
  - addresses:
      - "others.org"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  name: reverse-proxy
spec:
  rules:
    - host: myself.org
      http:
        paths:
          - backend:
              service:
                name: reverse-proxy
                port:
                  number: 443
            pathType: Prefix
            path: /foo

Error when accessing URL:

[lua] balancer.lua:348: balance(): error while setting current upstream peer [others.org]:443: invalid IPv6 address while connecting to upstream, client: xxx.xxx.xxx.xxx, server: myself.org, request: "GET /foo HTTP/2.0", host: "myself.org"

Requires Kubernetes Version: v1.21

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 1 year ago

@ChristianAnke , i did these steps ;

kubectl create deployment test0 --image nginx:alpine
kubectl expose deployment test0 --port 80
kubectl create ingress test0 --class nginx --rule test0.mydomain.com/"*"=test0:80
curl --resolve test0.mydomain.com:80: http://test0.mydomain.com

I was able to get a response code of 200.

So can you write your own instructions based on the above commands and the manifest produced by above kubectl commands, using the flag --dry-run=client . Edit the manifests as required. And also provide the appropriate curl command that does not get a 200 response. Then add the output of commands like kubectl logs $controllerpod

Then copy/paste the entire commands instructions or manifests for all related objects, so someone can reproduce the problem you are reporting.

Next, the new issue template asks questions so that there is data available to analyse the reported problem. You have not answered any questions. There is no info even on the controller version etc. So edit your issue description and kindly answer the questions asked in a new issue template. Please do format the information in markdown and code-snippets

ChristianAnke commented 1 year ago

@longwuyuan , thanks for the answer.

on purpose i provided a kubernetes manifest which is reflecting what is required to reproduce the issue. I do not understand why you come up with a completely different setup than the one i provided.

Furthermore did i fill out template with the asked questions. I just removed the un-commented things because i have no idea how this was meant to be used since nothing of the template was visible in the preview mode. The template is this:

<!-- What do you want to happen? -->

<!-- Is there currently another issue associated with this? -->

<!-- Does it require a particular kubernetes version? -->

<!-- If this is actually about documentation, uncomment the following block -->

<!-- 
/kind documentation
/remove-kind feature
-->

longwuyuan commented 1 year ago

Understood.

the goal is for a reader to be able to reproduce the problem
the template for a new issue asks questions that are relevant for a reader completely unaware of environment or context
your manifest is helpful on 3 objects
your manifest assumes multiple objects ranging from the controller version to the app deployment etc
lesser the assumptions, better chances to reproduce the problem
any chance you can write a manifest that someone can just copy/paste in a cluster created using kind or minikube ? Providing the curl request along with manifest for creating deployment, service, ingress will help

tombokombo commented 1 year ago

Hi @ChristianAnke why would you play with low-level endpoint slices api, have you tried service externalName https://kubernetes.io/docs/concepts/services-networking/service/#externalname ? It shoul work see https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/endpointslices.go#L55

longwuyuan commented 1 year ago

/triage needs-information

github-actions[bot] commented 11 months ago

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

BulatSaif commented 10 months ago

I am verifying that issue is still present (kubernetes-version=1.27.4). I run the configuration form the description, here is full log output:

curl myself.org/foo
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>

ingress logs:

2023/09/06 19:49:55 [error] 1159#1159: *924559 [lua] balancer.lua:348: balance(): error while setting current upstream peer [others.org]:443: invalid IPv6 address while connecting to upstream, client: 10.244.0.1, server: myself.org, request: "GET /foo HTTP/1.1", host: "myself.org"

2023/09/06 19:50:00 [error] 1159#1159: *924559 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.0.1, server: myself.org, request: "GET /foo HTTP/1.1", upstream: "https://0.0.0.1:80/foo", host: "myself.org"
10.244.0.1 - - [06/Sep/2023:19:50:00 +0000] "GET /foo HTTP/1.1" 504 160 "-" "curl/7.68.0" 77 5.000 [namespace-reverse-proxy-443] [] 0.0.0.1:80 0 5.000 504 9703016b67d175428a4f90615eee684f

line upstream: "https://0.0.0.1:80/foo" looks wrong, it should be https://[IP of others.org]:443/foo

Ghilteras commented 4 months ago

@tombokombo you can't use ExternalName service when exposing TCP traffic, you must use EndpointSlice and for some reason the FQDN is treated as IPv6

@longwuyuan is there any chance this can be fixed?

[error] 38#38: *2744 stream [lua] tcp_udp_balancer.lua:196: balance(): error while setting current upstream peer [my.foo.fqdn.com]:6379: invalid IPv6 address while connecting to upstream, client: 100.109.183.192, server: 0.0.0.0:6379, bytes from/to client:0/0, bytes from/to upstream:0/0

longwuyuan commented 4 months ago

I don't understand the tiny details, that the question would imply, fully well. But if you are asking if a endpointslice can be created manually for the purpose of the controller picking it up in lieu of the function to get endpointslices, as a feature, then its not likely in the near future.

It will also help to know what, in layman terms, is the bigger picture problem, that is blocking use of ingress-nginx controller functions, and that will get fixed if you create a endpointslice and make the controller use that for routing ? Hoping for some elaboration on the reference to a "TCP service" and address-type "FQDN" etc. Kindly elaborate on the end goal .

rteng1 commented 4 months ago

@longwuyuan

I'm working with @Ghilteras on this

"TCP service" is as per https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/, in this case is example-go service. In our case, it's my-proxy service as code refs below.

Because we want to use the service as a proxy to a FQDN, we created a k8s service that has no selectors, and an endpointSlice of type FQDN that maps to the service (which hopefully creates the endpoints for the "tcp" service). But we are getting invalid IPv6 address while connecting to upstream error which suggests the endpoint slice is not creating the endpoints correctly because of the FQDN address type

Code refs:

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  6379: "default/my-proxy:6379"
---
apiVersion: v1
kind: Service
metadata:
  name: my-proxy
spec:
  ports:
    - name: tcp
      port: 6379
      targetPort: 6379
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-proxy
  labels:
    kubernetes.io/service-name: my-proxy
    kubernetes.io/managed-by: manual
addressType: FQDN
ports:
  - name: redis
    appProtocol: tcp
    protocol: TCP
    port: 6379
endpoints:
  - addresses:
      - clustercfg.xxxxx.cache.amazonaws.com

longwuyuan commented 4 months ago

Looks like you want host a proxy inside the cluster, listening at port 6379, and you are expecting a connection to this LB:6379 should in-turn connect to a AWS Redis Instance.

What is this proxy software (haproxy/customsoftware) ?
Are you already aware that the tcp/udp feature i not a upstream Kubernetes spec but a feature that this project implements, just so that users can send TCP/UDP traffic instead of just HTTP/HTTPS/GRPC ? So its not a ingress API routing.
The controller switched to using endpointslices for the ingress objects related routing. In my opinion that same codepath is not traversed for tcp/udp traffic routing. I could be wrong. But did you base this design of a endpointslice for a K8S object of --type service, based on that. In any case, I think a new feature , of that nature, if so, will not get worked on because the implementation of tcp/udp traffic routing is expected to change design.
I think people look for dedicated redis-proxy just like people write postgres/myql proxies. So that categorically mysql/postgres targeted proxy has the destination configured inside it and hence establishes a connection of its own, using its own lookup. It seems you are attempting to provide a destination for TCP traffic outbound from this pod called "my-proxy", by creating a K8S object like a endpointslice, and thus helping the pod avoid some name resolution tasks. There are so many implications of this. Why would you choose to expose a TCP socket on the LB of a K8S cluster only to reach AWS redis instance on the internet ? The use case is unclear

rteng1 commented 4 months ago

@longwuyuan I think there might be a misunderstanding so let me address your questions in a different order:

I think people look for dedicated redis-proxy just like people write postgres/myql proxies. So that categorically mysql/postgres targeted proxy has the destination configured inside it and hence establishes a connection of its own, using its own lookup. It seems you are attempting to provide a destination for TCP traffic outbound from this pod called "my-proxy", by creating a K8S object like a endpointslice, and thus helping the pod avoid some name resolution tasks. There are so many implications of this. Why would you choose to expose a TCP socket on the LB of a K8S cluster only to reach AWS redis instance on the internet ? The use case is unclear

What is this proxy software (haproxy/customsoftware) ?

The redis (elasticache) is only reachable inside the k8s cluster VPC (our elasticache shares the same VPC as the k8s cluster), we're trying to EXPOSE the elasticache for access outside the VPC

As mentioned, the proxy is just a normal k8s service (my-proxy), the expose is through another load balancer service as https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
    - name: redis
      port: 6379
      targetPort: 6379
      protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

then proxying through the tcp services map via args to run the nginx controller

 args:
    - /nginx-ingress-controller
    - --tcp-services-configmap=ingress-nginx/tcp-services

and we make the NLB public ("external: false") so that redis is reachable from outside the VPC

Are you already aware that the tcp/udp feature i not a upstream Kubernetes spec but a feature that this project implements, just so that users can send TCP/UDP traffic instead of just HTTP/HTTPS/GRPC ? So its not a ingress API routing.

The controller switched to using endpointslices for the ingress objects related routing. In my opinion that same codepath is not traversed for tcp/udp traffic routing. I could be wrong. But did you base this design of a endpointslice for a K8S object of --type service, based on that. In any case, I think a new feature , of that nature, if so, will not get worked on because the implementation of tcp/udp traffic routing is expected to change design.

That's interesting - we feared that might be the case, but are you saying that basically "ingress for tcp traffic to k8s cluster" will not ever be supported because the k8s spec says so?

longwuyuan commented 4 months ago

Ah, thanks. So you have a redis elasticache and you want public internet clients to make calls to it. So instead of a native AWS conduit, you are trying to setup a tcp proxy in K8s. That would explain the effort.
About ingress for tcp traffic support, its actually not so black & white.
You actually want to create a endpointslice (not even just a endpoint) for your goal. Stating the obvious here for thoughts on slice being a collection of many and in this case many endpoints. Next relevant details here are
- The service configured as target for the opened TCP port, already has at least one endpoint (NOT a endpointslice)
- So there is likely that one endpointslice is already existing. If yes, then it has a field like endpointslices.endpoints.addresses
- You are expecting that the controller drop all this info, and somehow pick a new endpointslice (not even a endpoint) , that a user created, and that also has a field like endpointslice.endpoints.addresses. This kind of functionality is not likely to get implemented in near future.
- I think you have the choice to explore a redis proxy https://duckduckgo.com/t=ffab&q=redis+proxy&atb=v390-1&ia=web
- You can leverage the K8S object of kind service, --type externName, in case it fits the design of usng a redis proxy

longwuyuan commented 4 months ago

My comments may not be on similar opinions as others so please wait and see if others comment on this

Ghilteras commented 3 months ago

@longwuyuan please see inline below the comments

You actually want to create a endpointslice (not even just a endpoint) for your goal. Stating the obvious here for thoughts on slice being a collection of many and in this case many endpoints. Next relevant details here are

The issue is that the ingress controller does not recognize EndpointSlice of type FQDN

The service configured as target for the opened TCP port, already has at least one endpoint (NOT a endpointslice)

Since Endpoints are deprecated, we have just created an EndpointSlice

You are expecting that the controller drop all this info, and somehow pick a new endpointslice (not even a endpoint) , that a user created, and that also has a field like endpointslice.endpoints.addresses. This kind of functionality is not likely to get implemented in near future.

Not really. We would expect the Service to pick up the EndpointSlice as per k8s documentation, which works fine for EndpointSlice of type IPv4, but for EndpointSlice of type FQDN the controller thinks it's an IPv6. This looks like a bug, not a feature request. Shouldn't we change the kind to reflect that?

I think you have the choice to explore a redis proxy https://duckduckgo.com/t=ffab&q=redis+proxy&atb=v390-1&ia=web

That's what we are doing with haproxy to circumvent the fqdn/ipv6 EndpointSlice bug

You can leverage the K8S object of kind service, --type externName, in case it fits the design of usng a redis proxy

I don't think you can't tie a Service to another Service though. This could work if we could use an Ingress, but we can't. That's why we are hooking the Service with the EndpointSlice

longwuyuan commented 3 months ago

@Ghilteras thanks for the update. it helped

This discussion has now requires a perusal of the code and I am not a developer to make comments on code
But I am aware that we switched to using the endpointslice api for the ingress routing
I am also aware that the TCP/UDP ingress feature is not really using the ingress API a ingress is not for layer4 TCP/UDP
I am given to understand that the project implemented TCP/UDP ingestion and it likely use both go & lua code to implement a proxy
So the indication is that dev work is required around the tcp/udp proxy to use endpointlice of type fqdn
AFAIK, this is not likely to happen in near future because there are plans to change the implementation of the tcp/udp proxy. But don't take my word for it and lets see if there are comment around this
If there are possibilities of a PR from you, I think it will get reviewed for impact

On the Redis-Proxy part, my thoughts were that I found some hits on searching like https://artifacthub.io/packages/search?ts_query_web=redis+proxy&sort=relevance&page=1

If these redis-proxies ingest your AWS-rRedis-Elasticcache hostname as a destination, then you can deploy them and expose using tcp/udp port feature

On a complete tangent, if I were to implement this, I would have the frontend consume a configurable ENV-VAR for the AWS-Redis-ElasticCache FQDN, instead of redis queries first coming to a K8S-Cluster and then getting bounced off to AWS. The efficiency & security of K8S as target of redis queries but ultimatety destined for a AWS-ElasticCache, would only be compromised if there was some really unpleasing design aspect, forcing you to do this.

But these are my opinions. It is clear that a developer needs to comment here. There is really acute shortage of developer time so the choices are to join the community meeting https://github.com/kubernetes/community/tree/master/sig-network (and of course wait here for comments from community experts and developers)

Ghilteras commented 3 months ago

We already use a tcp proxy (haproxy) in the meantime while we wait that EndpointSlices of type FQDN are not mistaken as IPv6s so we do have a workaround. But I still do not understand why this issue is tagged as Feature, because the fact that NGINX interprets the FQDN as IPv6 seems to be a bug. I might be missing context here obviously, but why are we talking about changing the implementation of the tcp/proxy? How does fixing this bug requires changing the implementation? I am just genuinely curious here

longwuyuan commented 3 months ago

@Ghilteras sorry for not being clear enough.

The tcp/udp port exposing feature is not a upstream K8S spec AFAIK. Its a feature that this project implemented.
In that context there are plans to change how tcp/udp port expose feature works
So I assume its less likely that there are resources like developer-time available to make the endpointslices of type FQDN work with this feature, in the short term
The critical high priority problems being worked on are just too many and too complicated to make this issue a higher priority
I could be wrong so we need to wait for others to comments
Please do join the next community meeting and bring this up as developers and maintainers are there live
You have the option to set the bug label on the issue if you want to
Its just that there is not enough smoking gun data like
- k describe of controller resources (pod, svc, configMap)
- k descrbe of app resources pod,svc exposing tcp port
- k describe of a endpointslice in use for the expected behaviour
- Complete and real curl command as used
- Complete and real k logs output of the controller pod, including logs of the curl command if any
- Log messages in the controller pod logs or other info that shows your endpointslice got used and error was FQDN getting evaluated a IPv6
- Complete output of k get events
When there is data that shows a bug, it becomes easy to apply labels like "triage accepted" and "bug" to a issue as it reduces the effort needed by a developer
The e2e tests include tests to reach a tcp/udp port of a pod inside the cluster. There are no tests to send traffic to a FQDN obtained from a user created endpointslice so its going to be a NON-TRIVIAL effort for a developer
In case you want to submit a PR, I am certain that there will review comments coming on it

Hope you have more info now.

Ghilteras commented 3 months ago

The previous post contains enough data already supplied from @BulatSaif and @ChristianAnke. If you guys require additional information please let us know.

The tcp/udp port exposing feature is not a upstream K8S spec AFAIK. Its a feature that this project implemented.

I think we are all aware of that, that's why we filed this issue against NGINX Ingress repo and not against k8s

In that context there are plans to change how tcp/udp port expose feature works

Again, we are not asking to change how tcp/udp port expose works, we are asking to fix a bug

So I assume its less likely that there are resources like developer-time available to make the endpointslices of type FQDN work with this feature, in the short term The critical high priority problems being worked on are just too many and too complicated to make this issue a higher priority

IMHO bugs that are easy to fix and this one looks like it should not require a lot of effort can be prioritized without dramatically altering the roadmap of the project.

Ghilteras commented 3 months ago

/remove-kind feature /kind bug

longwuyuan commented 3 months ago

@Ghilteras thanks for your comments. I guess we have to wait for comments from others.

Ghilteras commented 2 months ago

Just circling back to this to check whether someone can accept the triage and remove the needs more information tags

k8s-ci-robot commented 2 months ago

@Ghilteras: The label triage/accepted cannot be applied. Only GitHub organization members can add the label.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/10080#issuecomment-2075549315): >/triage accepted Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

antoniolago commented 1 month ago

I'm using k8s v1.29.1 and was using helm chart v4.8.3. While trying to redirect to an external FQDN I encountered the same issues as the OP, but bumping to v4.10.1 apparently solved it.

``` kind: Service apiVersion: v1 metadata: name: nextcloud namespace: nextcloud spec: ports: - name: nextcloud protocol: TCP port: 80 targetPort: 9855 type: ExternalName sessionAffinity: None externalName: mydomain.org --- kind: EndpointSlice apiVersion: discovery.k8s.io/v1 metadata: name: nextcloud namespace: nextcloud labels: kubernetes.io/service-name: "nextcloud" addressType: FQDN ports: - name: nextcloud port: 9855 protocol: TCP endpoints: - addresses: - "mydomain.org" conditions: ready: true --- kind: Ingress apiVersion: networking.k8s.io/v1 metadata: name: ingress-nextcloud namespace: nextcloud annotations: kubernetes.io/ingress.allow-http: "true" acme.cert-manager.io/http01-edit-in-place: "true" cert-manager.io/cluster-issuer: letsencrypt-production spec: tls: - hosts: - myotherdomain.org secretName: mydomain-certificate rules: - host: myotherdomain.org http: paths: - path: / pathType: Prefix backend: service: name: nextcloud port: number: 80 ingressClassName: nginx ```

kubernetes / ingress-nginx

Support EndpointSlice addressType "FQDN" #10080