spring-cloud / spring-cloud-kubernetes

Kubernetes integration with Spring Cloud Discovery Client, Configuration, etc...
Apache License 2.0
3.45k stars 1.03k forks source link

Allow KubernetesServiceInstance to optionally use hostname rather than IP Address. #278

Closed dgrandemange closed 3 years ago

dgrandemange commented 5 years ago

In org.springframework.cloud.kubernetes.ribbon.KubernetesServerList, when populating Server list, is there a particular reason to create Server instances using EndpointAddress ip address, instead of EndpointAddress hostname ?

Why such question : Well, in an OpenShift cluster, let's say I have two OpenShift services :

Each service is load-balancing to 1 or more of its backing pods. Service A is the ingress service, and is exposed to the outside world via a dedicated route. Service A internally communicates with service B (through OpenFeign/Ribbon client along Hystrix). Communication between the outside world and service A can be secured through a TLS termination configured at OpenShift service level, but it's not the point here.

Now, let's say, i would like to provide a secure TLS communication between Service A and Service B (i am talking about cluster internal communication here). To do so (and without speaking about solutions like Istio or other Service Mesh), i have updated Service B Spring Boot app config so that the REST service in service B is now served through TLS on a 8443 port, and presents a server certificate signed by some CA. Now, when my Spring Boot webapp in service A needs to communicate with the now secured REST service in service B, a SSL handshake occurs in between. During this handshake, standard checks occur, namely server certificate name validation : the server certificate name (CN) should match the hostname used to connect to Pod B. That's where i get into trouble : in order for my webapp (Pod/Service A) to check certificate received from REST service (Pod/Service B), webapp should first have connected to Pod B using a FQDN, not by its internal IP. I mean : in Openshift cluster, Pod's IP address are dynamically allocated. Certificated CN validation can not be done against dynamic IP. In my Certificate Signing Request, i can specify a CN like '*.<namespace>.endpoints.cluster.local' so that any hostname matching '<namespace>.endpoints.cluster.local' domain will be validated at handshake. But, in my case, it would require the Kubernetes Ribbon to populate its server list with hostname, not ip adress.

Having server list populated with ip address ...

    11:48:09,398 INFO  [com.netflix.loadbalancer.DynamicServerListLoadBalancer] (hystrix-tp-minishift-1) DynamicServerListLoadBalancer for client tp-minishift initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=tp-minishift,current list of Servers=[172.17.0.9:8443],Load balancer stats=Zone stats: {unknown=[Zone:unknown;   Instance count:1;   Active connections count: 0;    Circuit breaker tripped count: 0;   Active connections per server: 0.0;]},Server stats: [[Server:172.17.0.9:8443;   Zone:UNKNOWN;   Total Requests:0;   Successive connection failure:0;    Total blackout seconds:0;   Last connection made:Thu Jan 01 00:00:00 UTC 1970;  First connection made: Thu Jan 01 00:00:00 UTC 1970;    Active Connections:0;   total failure count in last (1000) msecs:0; average resp time:0.0;  90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;  max resp time:0.0;  stddev resp time:0.0]

SSL handshake error below occurs :

    17:19:33,015 ERROR [org.springframework.boot.web.servlet.support.ErrorPageFilter] (default task-1) Forwarding to error page from request [/get-podinfos] due to exception [TpMinishiftServiceProxy#podinfos() failed and fallback failed.]: com.netflix.hystrix.exception.HystrixRuntimeException: TpMinishiftServiceProxy#podinfos() failed and fallback failed.
    Caused by: feign.RetryableException: java.security.cert.CertificateException: No subject alternative names matching IP address 172.17.0.9 found executing GET http://tp-minishift/podinfos 
    Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative names matching IP address 172.17.0.9 found
    Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.17.0.9 found
        at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
        at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
spencergibb commented 5 years ago

It wouldn't be hard to change that and/or make it an option.

ping @iocanel @salaboy and @geoand

dgrandemange commented 5 years ago

Ok, Depending on the time i can put on this, i 'll consider give a try, and come back to you.

salaboy commented 5 years ago

@dgrandemange you can do the change.. but in reality we are using the serviceId in order to do requests between services... In my demo, I've used a configuration for ribbon to not load balance from the client side.. so instead of using the prefix: "lb://" I use "https://". The Kubernetes service will do the load balance for you. You can still use the fallbacks.. but client side load balancing will be handled by Kubernetes

salaboy commented 5 years ago

Link to the example: https://github.com/salaboy/s1p_concerts-service/blob/master/src/main/java/org/sp1/demo/concerts/service/services/ConcertServiceImpl.java#L109

and

https://github.com/salaboy/spring-cloud-k8s-minion/blob/develop/src/main/java/org/minions/demo/BossClientService.java#L37

dgrandemange commented 5 years ago

Well, i am already using serviceId and therefore rely on OpenShift service to handle the load-balancing thing.

Consider REST service interface below :

public interface TpMinishiftService {

    @RequestMapping("/podinfos")
    PodInfo podinfos();

}

My service B provides an implementation of this service, whereas service A client uses FeignClient below to invoke it, specifying a serviceId, which is here specified using the FeignClient.name annotation attribute (using a property placeholder here) :

@FeignClient(name = "${service.tp-minishift.name}", fallback = TpMinishiftServiceProxyFallback.class)
public interface TpMinishiftServiceProxy extends TpMinishiftService {

    @Component
    public static class TpMinishiftServiceProxyFallback implements TpMinishiftServiceProxy {

        private static final Logger LOGGER = LoggerFactory.getLogger(TpMinishiftServiceProxyFallback.class);

        @Override
        public PodInfo podinfos() {
            throw new UnsupportedOperationException("Invalid fallback method execution");
        }

        public PodInfo podinfos(Throwable t) {
            LOGGER.warn(t.getMessage(), t);
            return new PodInfo("<unavailable>");            
        }
    }
}

FeignClient is then autowired into/invoked from Service A controller, like this :

@Controller
public class MainController {

    @Autowired
    private TpMinishiftServiceProxy tpMinishiftCli;

    @RequestMapping("/get-podinfos")
    public String getPodinfos(Model model) {
        PodInfo podinfos = tpMinishiftCli.podinfos();
        model.addAttribute("podinfos", podinfos);

        // ...

        return "podinfos-view";
    }

}
salaboy commented 5 years ago

@dgrandemange I faced similar problem and it all boils down to

@FeignClient(name = "${service.tp-minishift.name}" ..)

If you do @FeignClient(name = "http://${service.tp-minishift.name}" ... ) does that work? If not.. we need to look into how the Feign client is created with Ribbon.

geoand commented 5 years ago

I see no reason why we can't provide a configuration option so that KubernetesServiceInstance will return the hostname instead of the ip when getHost is called

dgrandemange commented 5 years ago

@salaboy FYI : prefixing serviceId with http:// hasn't fix it

dgrandemange commented 5 years ago

Just for the record, performing a test with host verification disabled (following this recipee ), SSL handshake succeeds, and service A - service B inter-communication is ok.

dgrandemange commented 5 years ago

Ok, I have deployed my service A with a patched version of org.springframework.cloud.kubernetes.ribbon.KubernetesServerList, replacing EndPoint.getIp() by EndPoint.getHostname() when creating Serverinstances. After invoking exposed service A, when service A looks for service B, logs show a servers list with null for servers hostname. Actually, according to Openshift documentation, endpoints in OpenShift are only described by IP address, not by hostname.

$ oc describe ep/tp-minishift

Name:         tp-minishift
Namespace:    poc2
Labels:       app=tp-minishift
Annotations:  <none>
Subsets:
  Addresses:          172.17.0.4,172.17.0.5
  NotReadyAddresses:  <none>
  Ports:
    Name      Port  Protocol
    ----      ----  --------
    8443-tcp  8443  TCP

Events:  <none>

So, this is not the way to go.

Now looking at @geoand comment, i realize i should have try to patch org.springframework.cloud.kubernetes.discovery.KubernetesServiceInstance instead.

geoand commented 5 years ago

@dgrandemange please let us know how it goes if you make a test with a patched KubernetesServiceInstance

salaboy commented 5 years ago

@geoand I am not sure that we should do that (replace IP for host name completely).. if we do that then we lost the IP information of the service at registration time.

geoand commented 5 years ago

@salaboy I'm not sure I follow :)

dgrandemange commented 5 years ago

Just for you to know : i am trying another approach.

In my FeignClient, instead of specifying a serviceId, i will try to specify the url to the targeted OpenShift service.

Actually, in Openshift, a service is also adressable using its internal hostname as assigned by OpenShift internal DNS. For instance, my service tp-minishift in namespace/project poc2 has been assigned the name tp-minishift.poc2.svc.cluster.local and should be joignable by any other service through this name.

So, in my client service, i should be able to configure Feign client like this : @FeignClient(url = "https://tp-minishift.poc2.svc.cluster.local") and should still benefit service native load-balancing between service's backing pods. Cool thing is, using here a hostname (tp-minishift.poc2.svc.cluster.local) should now make the handshake server name validation possible.

salaboy commented 5 years ago

@dgrandemange I found my configuration that looked like this to avoid Ribbon Client Side LoadBalancing

spring:
  application.name: gateway
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true
          url-expression: "'http://'+serviceId"
      x-forwarded:
        prefixEnabled: true
    kubernetes:
      reload:
        enabled: true
        mode: polling
        period: 5000
geoand commented 5 years ago

@salaboy What I am referring to is to have the option to use the hostname (of the pod) instead of the ip (again of the pod) in KubernetesServiceInstance. That way, if the pod hostnames match the certificates, then https connections should be possible without trouble. In this case Ribbon will use proper client side load balancing instead of relying on the Kubernetes Service abstraction to do that

salaboy commented 5 years ago

@geoand thanks for the clarification.. that makes more sense, but it still a hack that might lead to problems in the future. The more I think about the problem, I think that to be on the safe side, we should just add a single Endpoint with the Kubernetes Service name. If you need certificates we should recommend Istio, which will also have problems if ribbon try to establish connections with pod host names

geoand commented 5 years ago

@salaboy actually I think we could add that as an option, but I don't think it should be the default. It should be up to the user to choose whether or not they want real client side load balancing (in the sense that it's done by the library of the application), or pseudo client side load balancing which is what would happen if we simply use the (single) service name.

salaboy commented 5 years ago

@geoand so maybe what we are talking here is to extend the Endpoint object to contain more information? In any case, I do agree with allowing people to set it up. So it sounds like a good first step to make the configuration of the Endpoints information configurable.

geoand commented 5 years ago

:+1:

dgrandemange commented 5 years ago

I didn't mention in the first place, but when i choosed to use service URL in FeignClient, it indeed meant giving up on Ribbon load-balancing capacity (app level configurable), and only rely on OpenShift's service load-balancing (OpenShift service level).

@geoand @salaboy so, yes, i think you evntually manage to get to the real point here, thank you

germm commented 5 years ago

We are facing the same issue with TLS Hostname verfication as described in the first post.

It would be great if spring-cloud-kubernetes-discovery could be (optionally) configured that KubernetesServerList and KubernetesServiceInstance us the Pod Hostname assigned by Kubernetes:

<pod-ip-dashed>.<namespace>.pod.cluster.local

See also DNS for Services and Pods:

We could sucessfully test the TLS Hostname verification with a TLS Certificate issued to *.<namespace>.pod.cluster.local and a patched KubernetesServerList which calulates the Hostname with this code :

address.getIp().replace('.', '-') + "." + namespace + ".pod.cluster.local";

The behaviour should be configureable with a property and the default should still use the IP Adress (similar to preferIpAddress in Eureka).

mwmitchell commented 4 years ago

I work with a service that uses a flavor of Curator leader election, which works by associating (and persisting) job IDs to server IPs. We use a StatefulSet to ensure local state (persistent volume) is always associated with the same Pod. This works fine, except that Pod IPs change, and when this happens incoming requests to jobs (getStatus etc.) fail. If the DiscoveryClient could instead use the full hostname of a Pod, then the problem would be solved. Happy to submit a PR for this.

rafaelrenanpacheco commented 4 years ago

Before using Spring Cloud Kubernetes we were using Eureka + Netflix Ribbon for service discovery and client side load balance. The worst part on that stack was the ~30 sec delay for the ribbon to update it's reference for new or removed pods in our cluster. For example, when a pod is updated and gets to the ready status, Kubernetes automatically starts to terminate the old pod. In this situation our microservices will try to call the old pod for around 30 seconds before knowing there's a new pod in town, due the Ribbon + Eureka refresh delay.

When we changed to Spring Cloud Kubernetes (at the time with kubernetes ribbon support), we were able to achieve zero downtime during pod updates, because we set spring.cloud.kubernetes.ribbon.mode to SERVICE, which allowed us to use Kubernetes native service discovery.

After we updated to Hoxton.SR5 we lost Kubernetes native service discovery, because as reported by this thread, KubernetesDiscoveryClient works with endpoints (pods IPs), and not with FQDN anymore. The documentation explain how to use FQDN, but I don't think that will work without service mesh and/or traditional Feign clients with service id.

In order to test how fast KubernetesDiscoveryClient was, we scaled a pod from 1 to 2, and pod 2 started to receive requests after a little while (~5 seconds) after it were ready. This is acceptable in our application because there's no downtime here. The problem occurred when we scaled back from 2 to 1. When pod 2 was terminated, 1 of our requests took around 6 seconds before failing because the pod was unreachable. This wouldn't happen if Feign used the FQDN to send requests to that pod.

@spencergibb is there a way to use Spring Cloud Kubernetes with FQDN using traditional Feign clients with serviceId, and without relying on istio or other service meshes?

Best regards, Rafael Pacheco.

spencergibb commented 4 years ago

I believe @OlgaMaciaszek is working on spring cloud loadbalancer support which should have a similar mode to the ribbon integration.

AndreasKl commented 4 years ago

If you don't have an issue with building a custom RouteDefinitionLocator (which could be more efficient) and need a solution now your could use something like this:

https://gist.github.com/AndreasKl/7d18ce6807c366af6c97c62d126170e1

our config is:

spring.cloud.gateway.discovery.locator.url-expression: "'http://'+serviceId+':'+port"

as our public services are in the default namespace we can use this simple expression. We also disabled (and rolled our own) KubernetesCatalogWatch as this was not sufficient for our use case.

lucasoares commented 4 years ago

If you don't have an issue with building a custom RouteDefinitionLocator (which could be more efficient) and need a solution now your could use something like this:

https://gist.github.com/AndreasKl/7d18ce6807c366af6c97c62d126170e1

our config is:

spring.cloud.gateway.discovery.locator.url-expression: "'http://'+serviceId+':'+port"

as our public services are in the default namespace we can use this simple expression. We also disabled (and rolled our own) KubernetesCatalogWatch as this was not sufficient for our use case.

I'd liked your solution and it could work for me too. But in my use case I have one more variable: the version for each service.

Each application can have multiple versions on the same environment. Each version has its own kubernetes service with a different name.

Example:

Environment: staging Application: auth

The auth application on staging environment will have a service named auth-staging.

If I'm developing some new version of this application and don't want to change the staging state I will deploy it with a specific name, for example auth-itsm-323 (which is the jira issue for this feature). It helps our team to control the environment and to test new features individually before sending it to staging as the default version.

Today I use consul for service discovery where each pod register itself with a label specifying its kubernetes service FQDN (we are not using kubernetes service discovery right now).

With your solution I can move my service discovery to kubernetes to automatically discover the FQDN, removing the need of a label on each pod and throw consul away. I will also need to create new labels because I have many namespaces and it should be used in the FQDN. It would work, nice!

The problem is with the feign clients. How to enable the same FeignClient to use different serviceid for the same microservice? I can dynamically configure the serviceId based on a spring property called for example foo.bar.auth.version=itsm-323?

Sorry if I'm missunderstanding something here. I'm trying different solutions for my use case because I'm using the discovery service only in the gateway. All internal communication between the microservices (using feign) are based on the kubernetes service, which I declare manually using a spring property in our config server.

Example:

FeignClient

@FeignClient(
    contextId = "ConnectionApi",
    name = "connection-service",
    url = "${foo.bar.connection-service-host:\"\"}")
public interface ConnectionApi {
}

ConfigServer:

{
    "foo.bar.connection-service-host" : "http://connection-service-staging.namespace.svc:PORT",
}

If I need a specific application to use a different version of the connection-service, I just change it on its configuration. As we scale on number of services (today we already have more than 30 services) this is manually exhausting and doesn't have sense because tecnically we have the service discovery to do the job for us.

What can I do to use FeignClients using the FQDN and have dynamic configurations?

Thank you.

ryanjbaxter commented 4 years ago

There appears to have been a mixup, Ribbon should have never been removed from the Hoxton Release Train. I will add it back and we will get a release out ASAP. Sorry for the mixup!

OlgaMaciaszek commented 3 years ago

Both Ribbon (up to Hoxton) and SC LoadBalancer are integrated with SC Kubernetes now. Closing the issue.