Possible to provision API with an internal LB ?

kingdonb commented 5 years ago

/kind feature

Is it possible to add configuration to the cluster network, which would permit switching from "internet-facing" scheme, to "internal" for the control plane API? From my reading of the docs this would be a new feature request.

I went to add a field to my cluster.yaml during clusterctl create, but failed to grok that it's just a status field in the spec, not anything you can actually change or set I think.

status:
  apiEndpoints:
  - host: capi-test2-apiserver-xxx.region-1.elb.amazonaws.com
    port: 443
  providerStatus:
    bastion:
      id: ""
    metadata:
      creationTimestamp: null
    network:
      apiServerElb:
        attributes:
          idleTimeout: 600000000000
        dnsName: capi-test2-apiserver-xxx.region-1.elb.amazonaws.com
        name: capi-test2-apiserver
        scheme: internet-facing
        securityGroupIds:
        - sg-12345
...

...so obviously I can't actually set that from the template (or in kubectl edit cluster) what chaos it would be if the cluster API and kube API moved? There are SSL certs in the cluster providence that depend on the hostname of that ELB to remain stable, I guess that's out.

So I'm assuming this would be a strictly cluster-up decision, like most of cluster-api, and not some kind of public-to-private cluster lifecycle event or vice-versa. I don't think my organization will ever want to make Kube API public. I tried taking down the LB manually, but that caused all kinds of havoc with my prometheus metrics and ultimately may have been the cause of death for that cluster.

With a control plane LB in the "internal" scheme mode, we can control access to the API through VPC/VPN and enforce a second factor, and maybe in this way we can safely avoid implementing any more complicated authentication schemes for a while. Presumably I could restrict access to the load balancer with some security group changes?

I am not expert or maybe not permissioned highly enough to figure that out, after several attempts, it seems like ~perhaps some other networking equipment attached to my VPC is preventing it because of the non-standard port 6443. I know we have a Cisco that prevents traffic on non-standard ports from leaving the VPC, and the internet-facing LB is outside of the VPC, so~ I think I really need to use an internal load balancer configuration in order to solve this in my environment.

edit: I ~sort of solved the API security, with security groups.~ Editing the security groups in-place was misguided here ~I needed to be sure that the VPC's NAT GW was added to the security group (not my VPN's NAT GW, which is separate), as well as the ELB itself.~

There is a [CLUSTERNAME]-lb security group created by the default workflow, which has no rules or attachments. I added it to my ELB (guessing that my scenario or something like it is probably what this was meant to be used for) and then put a 6443/TCP rule for it in the security group with another one for my NAT GW, and subnet with the cluster nodes in it, and the bastion host for good measure too.

The schema: internet-facing ELB is now usable in "private mode", although my traffic must leave the VPC which is less than optimal; I can at least work with this, but it is not a garden path either. I understand that customizing security groups is not yet in scope for CAPA, maybe internal LB is a good place to start.

Yikes, I understand now that I should not edit the security groups in-place, or I will get a failure to reconcile. Perhaps when I try this again next time, I will add some more security groups instead.

kingdonb commented 5 years ago

I'm going to take a stab at changing the one line in the source code and building my own CAPA which does internal ELB, just to see if it's feasible. I can't think of any reason that it won't work, but if it works, how likely is the SIG to accept a patch which makes it configurable as an option?

I think I would add apiServerElb: to the networkSpec, with schema: option that defaults to internet-facing, but can also take internal. It seems straightforward if the project is open to such.

sethp-nr commented 5 years ago

I'd be happy to review a PR that makes the API server LB type configurable. I wonder what we should do about the CAPA-created VPC case, though – an internal LB presupposes a lot about the network architecture between the CAPA bootstrap cluster and the cluster being created.

It might be as simple as documentation that says "internal ELB probably won't work for CAPA-created VPCs" but it would be nice if there was something we could offer.

detiber commented 5 years ago

Like @sethp-nr mentioned there are a lot of presuppositions that are needed for an internal LB to work, I'm leaning towards saying that we should likely start with only supporting this in the bring-your-own-VPC mode for now, since we wouldn't have enough information to know how to wire up a newly created VPC to the user's network.

dlipovetsky commented 5 years ago

I wonder what we should do about the CAPA-created VPC case, though – an internal LB presupposes a lot about the network architecture between the CAPA bootstrap cluster and the cluster being created.

Why does the network architecture matter? Is it to ensure that the workload cluster's API can be reached?

Here's a guess. If you can correct my mistakes, I'd appreciate it.

If the user wants to pivot, then clusterctl must be able to reach the target cluster's API to perform the pivot
If the user does not want to pivot, then the CAPA controllers running in the management cluster must be able to reach the target cluster's API to reconcile

ncdc commented 5 years ago

FYI, regardless of pivot, the CAPI pod requires network connectivity to the workload (target) cluster to determine/set a machine's node ref, and to be able to perform machine readiness checks in the machineset controller.

dlipovetsky commented 5 years ago

the CAPI pod requires network connectivity to the workload (target) cluster to determine/set a machine's node ref, and to be able to perform machine readiness checks in the machineset controller.

Thanks @ncdc!

If the requirement is indeed just for CAPI control plane to reach the workload cluster, I think a network-layer solution will work. (Off the top of my head: a reverse TCP tunnel combined with an HTTPS proxy).

Of course, the CAPA control plane will need to know where to reach the workload cluster's API. We use Cluster.Status.APIEndpoints for this today, but the type might need some more metadata, e.g., whether the API endpoint is reachable from the internet ("external") or not ("internal"). If a proxy is required and it is not transparent, additional metadata might be required (e.g. proxy address).

In my experience, the use case for an API not reachable from the internet is not uncommon, and it is not unique to AWS. Curious to hear other viewpoints.

detiber commented 5 years ago

@dlipovetsky fully agree that we should strive towards supporting this, I do worry about having to support an additional component to automatically try and provision and configure a tunneled proxy or trying to automate vpn configuration.

dlipovetsky commented 5 years ago

@detiber I agree. In fact, given the many different solutions to the problem, I worry that supporting one specific solution may be worse than supporting none! :slightly_smiling_face:

I think CAPI should be isolated from this problem as much as possible. And because the problem is not limited to AWS, I'd prefer a solution that works with many infra providers.

I'll run an experiment and share my thoughts.

dlipovetsky commented 5 years ago

Findings:

The problem is that management cluster's CAPI controllers need to reach the workload cluster API server in a network where ingress from the internet is not possible, but egress is.

A very common solution is to create a reverse TCP tunnel.

I have prototyped a client-server solution that allows the CAPI controllers to access the workload cluster's API via a Service on the management cluster. It uses a combination of reverse TCP tunnel and HTTP/S proxy.

It requires no changes to CAPI controllers. This is because the workload cluster kubeconfig is modified so that requests go through the HTTP/S proxy. This is not possible in client-go currently, but support for proxies is being added: https://github.com/kubernetes/client-go/issues/351.

However, there is a chicken-and-egg problem: The client is an add-on that runs on the cluster. And it must be deployed on the workload cluster before the CAPI controllers can access the workload cluster. But the CAPI controllers deploy addons after they are able to reach the workload cluster. Solving this problem requires that at least some addons be deployed during bootstrap,

dlipovetsky commented 5 years ago

@kingdonb Would what I describe above work for you? If not, can you please share your requirements? Thanks!

dlipovetsky commented 5 years ago

Here's a variation on the tunnel idea above: The client can be deployed natively on a non-cluster machine, instead of as a cluster addon.

I think the drawbacks include:

Increased operational complexity: A non-cluster machine can't be managed with CAPI, so you'll need different tooling to bring up a non-cluster machine in every infrastructure you use.
Increased cost: You're paying for another compute resource.

detiber commented 5 years ago

Yet another option longer term may be to leverage SSM Session Manager tunneling. Ideally it would support more than just ssh/scp, but there might still be a way to leverage it against an existing instance without requiring a bastion host.

dlipovetsky commented 5 years ago

@detiber News to me, thanks! If SSM tunneling supports standard ssh clients, then static or dynamic local port forwarding can be used.

Though I would prefer a solution that can work with many infrastructures.

kingdonb commented 5 years ago

Perhaps it will help if I explain my specific network architecture in a little bit more ~terseness~ detail (ok I don't do terse well sorry.)

My VPC has public and private subnets, which are routed to various places on campus. The zones are Public, Campus, DC - both Campus and DC are internal zones, while Public is perhaps obviously an internet-facing zone.

I provision my bastion host in the DC zone, and permit SSH traffic from anywhere (since these are private zones, and DC is the most closed-off zone, the result is traffic is really only permitted from anywhere within the 172.17.x.x ranges that are in the DC or connected via VPN.)

Our VPN client connects from anywhere through Cisco ASA, to permit my traffic to reach Campus as well as DC subnets with some restrictions based on ASA rules and security groups. SSH and any other traffic is 100% blanket Deny from Campus or Public subnets, as I understand it, and although there are some load balancers which expose given services to campus or public, there is no trunk route into DC from any subnet in any other zone.

Some traffic patterns from host to host inside of the DC are permitted by explicit security group rules inside of the DC, and campus traffic is generally less restricted than DC, VPN clients like mine are privileged, can reach anywhere in the DC or campus unless it is restricted by a rule on the ASA firewalls, or by any restrictive SGs that are attached.

Hosts which are on the Public internet, or are physically on Campus (but not in a DC on campus) should not be able to reach the Control Plane's LB without that VPN client, just as they are not able to reach the bastion SSH. Only DC services that are permitted and VPN users can access Kube API, per our network's design and specifications which I must adhere to. This is done since our VPN connection requires a 2FA, this is a sweeping security measure that we have implemented to ensure that all IT management traffic is always protected by a 2FA.

Coupled with this, the use of AWS_ACCESS_KEY and SECRET_KEY are extremely restricted. You do not get a key and secret, role boundary prevents you from generating one, even one that is protected by MFA. If you need such access for development then you may 2FA into the AWS console (I can do this on the Lab network, only the pros may do this inside of DC account after architecture review). On the lab network, I may provision an IAM role with the access needed given the confines of the permission boundary, yadda yadda, you cannot generate a key or talk to AWS API on these accounts from your workstation or the public internet. But you can assume the privileged role from a host which attaches for example the CAPA bootstrapper role, (which has also been modified to attach a named permission boundary that is required per our account policy, that prohibits me from creating another VPC and limits what instance types I can create.)

I used my bastion host to assume the bootstrap role so that I could perform the clusterctl create. No internet traffic is involved in the bootstrapping of this cluster, other than the egress you described which is fine, this is not an air-gapped net. The whole clusterctl create experience was pretty much flawless IMO, except that my cluster on wholly private networking is exposed to the internet with a public ELB whose security groups I cannot safely modify, and that is strongly undesirable. Let me know if you think there's something I'm missing, I think that most AWS accounts likely won't have this Cisco VPN configuration, but that's in a nutshell how our account is configured.

If we adhere to this set of rules, then 2FA can be skipped mostly in favor of IAM roles, or service tokens where absolutely needed, for any services that are not exposed outside of the DC. This way, basic and more conventional authentication methods like access keys and tokens can be relied on more safely inside the confines of the DC, since they are never usable from outside. Basic token auth kubeconfig is therefore also OK for now as our cluster admins are a limited group, and all are using VPN connections for their day-to-day already anyway.

Many Kubernetes auth methods are more advanced, could serve us better in a future state where more K8s and Cluster API users are onboarded into namespaces with their applications via RBAC, and more complex multi-tenancy is required. We can integrate with Kerberos to be secure like as-in for on the public internet, and that will most likely come soon, but not in the starting design as all of our K8s API consumers will be co-equal cluster admins, who are regulars on that VPN... others will interact with the cluster via Pull Request and GitOps workflow. (This is how I imagine that it will work, anyway. We need to resolve this one way or another in order to proceed with our adoption efforts.) So we will give the cluster admins access to K8s API through bastion hosts and/or kubeconfig+VPN, and others with less access to cluster internals will use more restrictive dashboard/portals, an API that we expose later, or the GitOps-enabled cluster git repo to interact with the cluster.

The kubeconfig handled by cluster admin should be rendered useless without a VPN connection by the network design, just like our internal-only API tokens and keys are rendered worthless if they are compromised. Opening any form of proxy seems to be incompatible with this design.

I think my VPN setup is very unlikely to be the most common design, but probably not super uncommon either. If I understand what you've solved with the proxy, it won't be a problem anyway for my CAPA bootstrapper, which is running kind directly on the bastion host with the bootstrap role attached.

I will give the Control Plane LB an IP in one of the super-private DC zones, if that configuration is possible, and mark it as internal with a security group that permits traffic from DC/VPN users only, who are our cluster admins, or other hosts within the DC by security groups as needed.

I don't think that we would use a proxy as you've described, I think the network architects would read that as a concession against their design rather than a convenience, since if I'm understanding what you propose, it represents a way around this 2FA forcing scheme.

So the control plane is only accessible through the load balancer, and otherwise from private traffic on the overlay network (from inside of the cluster.) And generally for any sort of traffic on the rest of the DC network to reach the cluster, it would always have to go through the control plane LB or one of the other LoadBalancer or Ingress made on the cluster. (There is no external permitted traffic into NodePort or HostPort services in the design at this time, to keep the model as simple and similarly constrained as possible.) We will certainly provision some public LoadBalancer services from inside of the cluster later on, but the cluster API would likely never be exposed outside of this DC, as a management service out of an overabundance of caution.

dlipovetsky commented 5 years ago

Thanks a lot for the details @kingdonb! Let me know if I understood correctly:

You're running the bootstrap cluster inside an existing VPC
You want CAPA to deploy a cluster into that VPC
You want CAPA to create an internal LB

dlipovetsky commented 5 years ago

@sethp-nr

It might be as simple as documentation that says "internal ELB probably won't work for CAPA-created VPCs" but it would be nice if there was something we could offer.

I think you're on the right track. Let's make the control plane-to-workload cluster API connectivity requirement abundantly clear in the CAPA documentation, and call it out again in whatever type we add to CAPA's networkSpec.

Whether CAPA creates the VPC or not, CAPA will require the end-user to provide connectivity. The end user can create a tunnel, add the newly created VPC to a VPN, use a proxy (once kubeconfigs support proxies), or something else.

Until it can reach the workload cluster API, the control plane will keep trying to reconcile (e.g., the NodeRef controller will keep trying to reach the workload cluster).

kingdonb commented 5 years ago

@dlipovetsky You said it in 3 bullet points, that's exactly right. I appreciate your taking the time to understand my use case. I think your intentions as stated here are likely to meet my requirement.

dlipovetsky commented 5 years ago

Discussed at the 08/12/19 meeting. There are two ways to go here: (a) Allow internal ELB when CAPA deploys to an existing VPC. (b) Allow internal ELB when CAPA creates the VPC.

In either case, the user must ensure that the CAPA controllers can reach the workload cluster. If the controllers cannot, various functionality will break, among them the pivot performed by clusterctl, the association of Nodes to Machines by the NodeRef controller.

Adding this option increases the chances that a user might break cluster deployment. On the other hand, requiring the user bring their own network infrastructure is a large burden just to support a different ELB type. One positive is that, even if there is no connectivity and the cluster cannot be deployed, CAPA will be able to clean up all existing AWS resources.

The rough consensus at the meeting seemed to be: CAPA can support (b), as long as these requirements are met:

Document the connectivity requirement clearly, and
Log a warning in the cluster controller when the ELB type is "internal" and the ELB is not reachable.

dlipovetsky commented 5 years ago

I don't think anyone objected to having to change the CAPA API to support an ELB type. But just in case anyone is wondering:

Supporting (b) requires changing the CAPA API.

Today, CAPA creates the ELB whether or not the user specifies an existing VPC. However, if CAPA can delegate ELB creation to the user in the existing VPC case, then it should be possible to support (a) without changing the CAPA API. CAPA would still be able to issue the connectivity warning as described above by querying the ELB API for the ELB type.

vincepri commented 5 years ago

Adding a new filed shouldn't be an issue, especially for v1alpha2.

arzarif commented 5 years ago

@dlipovetsky - I've been following this ticket for the past couple of months. Appreciate the updates/summaries you've been providing!

I'm looking to leverage Cluster API with an existing VPC, however, our VPCs are typically configured without public subnets. Is the ability to accommodate case (a) (ie. user brings their own VPC and is responsible for ensuring connectivity) something that could be supported in the near future?

kingdonb commented 5 years ago

I'm still interested in this feature, and I wonder if anyone has started work on it.

I understand there is a AWS Cluster API Meeting every two weeks and am considering dropping in. I won't make it this week. But I saw the help wanted label and I'm interested to get started on this feature, if someone does not mind providing reviews for a fairly green go coder.

detiber commented 5 years ago

@kingdonb would love to discuss this further at a future meeting.

I'm also happy to review any PRs that come in to provide this feature, but we would likely also require documentation for the feature in addition to the implementation.

kubernetes-sigs / cluster-api-provider-aws

Possible to provision API with an internal LB ? #873