anurag opened this issue 3 years ago
From an Equinix Metal product offering perspective, this is possible in one of two ways.
Every Equinix Metal device has a management address assigned to the first bond by default. This per-device /30 assignment (by default) is drawn from a 10.x.x.x/25 address range isolated to device peers within the same EM project and facility (a /56 is also created as an IPv6 management range). The https://github.com/equinix/terraform-metal-anthos-on-baremetal project uses this approach, keeping node communication on the 10.x.x.x network.
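As a rough illustration of this first approach, the private management address can be discovered from the metadata service at boot. A minimal cloud-config sketch, assuming the metadata JSON exposes `network.addresses` entries with `public` and `address_family` fields (an assumption about the JSON shape, not documented CAPP behavior):

```yaml
#cloud-config
# Sketch only: discover this device's private (management) IPv4 so that
# node-to-node traffic can be pinned to the 10.x.x.x range. The jq filter
# below assumes the shape of the Equinix Metal metadata JSON.
packages:
  - jq
runcmd:
  - |
    PRIVATE_IP=$(curl -sf https://metadata.platformequinix.com/metadata \
      | jq -r '.network.addresses[] | select(.public == false and .address_family == 4) | .address')
    echo "private management IPv4: ${PRIVATE_IP}" >> /var/log/node-net.log
```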
Alternatively, a device may be provisioned with VLANs. Until recently, all EM devices had to toggle Layer2 support on a port or bond to enroll in this behavior. This is represented by the layer2-bonded, layer2-individual, and hybrid network modes available in the Terraform provider, with similar names in the EM console UI.
It is now possible to add VLANs to a device without pre-enrolling the device in Layer2 modes, but this new functionality is limited to devices in a subset of facilities (those with the IBX feature flag). We'll have to actively enable layer2 features or detect when this step can be skipped.
It is worth noting that the network modes reported in the UI and represented in the Terraform provider are not settings that can be toggled in the EM API. Network mode is a product of port bondedness, layer-2 or layer-3 state, the presence of management addresses, and the presence of VLANs.
After a few different iterations on how to represent these features in Terraform, we are coming around to the idea of giving the user control over the ports directly without offering network mode hand-waving.
Expressed statefully, and adhering closely to the EM API spec, this could look like the following:

```yaml
- plan: "n2.x-large"
  ports: # network_ports in the API naming
    bond0: # always comprised of eth0+eth1 on 2-port devices, or eth0+eth2 on 4-port devices
      bonded: true # default
      layer2: false # default on new devices; changes result in /ports/id/convert/layer-[2|3] API calls
      vlans: [1234, 5678] # shortened from the API "virtual_networks"
      native_vlan: 5678 # optional, must be one of the above if set
      ip_addresses:
        - reservation_id: uuid # reserved addresses by UUID, may include global IPs
        - cidr_notation: "1.2.3.4/24" # reserved addresses by address, may include global IPs
        - type: private_ipv4 # dynamically assigned addresses, available via metadata.platformequinix.com
          cidr: 30
    bond1: # comprised of eth1+eth3 on 4-port devices
      bonded: false
    eth1: # unbonded eth ports can use most of the same attributes bond ports can use
      layer2: true
      vlans: [7654]
    eth3:
      layer2: true
      vlans: [9876]
```
Users are free to customize the ports, bonds, addresses, and VLANs with all the flexibility afforded by the API. Invalid configurations will bubble up through API errors, so input does not have to be validated by the CAPP provider. It is difficult to validate these values because the plan, facility, state of the port, state of the bonds, and other factors affect the success of toggling a field. Through Kubernetes reconciliation loops, eventually all bonds will be connected or disconnected, the device will reach the desired layer-2/layer-3 mode, and the desired addresses will be provisioned and assigned.
A port and bond configuration that is not supported today may be supported later, or in a different plan or facility.
This was discussed for consideration in the Terraform provider within a packngo PR: https://github.com/packethost/packngo/pull/239#discussion_r572120683
I don't know how well CAPP can capture this approach. CAPP would need to know which address ranges to use for which purposes. Dynamically allocated ranges, like the project management range, could be referred to generically without knowing the precise range; in other cases, the precise range may be known in advance (IP reservations). In VLAN cases, the nodes would need special userdata to configure the network, and CAPP would need to be told what ranges to manage.
Userdata scripts would need to be hardy, allowing time for the ports to be bonded, VLANs to be attached, and addresses to become available.
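To illustrate what "hardy" userdata might look like, here is a minimal cloud-config sketch that waits for the bond to appear before attaching a VLAN sub-interface and a static address. The interface name, VLAN ID, and address range are illustrative assumptions, not CAPP defaults:

```yaml
#cloud-config
# Sketch only: tolerate the window where ports are still being bonded and
# VLANs attached. bond0, VLAN 1234, and 192.168.100.2/25 are illustrative.
runcmd:
  - |
    # Wait up to 10 minutes for bond0 to exist before configuring it.
    for i in $(seq 1 60); do
      ip link show bond0 >/dev/null 2>&1 && break
      sleep 10
    done
  - ip link add link bond0 name bond0.1234 type vlan id 1234
  - ip addr add 192.168.100.2/25 dev bond0.1234
  - ip link set bond0.1234 up
```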
I think this shouldn't be limited to just the control plane, but rather all nodes should only have private IPs and any service that needs to be exposed should be done through a load balancer. This is similar to how EKS, GKE and AKS do things.
CPEM added support for clusters with node IP addresses drawn from a VLAN-accessible range in v3.5.0.
CAPP should take advantage of this for clusters that use Equinix Metal Gateway and VRF features.
Hi,
I implemented a change in CAPP so that servers do not use public IPs (I can open a PR for that). But the blocker on this feature is that servers without a public IP address have no Internet connectivity, and there is no NAT Gateway or similar service that would enable them to reach the Internet.
This can be feasible for use cases where Internet connectivity is not needed and all dependencies are served from the internal network, but for most standard use cases the Internet needs to be reachable (at least for the kubelet to download container images).
Correct me if I'm wrong, but I don't see how Metal Gateway enables servers to not have public IPs - as we can see in the Metal Gateway example, the servers need to be configured with an IP address at the OS level (which is even less automatable).
One possible way to overcome this is to have a server that acts as a NAT gateway using standard Linux with ip_forward enabled and iptables rules (a rough sketch follows below), but that adds configuration overhead and a single point of failure, and increases costs by one server.
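For completeness, a minimal sketch of that workaround, assuming bond0 carries the public uplink and the cluster nodes live on an assumed 192.168.100.0/24 VLAN range:

```yaml
#cloud-config
# Sketch only: turn one standard Linux device into a NAT gateway for
# VLAN-attached nodes. Interface name and address range are assumptions.
write_files:
  - path: /etc/sysctl.d/99-ip-forward.conf
    content: |
      net.ipv4.ip_forward = 1
runcmd:
  - sysctl --system
  - iptables -t nat -A POSTROUTING -s 192.168.100.0/24 -o bond0 -j MASQUERADE
```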
Is there a plan for a NAT service or something similar to make this easier? Or can you share future plans around this feature?
If I got anything wrong, please correct me and point me to more information.
@Lirt This comes down to enabling more networking configuration in ClusterAPI cluster / machine configuration.
For example, nodes could be configured to start up in a Layer2 Equinix Metal mode (no managed IPs assigned, no Equinix Metal DHCP) and then use cloud-config to assign static IPs.
When these addresses are public, a Metal Gateway can route these nodes to the internet. At first glance, this doesn't seem all that different from using the Equinix Metal provided IP addresses. The benefit is that the VLAN may be connected to other networks in the Equinix Metal project or even in other clouds (by using Fabric connections). The VLAN may also be connected to a Network Edge router or a Fabric link to Equinix Internet Access.
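A minimal sketch of that Layer2 + Metal Gateway pattern, assuming a systemd-networkd image; the reserved block and gateway address below are placeholder assumptions for values CAPP or the operator would supply:

```yaml
#cloud-config
# Sketch only: statically assign a public IP from an assumed Equinix Metal
# reservation and default-route through an assumed Metal Gateway address
# on the VLAN. All addresses below are illustrative placeholders.
write_files:
  - path: /etc/systemd/network/10-bond0.network
    content: |
      [Match]
      Name=bond0

      [Network]
      Address=147.75.80.10/29
      Gateway=147.75.80.9
      DNS=8.8.8.8
runcmd:
  - networkctl reload
```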
For another scenario, let's take advantage of the off-the-shelf public IP connectivity while we configure ClusterAPI nodes to ALSO connect to a VLAN where additional services (alternate K8s environments, or non-K8s services) are running within the Equinix Metal project. We may also want to use this VLAN to connect to Fabric for any of the reasons previously stated.
Moving away from public IP clusters, we may want to use CAPP to create a cluster in an existing Equinix Metal project, with an existing VLAN and an existing NAT configuration, possibly with DHCP present. The NAT could be an Equinix Metal service or device (as you pointed out), a Network Edge device, or a service running somewhere on the VLAN connected through Fabric.
In these scenarios, the common need is for CAPP to:
What we can expect to be injected are:
CAPP needs to be configurable enough to take advantage of existing resources.
Whether or not CAPP should be responsible for creating and managing any of those API-manageable network primitives is another matter. 😄
@Lirt Your branch / PR sounds very interesting. The simplest way to introduce Layer2 capabilities into CAPP would be to take advantage of Public IP addresses routed through Metal Gateway. This would exercise most of the paths that we would need before exchanging public IPs for private ones and changing the external environment to one where a NAT or private network is utilized along with a private container registry.
Open a draft PR?
Hi,
Thank you for explanation. I am still learning about Equinix services and network setup but now it makes more sense (Metal Gateway use-case).
My PR just leverages the packngo library (the Equinix API client) to let the user specify the IP addresses of a device. You can see it opened here.
But I understand that you want to implement full control over ports and ip addresses as you described in your comment (https://github.com/kubernetes-sigs/cluster-api-provider-packet/issues/226#issuecomment-777021331).
For me, one thing is missing for proper device configuration: the ports API is split from the devices API, and there is no configurable link between them. Why the missing link is an issue can be seen in a scenario like this:
The device's network details (dynamically assigned IPs, MAC addresses) are not known until after the device is provisioned, yet the network must be configured through cloud-init, which executes some of the steps/modules only once during first boot.

@Lirt these are valid points.
API clients do not know dynamically assigned IP addresses until after the device has been provisioned, just as it begins to boot. Elastic IPs can be assigned at device creation to get ahead of this.
The MAC address and disk configuration are also unknown until the device instance has been created and begins to provision. These factors can be known ahead of "Instance" creation when using a Hardware Reservation, but On-Demand and Spot instances do not have this advantage.
One way to work around this, for some scenarios, is to manipulate the userdata before it is consumed. This would be a race, but there is a good amount of time between when this information becomes API-discoverable and when the device begins to fetch userdata.
Userdata can be manipulated at any time via the EM API, and the updated userdata appears in metadata.platformequinix.com immediately (confirmed). An alternative to replacing static userdata is to use dynamic userdata: by using shell (Python, etc.) scripts in userdata, we can grab these details from the OS and the metadata service. Another alternative is cloud-config Jinja templating, which substitutes template symbols in the cloud-config script with metadata values.
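A small sketch of the Jinja approach; the `## template: jinja` header is standard cloud-init, while the specific metadata key below is an assumption about the Equinix Metal datasource:

```yaml
## template: jinja
#cloud-config
# Sketch only: cloud-init replaces the {{ ... }} symbols with values from
# the metadata service before running modules. Key names are assumptions.
write_files:
  - path: /etc/node-private-ip
    content: |
      {{ ds.meta_data.local_ipv4 }}
runcmd:
  - echo "configuring with private IP $(cat /etc/node-private-ip)"
```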
It's important to point out that in full Layer2 modes, the metadata service is not reachable. For these environments, we need a staged approach:
This can also be attempted without shutting down by racing the API changes and OS changes (the OS may need to detect the network changes before activating the new settings).
As an operator I'd like to create clusters with nodes that are completely isolated from the public internet. Instead, they should only be accessible through authorized IPs or bastion nodes.
Detailed Description
In CAPP's current implementation, all control plane and worker nodes have public IP addresses, kubelets are configured to speak to the control plane's public IP address, and anyone on the internet can attempt connections to control plane nodes. While TLS and authentication add a layer of security, it would be nice to have the ability to create cluster nodes without public IP addresses, analogous to GKE private clusters. All egress traffic would be directed to NAT gateways, and inbound traffic would only be allowed from authorized IPs.
/kind feature