okd-project / okd

IPv6 with OKD4 UPI on OpenStack fails #817

Closed pjoomen closed 2 years ago

pjoomen commented 3 years ago

Describe the bug

With a fresh install of OKD4 on OpenStack/UPI, the IPv6 configuration fails because NetworkManager defaults to stable-privacy address generation. OpenStack expects the VM to use its EUI-64 address and does not allow any other IPv6 addresses (traffic from them is blocked by firewall rules on the hypervisor). Note that this is a feature of port-security, and it is best practice to have port-security enabled.

Version

Cluster version is 4.7.0-0.okd-2021-08-07-063045

How reproducible

100% reproducible

References

Related bug reports:
https://github.com/coreos/fedora-coreos-tracker/issues/907
https://bugzilla.redhat.com/show_bug.cgi?id=1743161

pjoomen commented 3 years ago

A workaround for installations with networkType: OpenShiftSDN exists, using the following Butane configuration:

variant: openshift
version: 4.8.0
metadata:
  name: 98-nmconnection
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /etc/NetworkManager/system-connections/default.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=Wired Connection
          type=ethernet
          autoconnect-retries=1
          multi-connect=3
          permissions=
          [ethernet]
          mac-address-blacklist=
          [ipv4]
          dhcp-timeout=90
          dns-search=
          method=auto
          [ipv6]
          addr-gen-mode=eui64
          dhcp-timeout=90
          dns-search=
          method=auto
          [proxy]

For networkType:OVNKubernetes this needs to be fixed by adjusting https://github.com/vrutkovs/machine-config-operator/blob/ea229fbb1eb3b68d0014a0b06de27addc1dc473e/templates/common/_base/files/configure-ovs-network.yaml#L120
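
Whether a keyfile such as the one in the Butane configuration above actually carries the intended setting can be checked without NetworkManager running. The following is a minimal sketch, assuming the keyfile stores the mode as a plain string (as in the configuration above); the function name `keyfile_ipv6_addr_gen_mode` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the addr-gen-mode value from the [ipv6]
# section of a NetworkManager keyfile. Prints nothing if the key is
# absent, in which case NetworkManager falls back to its default.
keyfile_ipv6_addr_gen_mode() {
  awk -F= '
    /^\[/ { section = $0 }                                   # remember the current section header
    section == "[ipv6]" && $1 == "addr-gen-mode" { print $2 }
  ' "$1"
}
```

On a node this could be called as `keyfile_ipv6_addr_gen_mode /etc/NetworkManager/system-connections/default.nmconnection`; an empty result means the key is not set in that file.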

vrutkovs commented 3 years ago

So in configure-ovs-network.yaml we need


      dhcp6_client_id=$(nmcli --get-values ipv6.dhcp-duid conn show ${old_conn})
      if [ -n "$dhcp6_client_id" ]; then
        extra_brex_args+="ipv6.dhcp-duid ${dhcp6_client_id} "
        extra_brex_args+="ipv6.addr-gen-mode eui64 "
      fi

and in IPv6 mode we'll have br-ex set to use eui64? Do we know if that would affect IPv6 on other platforms?

vglaiju commented 3 years ago

Hi, thanks for opening this bug. We are attempting the same on VMware, and the ovnkube-node container is stuck in CrashLoopBackOff.

Environment: VMware 7
OKD: 4.7
Network: ovn-kubernetes

We observe that the multus pods throw a sandbox error and stay in CrashLoopBackOff state.

We read "IPv6 is supported only on bare metal clusters" (https://docs.okd.io/latest/networking/ovn_kubernetes_network_provider/about-ovn-kubernetes.html). Does that mean that IPv6 dual-stack will not work on VMware or OpenStack, regardless of whether OKD 4.7 or 4.8 is used?

pjoomen commented 3 years ago

So in configure-ovs-network.yaml we need


      dhcp6_client_id=$(nmcli --get-values ipv6.dhcp-duid conn show ${old_conn})
      if [ -n "$dhcp6_client_id" ]; then
        extra_brex_args+="ipv6.dhcp-duid ${dhcp6_client_id} "
        extra_brex_args+="ipv6.addr-gen-mode eui64 "
      fi

and in IPv6 mode we'll have br-ex set to use eui64?

dhcp6_client_id is not set (no DHCPv6 is in use, just SLAAC), so we would need something like:

      # create bridge; use NM's ethernet device default route metric (100)
      if ! nmcli connection show br-ex &> /dev/null; then
        nmcli c add type ovs-bridge \
            con-name br-ex \
            conn.interface br-ex \
            802-3-ethernet.mtu ${iface_mtu} \
            802-3-ethernet.cloned-mac-address ${iface_mac} \
            ipv4.route-metric 100 \
            ipv6.route-metric 100 \
            ipv6.addr-gen-mode eui64 \
            ${extra_brex_args}
      fi

Do we know if that would affect IPv6 on other platforms?

No, but previously Fedora CoreOS was defaulting to addr-gen-mode=eui64, i.e. my previous installation (from last year) did the correct thing from the start, albeit with networkType: OpenShiftSDN.

pjoomen commented 3 years ago

Does that mean that IPv6 dual-stack will not work on VMware or OpenStack, regardless of whether OKD 4.7 or 4.8 is used?

Note that I opened this bug because it prevents us from accessing the OpenStack VMs over IPv6, and it is therefore not (directly) related to dual-stack Kubernetes. This feature allows us to connect directly to the OpenStack VMs (since we do route the IPv6 traffic) without requiring the configuration of floating IPs for all VMs.

When enabling dual-stack Kubernetes, I suspect that it requires disabling port-security (or at least adjusting the allowed-address-pairs in OpenStack), since not doing this would prevent any communication from unknown addresses.
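
As a rough illustration of the allowed-address-pairs adjustment mentioned above, the sketch below builds an `openstack port set --allowed-address` call for one port. The port name, the prefix and the `DRY_RUN` switch are assumptions made for this example; by default the function only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow an additional IPv6 prefix on a Neutron port.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
allow_ipv6_on_port() {
  local port=$1 cidr=$2
  local -a cmd=(openstack port set --allowed-address "ip-address=${cidr}" "$port")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # show what would be run
  else
    "${cmd[@]}"                 # requires the OpenStack CLI and credentials
  fi
}
```

For example, `allow_ipv6_on_port master-0-port 2001:db8:0:1::/64` prints the command for a port named `master-0-port` (an illustrative name). Whether widening the allowed addresses is acceptable depends on the security policy of the cloud.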

vglaiju commented 3 years ago

Thanks @pjoomen, shall I open a separate bug for dual-stack support in OKD 4.7/4.8 on VMware?

pjoomen commented 3 years ago

Do we know if that would affect IPv6 on other platforms?

To re-iterate:

Defaulting to ipv6.addr-gen-mode=stable-privacy, as Fedora CoreOS does, is a sane choice for workstations but does not make sense for server-type installations. ipv6.addr-gen-mode=eui64 gives us a predictable IPv6 address, since it is constructed from the prefix in the router advertisement and the MAC address. This address is (for OpenStack) the one that is configured as the address to expect traffic from. Traffic from other IPv6 addresses originating from the VM is disallowed (unless further configuration is done).

The proper way to deal with this issue is to have the default of the base operating system changed. This would also allow us to have access to the VMs before OKD corrects the error.

The two approaches mentioned in this report are viable workarounds (until a more permanent fix is in place).
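
To make the construction concrete, here is a minimal sketch of the modified EUI-64 derivation (RFC 4291, Appendix A): the universal/local bit of the first MAC octet is flipped and ff:fe is inserted in the middle. The function names are hypothetical, and `slaac_address` assumes the prefix is passed as its four leading groups without the trailing `::/64`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the modified EUI-64 interface identifier
# from a MAC address given as aa:bb:cc:dd:ee:ff.
mac_to_eui64_iid() {
  local mac=$1
  local -a o
  if ! [[ $mac =~ ^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$ ]]; then
    echo "expected a MAC of the form aa:bb:cc:dd:ee:ff" >&2
    return 1
  fi
  IFS=':' read -r -a o <<< "$mac"
  # Flip the universal/local bit (0x02) of the first octet.
  local first=$(( 0x${o[0]} ^ 0x02 ))
  # Insert ff:fe between the third and fourth octet, print as four groups.
  printf '%x:%x:%x:%x\n' \
    $(( (first << 8) | 0x${o[1]} )) \
    $(( (0x${o[2]} << 8) | 0xff )) \
    $(( (0xfe << 8) | 0x${o[3]} )) \
    $(( (0x${o[4]} << 8) | 0x${o[5]} ))
}

# Hypothetical helper: join a /64 prefix (four groups) with the identifier.
slaac_address() {
  local prefix=$1 iid
  iid=$(mac_to_eui64_iid "$2") || return 1
  printf '%s:%s\n' "$prefix" "$iid"
}
```

For example, `slaac_address 2001:db8:0:1 fa:16:3e:12:34:56` prints `2001:db8:0:1:f816:3eff:fe12:3456`. The output drops leading zeros per group but does not apply `::` compression, so it may differ textually from what `ip -6 addr` shows when the address contains zero groups.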

pjoomen commented 3 years ago

Thanks @pjoomen, shall I open a separate bug for dual-stack support in OKD 4.7/4.8 on VMware?

Yes, that makes sense to me.

dustymabe commented 3 years ago

Note the Fedora CoreOS stance on this issue (https://github.com/coreos/fedora-coreos-tracker/issues/907#issuecomment-901329628): we want to fix it in the base, but it might take some time.

For now, Thomas and I suggested in the BZ that whatever is calling nmcli should also specify ipv6.addr-gen-mode.


No, but previously Fedora CoreOS was defaulting to addr-gen-mode=eui64, i.e. my previous installation (from last year) did the correct thing from the start, albeit with networkType: OpenShiftSDN.

Yes. The change in behavior was unintended. See https://github.com/coreos/fedora-coreos-tracker/issues/513#issuecomment-887032823

MindTooth commented 2 years ago

It does not seem that FCOS 35 has fixed this. Am I correct in making that assumption?

Is there a place that I can actively monitor this to see if and when it gets pushed?

jonasbartho commented 2 years ago

The following Butane file can be used as a workaround when provisioning the cluster (using the default OVNKubernetes) on OpenStack. It works with OKD 4.7/4.8/4.9:

variant: openshift
version: 4.9.0
metadata:
  name: 97-eui64
  labels:
    machineconfiguration.openshift.io/role: master
systemd:
  units:
    - name: configure-eui64.service
      enabled: true
      contents: |
        [Unit]
        Before=multi-user.target
        Wants=network-online.target
        After=network-online.target

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/configure-eui64.sh
        ExecStartPost=/usr/bin/touch /tmp/has_executed
        RemainAfterExit=no

        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /usr/local/bin/configure-eui64.sh
      mode: 0700
      contents:
        inline: |
          #!/bin/bash
          nmint=(ovs-if-br-ex br-ex)

          for i in "${nmint[@]}";do
            if [ "$(nmcli con show "${i}"|awk '/ipv6.addr-gen-mode/ {print $2}')" == "stable-privacy" ]; then
              # /etc/NetworkManager/systemConnectionsMerged = OVERLAY
              echo "eui64 is not enabled for connection ${i}! Enabling now!"
              nmcli con mod "${i}" ipv6.addr-gen-mode eui64 && nmcli con up "${i}"

              # /etc/NetworkManager/system-connections = UNDERLAY
              # This needs to be changed to make the change persistent after reboot
              sed -i 's/stable-privacy/eui64/' /etc/NetworkManager/system-connections/"${i}".nmconnection
            fi
          done 