openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/
Apache License 2.0
94 stars 186 forks source link

OCPBUGS-16912: Agent use nmstateconfig for DHCP tests #1557

Closed bfournie closed 1 year ago

bfournie commented 1 year ago

The Agent IPV6_DCHCP test fails due to the address assigned via DHCPv6 not matching the rendezvousIP. The reason is that by default NetworkManager sends DHCPv6 Solicitations with a Client ID of DUID-UID (prefix=00:04, see https://datatracker.ietf.org/doc/html/rfc6355) containing a machine specific value. When using nmstateconfig for DHCP, the NetworkManager keyfiles are setup for each interface with dhcp-duid=ll (prefix=00:03, see https://datatracker.ietf.org/doc/html/rfc3315#section-9.4) which means that the DHCPv6 Solicitation is based on the mac address. This allows the DHCPv6 server to assign a deterministic address which can be set to match the rendezvousIP, e.g https://github.com/metal3-io/metal3-dev-env/blob/main/vm-setup/roles/libvirt/templates/network.xml.j2#L93

Using the nmstateconfig is also recommended for IPv4 and is consistent with configurations we have seen from customers. For both cases the nmstate configs are used - https://nmstate.io/examples.html#dynamic-ip-configuration.

https://github.com/openshift-metal3/dev-scripts/pull/1555 is also needed and included here.

bfournie commented 1 year ago

/cc @zaneb @rwsu @danielerez

openshift-ci[bot] commented 1 year ago

@bfournie: GitHub didn't allow me to request PR reviews from the following users: danielerez.

Note that only openshift-metal3 members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to [this](https://github.com/openshift-metal3/dev-scripts/pull/1557#issuecomment-1657283198): >/cc @zaneb @rwsu @danielerez Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
bfournie commented 1 year ago

/test e2e-agent-sno-ipv6

rwsu commented 1 year ago

Hi @bfournie. Can you help me understand how this works? The ostestbm network has the DHCP host-id to host-ip mappings. For example:

  <ip family='ipv6' address='fd2e:6f44:5dd8:c956::1' prefix='120'>
    <dhcp>
      <range start='fd2e:6f44:5dd8:c956::14' end='fd2e:6f44:5dd8:c956::3c'/>
      <host id='00:03:00:01:00:04:da:4e:c7:c1' name='master-0' ip='fd2e:6f44:5dd8:c956::14'>
        <lease expiry='60' unit='minutes'/>
     </host>

The MAC address is embedded in the host-id. Is the prefix "00:03:00:01:" ignored when matching an interface's MAC address to the nmstate configs because dhcp-duid=ll is used?

zaneb commented 1 year ago

https://datatracker.ietf.org/doc/html/rfc8415#section-11.4

bfournie commented 1 year ago

Hi @bfournie. Can you help me understand how this works? The ostestbm network has the DHCP host-id to host-ip mappings. For example:

  <ip family='ipv6' address='fd2e:6f44:5dd8:c956::1' prefix='120'>
    <dhcp>
      <range start='fd2e:6f44:5dd8:c956::14' end='fd2e:6f44:5dd8:c956::3c'/>
      <host id='00:03:00:01:00:04:da:4e:c7:c1' name='master-0' ip='fd2e:6f44:5dd8:c956::14'>
        <lease expiry='60' unit='minutes'/>
     </host>

The MAC address is embedded in the host-id. Is the prefix "00:03:00:01:" ignored when matching an interface's MAC address to the nmstate configs because dhcp-duid=ll is used?

When duid-ll is set the host will send a Client ID containing the "00:03:00:01" prefix and the mac suffix that will match what was added by dev-scripts here https://github.com/metal3-io/metal3-dev-env/blob/main/vm-setup/roles/libvirt/templates/network.xml.j2#L93. For example the host will send: dnsmasq-dhcp[1318507]: DHCPSOLICIT(ostestbm) 00:03:00:01:00:f3:f4:3e:09:88

When duid-ll is not set, by default NetworkManager will send a Client ID with a DUID-UID, which is generated on the host and can't be determined, so there's no way to match. For example the host still send: dnsmasq-dhcp[1318507]: DHCPSOLICIT(ostestbm) 00:04:90:2a:c0:06:7b:79:85:30:88:7c:e7:81:27:2a:d4:09

rwsu commented 1 year ago

/lgtm Thanks for the explanation. I had a trouble understanding what the prefix means. I understand now that 00:03 means to use Link Layer DUID and 00:01 after that means ethernet hardware type, then followed by MAC address or link-layer address.

zaneb commented 1 year ago

Is this change still necessary now that we have openshift/installer#7384? I worry that making this change could potentially cover up regressions in future.

bfournie commented 1 year ago

Is this change still necessary now that we have openshift/installer#7384? I worry that making this change could potentially cover up regressions in future.

I agree its no longer required for DHCP IPv6 after the installer change, but I think its useful to have the nmstate config for v4 and v6 since it seems to be common customer configuration.

How about if I add a flag to enable NMSTATE_CONFIG_FOR_DHCP which would be disabled by default so we'd keep the current behavior but could easily test it out?

zaneb commented 1 year ago

/approve

openshift-ci[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift-metal3/dev-scripts/blob/master/OWNERS)~~ [zaneb] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
rwsu commented 1 year ago

/retest-required

openshift-ci[bot] commented 1 year ago

@bfournie: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-agent-sno-ipv6 49ec6e1990e5f0210ff0e604c1e392413c1a67ab link false /test e2e-agent-sno-ipv6
ci/prow/e2e-agent-ha-dualstack 49ec6e1990e5f0210ff0e604c1e392413c1a67ab link false /test e2e-agent-ha-dualstack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
bfournie commented 1 year ago

/retest-required