lsst-uk / somerville-operations

User issue reporting and tracking for the Somerville Cloud

ubuntu-jammy image, cloud-init not configuring interfaces correctly #162

Open GregBlow opened 7 months ago

GregBlow commented 7 months ago

G Francis 11:21 AM Hi, when creating an instance with multiple network interfaces is there an extra step that we need to take in order to get cloud-init to configure them both?

Greg Blow 11:21 AM which networks?

G Francis 11:22 AM I thought that this used to work automatically, but I've created a bunch of instances and only the first interface gets configured. 11:22 Here's an example of running cloud-init:

ubuntu@lasair-lsst-svc:~$ sudo cloud-init init
Cloud-init v. 23.1.2-0ubuntu0~22.04.1 running 'init' at Mon, 15 Apr 2024 10:12:00 +0000. Up 160.64 seconds.
ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+-------+------------------------------+---------------+--------+-------------------+
ci-info: | Device | Up    | Address                      | Mask          | Scope  | Hw-Address        |
ci-info: +--------+-------+------------------------------+---------------+--------+-------------------+
ci-info: | enp3s0 | True  | 10.66.4.64                   | 255.255.255.0 | global | fa:16:3e:18:50:99 |
ci-info: | enp3s0 | True  | fe80::f816:3eff:fe18:5099/64 | .             | link   | fa:16:3e:18:50:99 |
ci-info: | enp7s0 | False | .                            | .             | .      | fa:16:3e:7c:a7:30 |
ci-info: | lo     | True  | 127.0.0.1                    | 255.0.0.0     | host   | .                 |
ci-info: | lo     | True  | ::1/128                      | .             | host   | .                 |
ci-info: +--------+-------+------------------------------+---------------+--------+-------------------+
ci-info: +++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: | Route | Destination     | Gateway   | Genmask         | Interface | Flags |
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: | 0     | 0.0.0.0         | 10.66.4.1 | 0.0.0.0         | enp3s0    | UG    |
ci-info: | 1     | 10.66.4.0       | 0.0.0.0   | 255.255.255.0   | enp3s0    | U     |
ci-info: | 2     | 10.66.4.1       | 0.0.0.0   | 255.255.255.255 | enp3s0    | UH    |
ci-info: | 3     | 10.66.4.2       | 0.0.0.0   | 255.255.255.255 | enp3s0    | UH    |
ci-info: | 4     | 129.215.146.5   | 10.66.4.1 | 255.255.255.255 | enp3s0    | UGH   |
ci-info: | 5     | 129.215.166.13  | 10.66.4.1 | 255.255.255.255 | enp3s0    | UGH   |
ci-info: | 6     | 129.215.205.191 | 10.66.4.1 | 255.255.255.255 | enp3s0    | UGH   |
ci-info: | 7     | 169.254.169.254 | 10.66.4.2 | 255.255.255.255 | enp3s0    | UGH   |
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 1     | fe80::/64   | ::      | enp3s0    | U     |
ci-info: | 3     | local       | ::      | enp3s0    | U     |
ci-info: | 4     | multicast   | ::      | enp3s0    | U     |
ci-info: +-------+-------------+---------+-----------+-------+

11:24 There's a first interface, enp3s0, on the lasair-lsst network that gets configured OK. But cloud-init does not recognise that the second interface, enp7s0, should also be up and configured (it's on the cephfs network).

Greg Blow 11:24 AM hmm, lasair-lsst and cephfs. 11:24 I think it must have worked automatically. Could be an image issue?

G Francis 11:24 AM The image is the default ubuntu-jammy, so it could be an image issue if you've changed it, but we haven't made any changes ourselves. 11:29 I've just noticed that the bootstrap instance, which was created in Horizon, did get both interfaces configured, whereas the others, which were created using Terraform, did not. So I guess there is some sensitivity to the exact order of operations when creating an instance.

G Francis 11:29 AM Or maybe some missing step that Horizon knows to do, but Terraform does not.


Greg Blow 11:30 AM hmm

Greg Blow 11:38 AM I'm having a hard time getting a test instance connected through the Horizon interface. Not sure if I'm missing something obvious though.

G Francis 11:45 AM I've just created two test instances using Horizon.
1. Created with both network interfaces from the start: cloud-init picks them both up and configures them both.
2. Created with only lasair-lsst, with cephfs attached afterwards: we get the same behaviour as with Terraform, i.e. only the first interface gets configured.

Greg Blow 11:45 AM I am also seeing irregular behaviour.

G Francis 11:46 AM So the question is, after attaching a 2nd network interface, what do I do to get it configured? 11:47 I'm fairly sure that this has worked at some point in the past, because I wrote the Terraform scripts that way and they worked last time we used them. 11:47 But it's possible I'm forgetting some extra step that we did and somehow forgot to document.

Greg Blow 11:50 AM Possibly, but it does look to be malfunctioning. 11:51 As you documented, creating an instance with both interfaces configures both correctly. Removing one interface and then re-adding it causes it to come up unconfigured. 11:54 I'm going to try and get some time soon to look through the cloud-init logs to see what it's doing. Unfortunately I can't right at this moment.

GregBlow commented 7 months ago

@gpfrancis fyi

GregBlow commented 7 months ago

2024-04-15 10:50:06,217 - __init__.py[DEBUG]: Detected interfaces {'lo': {'downable': False, 'device_id': None, 'driver': None, 'mac': '00:00:00:00:00:00', 'name': 'lo', 'up': True}, 'enp4s0': {'downable': True, 'device_id': '0x0001', 'driver': 'virtio_net', 'mac': 'fa:16:3e:0b:90:62', 'name': 'enp4s0', 'up': False}, 'enp3s0': {'downable': True, 'device_id': '0x0001', 'driver': 'virtio_net', 'mac': 'fa:16:3e:37:b1:2b', 'name': 'enp3s0', 'up': False}}
2024-04-15 10:50:06,217 - __init__.py[DEBUG]: unable to do any work for renaming of [['fa:16:3e:37:b1:2b', 'enp3s0', 'virtio_net', '0x0001'], ['fa:16:3e:30:ba:3c', 'enp4s0', 'virtio_net', '0x0001']]
2024-04-15 10:50:06,217 - stages.py[WARNING]: Failed to rename devices: [nic not present] Cannot rename mac=fa:16:3e:30:ba:3c to enp4s0, not available.
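
The warning above names the MAC address that cloud-init expected but could not find. As a rough sketch for spotting this in longer logs, the following pulls the MAC and interface name out of such lines. The regular expression is based only on the wording of the log excerpt in this thread and is an assumption; other cloud-init versions may phrase the message differently. The function name failed_renames is illustrative.

```python
import re

# Pattern modelled on the warning quoted in this thread (an assumption:
# the exact wording may vary between cloud-init versions).
RENAME_FAILURE = re.compile(
    r"Cannot rename mac=(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}) to (?P<name>[^,\s]+)",
    re.IGNORECASE,
)


def failed_renames(log_text):
    """Return (mac, interface_name) pairs for every failed rename in log_text."""
    return [
        (m.group("mac").lower(), m.group("name"))
        for m in RENAME_FAILURE.finditer(log_text)
    ]


sample = (
    "2024-04-15 10:50:06,217 - stages.py[WARNING]: Failed to rename devices: "
    "[nic not present] Cannot rename mac=fa:16:3e:30:ba:3c to enp4s0, not available."
)
print(failed_renames(sample))  # [('fa:16:3e:30:ba:3c', 'enp4s0')]
```

Any MAC reported here that does not appear in the output of ip a is a candidate for stale metadata.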
GregBlow commented 7 months ago

ubuntu@multinet-test4:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8942 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:37:b1:2b brd ff:ff:ff:ff:ff:ff
    inet 10.71.0.196/24 metric 100 brd 10.71.0.255 scope global dynamic enp3s0
       valid_lft 43136sec preferred_lft 43136sec
    inet6 fe80::f816:3eff:fe37:b12b/64 scope link
       valid_lft forever preferred_lft forever
3: enp4s0: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
    link/ether fa:16:3e:0b:90:62 brd ff:ff:ff:ff:ff:ff

GregBlow commented 7 months ago

/run/cloud-init/instance-data.json

"network_json": {
   "links": [
    {
     "ethernet_mac_address": "fa:16:3e:37:b1:2b",
     "id": "tapc4d1b119-9c",
     "mtu": null,
     "type": "ovs",
     "vif_id": "c4d1b119-9cd9-4826-a55d-724d5acff037"
    },
    {
     "ethernet_mac_address": "fa:16:3e:30:ba:3c",
     "id": "tapf8603d47-73",
     "mtu": 9000,
     "type": "ovs",
     "vif_id": "f8603d47-7331-4e59-adec-335410cff0a7"
    }
   ],
   "networks": [
    {
     "id": "network0",
     "link": "tapc4d1b119-9c",
     "network_id": "3c7b83ed-b695-4d08-b8bf-7a3ef24a0cb7",
     "type": "ipv4_dhcp"
    },
    {
     "id": "network1",
     "ip_address": "10.21.3.193",
     "link": "tapf8603d47-73",
     "netmask": "255.255.0.0",
     "network_id": "12a61257-7a3d-49c4-b379-540b9e61b83e",
     "routes": [
      {
       "gateway": "10.21.255.254",
       "netmask": "255.255.255.0",
       "network": "10.19.4.0"
      }
     ],
     "services": [],
     "type": "ipv4"
    }
   ],
   "services": [
    {
     "address": "129.215.205.191",
     "type": "nameserver"
    }

GregBlow commented 7 months ago

Where is cloud-init pulling 'fa:16:3e:30:ba:3c' from? The enp4s0 device actually has MAC address fa:16:3e:0b:90:62.
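
The mismatch can be checked mechanically by comparing the MAC addresses listed under "links" in network_json with the MAC addresses the kernel reports. This is a minimal sketch, not a tested tool: the function names are illustrative, reading /sys/class/net/<device>/address assumes a Linux guest, and the example values are copied from the outputs in this thread. Where network_json sits inside /run/cloud-init/instance-data.json is not shown in the excerpt, so loading it is left to the reader.

```python
from pathlib import Path


def metadata_macs(network_json):
    """MAC addresses listed under "links" in the OpenStack network metadata."""
    return {
        link["ethernet_mac_address"].lower()
        for link in network_json.get("links", [])
    }


def local_macs(sys_net="/sys/class/net"):
    """Map interface name -> MAC for the NICs the kernel sees, skipping loopback."""
    macs = {}
    for dev in Path(sys_net).iterdir():
        addr_file = dev / "address"
        if dev.name != "lo" and addr_file.is_file():
            macs[dev.name] = addr_file.read_text().strip().lower()
    return macs


def compare(network_json, nics):
    """Return (metadata MACs with no matching NIC, NICs absent from the metadata)."""
    wanted = metadata_macs(network_json)
    present = set(nics.values())
    stale = sorted(wanted - present)
    unconfigured = {name: mac for name, mac in nics.items() if mac not in wanted}
    return stale, unconfigured


# Values copied from the instance-data.json and ip a outputs in this thread.
example_metadata = {"links": [
    {"ethernet_mac_address": "fa:16:3e:37:b1:2b", "id": "tapc4d1b119-9c"},
    {"ethernet_mac_address": "fa:16:3e:30:ba:3c", "id": "tapf8603d47-73"},
]}
example_nics = {"enp3s0": "fa:16:3e:37:b1:2b", "enp4s0": "fa:16:3e:0b:90:62"}

print(compare(example_metadata, example_nics))
# (['fa:16:3e:30:ba:3c'], {'enp4s0': 'fa:16:3e:0b:90:62'})
```

On the instance itself, compare(network_json, local_macs()) would do the same check against the live interfaces.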

GregBlow commented 7 months ago

OpenStack returns no port with that MAC address:

(openstack-config) [stack@sv-admin-0 openstack-config]$ openstack port list | grep fa:16:3e:30:ba:3c
(openstack-config) [stack@sv-admin-0 openstack-config]$

GregBlow commented 7 months ago

https://stackoverflow.com/questions/70671180/openstack-cloud-init-do-not-assign-proper-network-interface-to-instance

GregBlow commented 7 months ago

/etc/netplan/50-cloud-init.yaml

# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        enp3s0:
            dhcp4: true
            match:
                macaddress: fa:16:3e:37:b1:2b
            set-name: enp3s0
        enp4s0:
            addresses:
            - 10.21.3.193/16
            match:
                macaddress: fa:16:3e:30:ba:3c
            mtu: 9000
            nameservers:
                addresses:
                - 129.215.205.191
                - 129.215.166.13
                - 129.215.146.5
                search: []
            routes:
            -   to: 10.19.4.0/24
                via: 10.21.255.254
            set-name: enp4s0

GregBlow commented 7 months ago

After modifying 50-cloud-init.yaml to reflect the actual MAC address of enp4s0, it works:

ubuntu@multinet-test4:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8942 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:37:b1:2b brd ff:ff:ff:ff:ff:ff
    inet 10.71.0.196/24 metric 100 brd 10.71.0.255 scope global dynamic enp3s0
       valid_lft 43174sec preferred_lft 43174sec
    inet6 fe80::f816:3eff:fe37:b12b/64 scope link
       valid_lft forever preferred_lft forever
3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:0b:90:62 brd ff:ff:ff:ff:ff:ff
    inet 10.21.3.193/16 brd 10.21.255.255 scope global enp4s0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe0b:9062/64 scope link
       valid_lft forever preferred_lft forever
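
The manual fix amounts to pointing each match.macaddress entry at the MAC the interface really has. A sketch of that transformation on an already-parsed netplan config follows. The function name repoint_macs and the actual mapping are hypothetical, and loading or saving the YAML file (which would need a YAML library) is deliberately left out.

```python
import copy


def repoint_macs(netplan, actual_macs):
    """Return a copy of a parsed netplan config with match.macaddress updated.

    netplan: dict with the structure of /etc/netplan/50-cloud-init.yaml.
    actual_macs: interface name -> MAC address, e.g. as read from ip a.
    """
    fixed = copy.deepcopy(netplan)
    ethernets = fixed.get("network", {}).get("ethernets", {})
    for name, cfg in ethernets.items():
        mac = actual_macs.get(name)
        # Only touch entries that already match on a MAC address.
        if mac and "match" in cfg:
            cfg["match"]["macaddress"] = mac
    return fixed


# Trimmed version of the config shown in this thread.
config = {"network": {"version": 2, "ethernets": {
    "enp3s0": {"dhcp4": True,
               "match": {"macaddress": "fa:16:3e:37:b1:2b"},
               "set-name": "enp3s0"},
    "enp4s0": {"addresses": ["10.21.3.193/16"],
               "match": {"macaddress": "fa:16:3e:30:ba:3c"},
               "mtu": 9000,
               "set-name": "enp4s0"},
}}}
actual = {"enp3s0": "fa:16:3e:37:b1:2b", "enp4s0": "fa:16:3e:0b:90:62"}

fixed = repoint_macs(config, actual)
print(fixed["network"]["ethernets"]["enp4s0"]["match"]["macaddress"])
# fa:16:3e:0b:90:62
```

As the header of the generated file itself warns, changes to it do not persist across an instance reboot, so this is a diagnostic workaround rather than a fix.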
GregBlow commented 7 months ago

Running sudo cloud-init clean --logs and rebooting is sufficient to resolve the immediate issue that cloud-init does not configure the interfaces correctly on boot. However, this suggests that cloud-init is not refreshing its network configuration when an interface is attached after creation. The clean also resets the key fingerprint, which is inconvenient.

To progress this issue we need to determine why cloud-init is not updating its network configuration.

GregBlow commented 6 months ago

This appears to be the intended behaviour:

https://cloudinit.readthedocs.io/en/latest/howto/rerun_cloud_init.html

Most cloud-init configuration is only applied to the system once. This means that simply rebooting the system will only re-run a subset of cloud-init. Cloud-init provides two different options for re-running cloud-init for debugging purposes.

tms-epcc commented 1 month ago

@GregBlow to check with @gpfrancis before closing