dogfood: dendrite service only listening on localhost

jgallagher commented 1 year ago

During dogfood reinstalls both last week and today, we got stuck when a service was attempting to talk to dendrite on the bootstrap network, but dendrite in the switch zone was configured to listen only on localhost. In both cases we manually fixed this by changing its config/address property from [::1]:12224 to [::]:12224, but (a) we need this to work automatically and (b) it isn't clear why the address is being set to only localhost.

From skimming the code where this happens, it looks like dendrite should be listening on all addresses relevant to the switch zone. A quick and dirty fix would be to change this to just always set the address to [::]:{DENDRITE_PORT}, but it would be better to investigate and understand why we're only seeing localhost in practice.

bnaecker commented 1 year ago

So one question I have is whether Dendrite happens to be running already when we set this SMF property. Currently, Dendrite accepts SMF refreshes by handling SIGUSR2. But I definitely ran into issues where the signal handler is not installed early enough in Dendrite's startup sequence. I did some work to address that, but it's entirely possible that we can still actually miss refreshes, given the way Omicron is actually starting the service. I experimented with this part of the code when adding the MAC address setup, so I'd be happy to help debug on that side, if we can determine that Dendrite needs to be fixed to address this rather than Omicron.

jgallagher commented 1 year ago

That is a more specific version of a question we asked on the call, thanks! Maybe a thing to start with is to add logging to sled-agent where it configures dendrite, and see if what sled-agent is configuring is different from what dendrite is using?

bnaecker commented 1 year ago

That'd be great yeah. We can see in the Dendrite logs as well (1) what configuration it gets at startup or at an SMF refresh, and (2) when it handles SIGUSR1. (I incorrectly said SIGUSR2 above.)

jgallagher commented 1 year ago

We have more logging in sled-agent than I realized. There are two places in sled-agent/src/services.rs where we configure dendrite (why 2? not sure). The first is here where we add all addresses present in request.zone.addresses. Above that we log any addresses other than localhost. From today's dogfood logs:

00:06:32.508Z INFO SledAgent (BootstrapAgent): Zone booting
    zone = oxz_switch
00:06:40.365Z INFO SledAgent (BootstrapAgent): Ensuring bootstrap address fdb0:a840:2504:11a::2 exists in switch zone
00:06:40.365Z INFO SledAgent (BootstrapAgent): Adding bootstrap address
    zone = oxz_switch
00:06:48.313Z INFO SledAgent (BootstrapAgent): Forwarding bootstrap traffic via oxBootstrap0 to fe80::8:20ff:fecb:7a4e
00:06:48.366Z INFO SledAgent (BootstrapAgent): GZ addresses: []
00:06:48.603Z INFO SledAgent (BootstrapAgent): Setting up dendrite service

Any non-localhost addresses would've been logged in between Forwarding bootstrap traffic... and GZ addresses: ..., so I think sled-agent did not even try to give dendrite the bootstrap address.

The second place we configure dendrite is here, and again we log the non-localhost addresses above. From today's logs:

02:15:09.070Z INFO SledAgent (BootstrapAgent): Ensuring address fd00:1122:3344:10b::2 exists
02:15:09.070Z INFO SledAgent (BootstrapAgent): Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:10b::2, prefix: 64 }))
    zone = oxz_switch
02:15:09.123Z INFO SledAgent: Sending prefix to ddmd for advertisement
    DdmAdminClient = [::1]:8000
    prefix = Ipv6Prefix { addr: fd00:1122:3344:10b::, len: 64 }
02:15:09.123Z INFO SledAgent (sprockets (BootstrapAgent)): Connection closed
    remote_addr = [fdb0:a840:2504:11a::1]:60035
02:15:09.123Z INFO SledAgent (BootstrapAgentRssHandler): Peer agent initialized
    peer = [fdb0:a840:2504:11a::1]:12346
02:15:09.123Z INFO SledAgent (BootstrapAgentRssHandler): Initialized sled agent
    target_sled = [fdb0:a840:2504:11a::1]:12346
02:15:09.123Z INFO SledAgent: failed to notify nexus about sled agent
    error = proto error: io error: No route to host (os error 148)
    sled_id = 96c409e3-a150-46de-9915-25b839ba2508
02:15:09.256Z INFO SledAgent: contacting server nexus, registering sled
    baseboard = Baseboard { part_number: "913-0000019", revision: 6, serial_number: "BRM42220086" }
    id = 96c409e3-a150-46de-9915-25b839ba2508
    sled_id = 96c409e3-a150-46de-9915-25b839ba2508
02:15:09.410Z INFO SledAgent: contacting server nexus, registering sled
    baseboard = Baseboard { part_number: "913-0000019", revision: 6, serial_number: "BRM42220086" }
    id = 96c409e3-a150-46de-9915-25b839ba2508
    sled_id = 96c409e3-a150-46de-9915-25b839ba2508
02:15:09.594Z INFO SledAgent: contacting server nexus, registering sled
    baseboard = Baseboard { part_number: "913-0000019", revision: 6, serial_number: "BRM42220086" }
    id = 96c409e3-a150-46de-9915-25b839ba2508
    sled_id = 96c409e3-a150-46de-9915-25b839ba2508
02:15:09.801Z INFO SledAgent (BootstrapAgent): Ensuring address fd00:1122:3344:10b::2 exists - OK
02:15:09.846Z INFO SledAgent (BootstrapAgent): configuring dendrite zone

Critically this is after RSS, so too late to matter for the problems we were having. However, I think (?) in this case we should have configured dendrite with the bootstrap address. Looking at the dendrite logs, I see the initial config with just localhost within 1 second of the first set of sled-agent logs above:

00:06:49.231Z INFO dpd: dpd config: Config {
        log_file: None,
        log_format: Json,
        port_config: Some(
            "/opt/oxide/dendrite/misc/sidecar_config.toml",
        ),
        mac_base: None,
        listen_addresses: [
            [::1]:12224,
        ],
        dns_servers: [],
        rack_id: None,
        sled_id: None,
        asic_config: AsicConfig {
            devpath: Some(
                "/dev/tofino/1",
            ),
            xcvr_iface: Some(
                "sidecar0",
            ),
            board_rev: "b",
        },
    }

I do not see a similar config line around 02:15; several minutes later we have this one, which is almost certainly when @citrus-it changed the listen address by hand:

02:30:22.837Z INFO dpd: refreshed config: Config {
        log_file: None,
        log_format: Json,
        port_config: Some(
            "/opt/oxide/dendrite/misc/sidecar_config.toml",
        ),
        mac_base: Some(
            a8:40:25:05:04:02,
        ),
        listen_addresses: [
            [::]:12224,
        ],
        dns_servers: [],
        rack_id: Some(
            a65f53ee-4537-4ad6-bf4f-29dbf7ed0143,
        ),
        sled_id: Some(
            96c409e3-a150-46de-9915-25b839ba2508,
        ),
        asic_config: AsicConfig {
            devpath: Some(
                "/dev/tofino/1",
            ),
            xcvr_iface: Some(
                "sidecar0",
            ),
            board_rev: "b",
        },
    }

jgallagher commented 1 year ago

I think that leaves me with two questions:

Should sled-agent include the bootstrap address in the first config?
Why didn't dendrite see the change around 2:15?

bnaecker commented 1 year ago

Should sled-agent include the bootstrap address in the first config?

I'm not sure, but I don't think we want to do that. Anything else running in the switch zone could talk to dpd via ::1, which I think encompasses wicketd, for example. Anything else would use the underlay address, not the bootstrap, right?

Why didn't dendrite see the change around 2:15?

I don't know. Do you have the log file available? Like I said, we don't start actually listening to SIGUSR1 as a way to reload the configuration until pretty late in the startup sequence. Until that time, we'll either restart dpd or just ignore it, depending where we are in the code.

jgallagher commented 1 year ago

I'm not sure, but I don't think we want to do that. Anything else running in the switch zone could talk to dpd via ::1, which I think encompasses wicketd, for example. Anything else would use the underlay address, not the bootstrap, right?

I don't think this is right, at least based on what we were seeing in dogfood bringup. Something was trying to talk to dpd on the bootstrap network, and its inability to prevented us from making progress. (@citrus-it / @dancrossnyc / @rcgoodfellow do you remember what this was?)

rcgoodfellow commented 1 year ago

I believe there were requests coming in from bootstrap agents to set up NAT table entries for service zones connecting to the world through OPTE.

bnaecker commented 1 year ago

Ok, I'm probably missing information about whether something should be talking over bootstrap. And I'm definitely not trying to say that no existing software does, only my understanding about what should happen.

In any case, I don't understand why we're not getting the SMF refresh around 02:15. At that time, it looks like dpd is in a steady-state, doing what I expect: fielding requests from tfportd, talking to the transceivers, etc. I was expecting it to have just recently restarted, in which case it's possible that dpd ignores SMF refreshes. The mechanism for requesting refreshes relies on signals converted into Tokio messages. I don't see any obvious way that would fail to receive a refresh, but signals make me nervous in any case.

dancrossnyc commented 1 year ago

I believe there were requests coming in from bootstrap agents to set up NAT table entries for service zones connecting to the world through OPTE.

Exactly this: boundary NTP services on a sled were trying to set up NAT over the bootstrap network and getting "Connection refused" errors because dpd was only listening on localhost.

bnaecker commented 1 year ago

It looks like an earlier SMF call is failing. John and I found this line in the sled agent log, which appears to show that the delpropvalue command is failing:

02:15:10.006Z WARN SledAgent: Failed to activate switch: Failed to issue SMF command: Failed to do 'del config/address smf property value' by running command in zone: Error running command in zone 'oxz_switch': Command [/usr/sbin/zlogin oxz_switch /usr/sbin/svccfg -s svc:/system/illumos/dendrite delpropvalue config/address *] executed and failed with status: exit status: 1  stdout:   stderr: svccfg: Syntax error.

We are looking into why that is.

bnaecker commented 1 year ago

@dancrossnyc @citrus-it I'm not familiar with the NTP code. Could you point me where the boundary NTP daemons are getting that bootstrap address from?

citrus-it commented 1 year ago

It's not the NTP servers, but rather that sled agent is setting up connectivity for them as part of bringing them up. The address seems to be coming from DNS (I'm not familiar with this part either) - see https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/services.rs#L716-L718 and the bit just below

bnaecker commented 1 year ago

Thanks @citrus-it, really helpful. I'm adding some notes here.

As Andy pointed out, it appears that the sled-agent is resolving an address for dpd via DNS:

https://github.com/oxidecomputer/omicron/blob/0ca28dceef56f915197e8b3f3e853b90e35929f8/sled-agent/src/services.rs#L716-L718

It's using that to make a client to dpd right below. It seems that the failing requests then are this bit of the sled-agent, trying to set up the NAT entries for mapping the boundary NTP servers' external IP addresses to the hosting sled's underlay address:

https://github.com/oxidecomputer/omicron/blob/0ca28dceef56f915197e8b3f3e853b90e35929f8/sled-agent/src/services.rs#L773-L784

I'm not sure I follow this bit from the earlier discussion in this thread, but this all seems to imply that the sled agent is inserting a DNS entry resolving dpd to the bootstrap IPv6 address. Regardless of whether that's what we should do (I'll come back to this question), we never actually tell dpd to listen on that address. When we configure the service itself, we tell it first to listen on [::1]:12224, and then eventually I think on the underlay address. Note that this second step actually failed due to #3216.

But even if it actually succeeded, there seems to be a mismatch between DNS and what we tell dpd to listen on. In the former, we seem to have an entry for the bootstrap address, but we never tell dpd to listen there. Instead we ask it to listen on the underlay. So where do the DNS addresses come from?

It looks like this is where we start to setup the service plan. In that, we add DNS entries for dpd specifically on Scrimlets:

https://github.com/oxidecomputer/omicron/blob/0ca28dceef56f915197e8b3f3e853b90e35929f8/sled-agent/src/rack_setup/plan/service.rs#L268-L281

To be clear though, that get_switch_zone_address() function returns an address in the sled's underlay subnet:

https://github.com/oxidecomputer/omicron/blob/0ca28dceef56f915197e8b3f3e853b90e35929f8/common/src/address.rs#L236-L240

At the end of the day, this gets serialized into a configuration file and stored durably, here:

https://github.com/oxidecomputer/omicron/blob/0ca28dceef56f915197e8b3f3e853b90e35929f8/sled-agent/src/rack_setup/plan/service.rs#L574-L586

Looking at one such file, I see no references at all to the bootstrap address:

BRM42220086 # grep -c "fdb0" /pool/int/60947577-2f43-4f23-ba73-119bca0a1ddd/config/rss-service-plan.toml
0

Which all jives with what I can read from the code: we never insert the bootstrap address into DNS, and also never provide dpd with that address to listen on. This all sounds self-consistent to me.

As I said, I think this is what we want. There is no particular reason for dpd to listen on the bootstrap address. The sled agents will autonomously establish the underlay network, at which point dpd can start to listen it. (This is what currently happens, modulo the failure in #3216.) If some other service, such as wicketd needs to ultimately talk to dpd in order to configure other aspects of the system, I think that can all be done via ::1, since they both run in the switch zone. I should be clear that I may be wrong here, and that this is just my current understanding.

So given all this, looking back at the actual issue thread, I see no reference to the address that we attempted to reach when we got ECONNREFUSED. Does anyone have the actual logs showing that error, and which address was not reachable? @jgallagher @dancrossnyc or @citrus-it ?

rcgoodfellow commented 1 year ago

The logs I can currently dig up from gc17 show failures to set up NAT entries over the underlay network.

00:36:29.513Z INFO SledAgent (BootstrapAgent): creating NAT entry for service
    service = ServiceZoneService { id: 8547c8a6-1f38-43c0-8e5f-8ccb798191d6, details: BoundaryNtp { ntp_servers: ["ntp.eng.oxide.computer"], dns_servers: ["1.1.1.1", "9.9.9.9"], domain: None, nic: NetworkInterface { id: 34d56f71-b198-4544-823f-1964ac8830e7, kind: Service { id: 8547c8a6-1f38-43c0-8e5f-8ccb798191d6 }, name: Name("ntp-8547c8a6-1f38-43c0-8e5f-8ccb798191d6"), ip: 172.30.2.6, mac: MacAddr(MacAddr6([168, 64, 37, 255, 143, 34])), subnet: V4(Ipv4Net(Ipv4Network { addr: 172.30.2.0, prefix: 24 })), vni: Vni(100), primary: true, slot: 0 }, snat_cfg: SourceNatConfig { ip: 172.20.26.4, first_port: 16384, last_port: 32767 } } }

00:36:29.513Z WARN SledAgent (BootstrapAgent): failed to create NAT entry for service
    error = Communication Error: error sending request for url (http://[fd00:1122:3344:10b::2]:12224/nat/ipv4/172.20.26.4/16384/32767): error trying to connect: tcp connect error: Connection refused (os error 146)
    service = ServiceZoneService { id: 8547c8a6-1f38-43c0-8e5f-8ccb798191d6, details: BoundaryNtp { ntp_servers: ["ntp.eng.oxide.computer"], dns_servers: ["1.1.1.1", "9.9.9.9"], domain: None, nic: NetworkInterface { id: 34d56f71-b198-4544-823f-1964ac8830e7, kind: Service { id: 8547c8a6-1f38-43c0-8e5f-8ccb798191d6 }, name: Name("ntp-8547c8a6-1f38-43c0-8e5f-8ccb798191d6"), ip: 172.30.2.6, mac: MacAddr(MacAddr6([168, 64, 37, 255, 143, 34])), subnet: V4(Ipv4Net(Ipv4Network { addr: 172.30.2.0, prefix: 24 })), vni: Vni(100), primary: true, slot: 0 }, snat_cfg: SourceNatConfig { ip: 172.20.26.4, first_port: 16384, last_port: 32767 } } }

I thought I saw attempts being made on the bootstrap network, but I may be misremembering. A failure to set the underlay address for dpd through SMF would also result in these errors.

bnaecker commented 1 year ago

Thanks @rcgoodfellow that's helpful. That all seems consistent with the real issue being #3216, causing the addition of the underlay address to fail. I see no code which suggests that we're intending to direct dpd to listen on the bootstrap network. I'll leave this open in case we see accesses to on the bootstrap network fail when we next run things on the dogfood rack.

bnaecker commented 1 year ago

I believe this should be resolved by #3316, but I've not yet had a chance to test.

rcgoodfellow commented 1 year ago

This has not happened across many dogfood cycles. Closing as fixed by #3316.

oxidecomputer / omicron

dogfood: dendrite service only listening on localhost #3199