oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

U.2 debug datasets are not mounted due to name conflicts #4203

Open bnaecker opened 10 months ago

bnaecker commented 10 months ago

While working on some zone bundle improvements, I noticed that the debug datasets we create on the U.2s, which are used for cores and archived logs, are not actually mounted.

When the sled-agent starts up, we create a hierarchy of datasets on each U.2. These are structured like <pool_name>/crypt, with child datasets underneath, such as zone or debug. The <pool_name>/crypt/zone datasets have further children under them, which are the root filesystems for each zone we launch in the control plane. As an example, from the current dogfood rack, we have:

BRM42220009 # zfs list -Ho name -r oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_0cfa0ed5-8ff3-459f-bf22-25cda4faf68a
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_2ee45a51-e813-40ac-92a1-b79e21b51310
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_428b2b5c-b962-4e36-9ef8-4fbd9f2b657e
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_4f73f6c1-99b6-41ee-9570-48a5a7af0f3d
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_5e307afc-678c-4b01-9101-40fb1a0a84b0
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_66298fe0-dc65-4a50-bfb6-5ce3feccea89
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_677ebc1d-048e-424c-8a34-6364a0510bd3
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_699ad227-6387-4acf-bb21-89cb00242143
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_c3723727-0480-4f29-878f-ad8cb786845a
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_e1328433-9194-4eed-993a-b57553200c0f
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_e54bb9b7-4ccd-4e58-a686-2ed68d58b905
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_ebe3de59-b867-470e-bb63-d93357ac5e7d
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/zone/oxz_propolis-server_fd81398c-2055-4352-8743-bdbf4d620213

The debug dataset is intended for cores, crash dumps, and also archived logs from the zones. Here is where it's supposed to be mounted:

BRM42220009 # zfs list -Ho name,mountpoint oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug    /pool/ext/d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug

And there are indeed directories there:

BRM42220009 # ls /pool/ext/d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug
global                                                oxz_crucible_2f294ca1-7a4f-468f-8966-2b7915804729     oxz_crucible_cf3b2d54-5e36-4c93-b44f-8bf36ac98071
oxz_clickhouse_aa646c82-c6d7-4d0c-8401-150130927759   oxz_crucible_5c8c244c-00dc-4b16-aa17-6d9eb4827fab     oxz_crucible_ee8bce67-8f8e-4221-97b0-85f1860d66d0
oxz_cockroachdb_a3628a56-6f85-43b5-be50-71d8f0e04877  oxz_crucible_6cec1d60-5c1a-4c1b-9632-2b4bc76bd37c     oxz_crucible_f65a6668-1aea-4deb-81ed-191fbe469328
oxz_crucible_04eef8aa-055c-42ab-bdb6-c982f63c9be0     oxz_crucible_7d5e942b-926c-442d-937a-76cc4aa72bf3     oxz_ntp_7529be1c-ca8b-441a-89aa-37166cc450df
oxz_crucible_1a77bd1d-4fd4-4d6c-a105-17f942d94ba6     oxz_crucible_8568c997-fbbb-46a8-8549-b78284530ffc

However, if we look at the dataset that those directories actually belong to, we see this:

BRM42220009 # zfs get -Ho name name /pool/ext/d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug/oxz_clickhouse_aa646c82-c6d7-4d0c-8401-150130927759
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt

I.e., they are in the base crypt dataset, not the expected one at crypt/debug. And in fact, the crypt/debug dataset does not show up in /etc/mnttab:

BRM42220009 # grep "oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug" /etc/mnttab
BRM42220009 #

And just to double-check, they are in fact not mounted:

BRM42220009 # zfs list -Ho name,mounted | grep crypt/debug
oxp_0e485ad3-04e6-404b-b619-87d4fea9f5ae/crypt/debug    no
oxp_43efdd6d-7419-437a-a282-fc45bfafd042/crypt/debug    no
oxp_4c157f35-865d-4310-9d81-c6259cb69293/crypt/debug    no
oxp_62a4c68a-2073-42d0-8e49-01f5e8b90cd4/crypt/debug    no
oxp_845ff39a-3205-416f-8bda-e35829107c8a/crypt/debug    no
oxp_9b61d4b2-66f6-459f-86f4-13d0b8c5d6cf/crypt/debug    no
oxp_b252b176-3974-436a-915b-60382b21eb76/crypt/debug    no
oxp_b6bdfdaf-9c0d-4b74-926c-49ff3ed05562/crypt/debug    no
oxp_d0584f4a-20ba-436d-a75b-7709e80deb79/crypt/debug    no
oxp_fd82dcc7-00dd-4d01-826a-937a7d8238fb/crypt/debug    no
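
The same check can be turned into a small filter, which is handy when repeating it across sleds. This is a minimal sketch, not part of sled-agent: the function name `unmounted_debug_datasets` is made up for illustration, and it assumes the tab-separated output that `zfs list -H` produces.

```shell
# Hypothetical helper (name is illustrative, not from omicron).
# Reads `zfs list -Ho name,mounted` output on stdin and prints the name of
# every dataset ending in /crypt/debug whose `mounted` column is "no".
# Assumes tab-separated fields, which is what `zfs list -H` emits.
unmounted_debug_datasets() {
    awk -F'\t' '$1 ~ /\/crypt\/debug$/ && $2 == "no" { print $1 }'
}

# Intended use on a sled (not executed here):
#   zfs list -Ho name,mounted | unmounted_debug_datasets
```

An empty result would mean every debug dataset is mounted.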

Also, we appear to still be archiving logs into those debug directories, in the crypt dataset:

BRM42220009 # grep "DumpSetup" $(svcs -L sled-agent) | looker | tail -10
18:45:40.645Z INFO SledAgent (StorageManager): Archiving 1 log files from oxz_clickhouse_aa646c82-c6d7-4d0c-8401-150130927759 zone
    file = sled-agent/src/storage/dump_setup.rs:612
18:45:43.103Z INFO SledAgent (StorageManager): Archiving 103 log files from oxz_propolis-server_699ad227-6387-4acf-bb21-89cb00242143 zone
    file = sled-agent/src/storage/dump_setup.rs:612
18:45:43.246Z INFO SledAgent (StorageManager): Archiving 103 log files from oxz_propolis-server_08b1679a-68a1-479a-b59c-96a88427e19f zone
    file = sled-agent/src/storage/dump_setup.rs:612
18:45:43.277Z INFO SledAgent (StorageManager): Archiving 103 log files from oxz_propolis-server_39579991-0ebc-411a-8057-3f3d73b422b1 zone
    file = sled-agent/src/storage/dump_setup.rs:612
18:45:43.309Z INFO SledAgent (StorageManager): Archiving 103 log files from oxz_propolis-server_b75865d6-f068-4ddc-b260-b417c5940ca2 zone
    file = sled-agent/src/storage/dump_setup.rs:612
BRM42220009 # date
Wed Oct  4 18:51:37 UTC 2023

And we can see there are many files in the archive directory implied by that last Propolis zone name:

BRM42220009 # ls -1 /pool/ext/**/crypt/debug/oxz_propolis-server_b75865d6-f068-4ddc-b260-b417c5940ca2 | grep -c propolis
132
BRM42220009 # find /pool/ext -type d -name oxz_propolis-server_b75865d6-f068-4ddc-b260-b417c5940ca2
/pool/ext/0e485ad3-04e6-404b-b619-87d4fea9f5ae/crypt/debug/oxz_propolis-server_b75865d6-f068-4ddc-b260-b417c5940ca2

There are also core files in those locations:

BRM42220009 # find /pool/ext -name "*core\.oxz_*" 2> /dev/null
/pool/ext/9b61d4b2-66f6-459f-86f4-13d0b8c5d6cf/crypt/debug/core.oxz_propolis-server_7cf0b20a-9f38-4518-a4df-4e60d2517685.propolis-server.17914.1693505891
^C

To summarize, it appears that those directories were created at some earlier point, which prevents ZFS from automounting the .../crypt/debug dataset over them. It's not totally clear to me what the right path forward is. The directories need to be deleted (or moved aside) before the datasets can be automounted, but we don't necessarily want to blow away any of the existing debug data.

jclulow commented 10 months ago

If you want to preserve the contents, I think the basic pattern is:

NB: You cannot merely rename the files from debug-old to debug, unfortunately, because it is a cross-file system operation, and links cannot span file systems.
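
One sequence consistent with the note above is: move the stray directory aside, mount the dataset, copy the contents across, then remove the old copy. The sketch below is an assumption about what that could look like, not the steps from this comment. The function name `plan_debug_migration` and the `-old` suffix handling are illustrative, and it only prints the commands rather than running them.

```shell
# Hypothetical dry-run helper: prints the commands for one pool and does
# NOT execute anything. $1 is the pool UUID, for example
# d0584f4a-20ba-436d-a75b-7709e80deb79.
plan_debug_migration() {
    uuid=$1
    dir="/pool/ext/${uuid}/crypt/debug"
    dataset="oxp_${uuid}/crypt/debug"

    # 1. Move the stray directory aside. Both names live in the crypt
    #    dataset, so this is a plain rename within one file system.
    echo "mv ${dir} ${dir}-old"
    # 2. Mount the real dataset at the now-free mountpoint.
    echo "zfs mount ${dataset}"
    # 3. Copy (not rename) the old contents into the mounted dataset,
    #    because the source and target are now different file systems.
    echo "cp -pR ${dir}-old/. ${dir}/"
    # 4. Remove the old copy once the new one has been verified.
    echo "rm -rf ${dir}-old"
}

# Example: plan_debug_migration d0584f4a-20ba-436d-a75b-7709e80deb79
```

Anything still writing into the directory (such as the log archiver) would presumably need to be quiesced first. That is also an assumption, not something stated in this thread.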

bnaecker commented 10 months ago

I don't think this has the same underlying cause as #4269. In that case, the dataset rpool/zone was created with -o zoned=on, which prevented it from being mounted. Looking at zpool history on the dogfood rack, where we see this name conflict, I don't see that option being set on the dataset creation:

BRM42220051 # zfs get -Hpo value name /pool/ext/e6d2fe1d-c74d-40cd-8fae-bc7d06bdaac8/crypt/debug/
oxp_e6d2fe1d-c74d-40cd-8fae-bc7d06bdaac8/crypt
BRM42220051 # zpool history oxp_e6d2fe1d-c74d-40cd-8fae-bc7d06bdaac8 | grep 'zfs create .*/debug'
1986-12-28.16:21:37 zfs create -o mountpoint=/pool/ext/e6d2fe1d-c74d-40cd-8fae-bc7d06bdaac8/crypt/debug oxp_e6d2fe1d-c74d-40cd-8fae-bc7d06bdaac8/crypt/debug
BRM42220051 #

So this dataset is not zoned, which is good.
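
To run the same history check across several pools, the grep can be wrapped in a filter. This is a rough sketch with a made-up function name (`zoned_creates`). It only inspects the text of `zpool history` lines passed on stdin, so it says nothing about properties changed after creation.

```shell
# Hypothetical helper (name is illustrative). Reads `zpool history` output
# on stdin and prints any `zfs create` lines that set zoned=on.
# Prints nothing, and still returns success, when no such line exists.
zoned_creates() {
    grep 'zfs create' | grep -e '-o zoned=on' || true
}

# Intended use on a sled (not executed here):
#   zpool history <pool> | zoned_creates
# The current value can also be read directly with:
#   zfs get -Ho value zoned <dataset>
```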