oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Crucible datasets remain after disks are deleted. #1313

Open leftwo opened 2 years ago

leftwo commented 2 years ago

Crucible appears to suffer the same fate

Originally posted by @leftwo in https://github.com/oxidecomputer/omicron/issues/1119#issuecomment-1170554806

leftwo commented 2 years ago

On sock, after an uninstall, there are still crucible regions. You can see these from zfs list:

alan@sock:crucible$ zfs list -o name | grep crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions/439a26ff-e6dc-4794-a8c8-2ace6d175616
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions/12ac64e4-c489-4e4b-8272-5a40847bdb28
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions/e0f5e6a3-80f5-4892-9bfa-16fd09ea1e7c
rpool/data/crucible
rpool/zone/oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
rpool/zone/oxz_crucible_oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03
rpool/zone/oxz_crucible_oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03

(The leftover regions are anything below oxp_.../crucible/regions/)
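To pick out just those leftover regions from the listing, something like the following sketch may help. It works on the text of `zfs list -o name` rather than calling zfs itself; the helper name `leftover_regions` and the assumption that pool names start with `oxp_` followed by a UUID are mine, based only on the output above.

```python
import re
from typing import Dict, List

# Assumption: region datasets look like oxp_<uuid>/crucible/regions/<uuid>,
# as in the zfs list output above. Parent datasets are deliberately not matched.
REGION_RE = re.compile(r"^(oxp_[0-9a-f-]+)/crucible/regions/([0-9a-f-]+)$")


def leftover_regions(zfs_names: str) -> Dict[str, List[str]]:
    """Group region IDs by zpool, given the text of `zfs list -o name`."""
    regions: Dict[str, List[str]] = {}
    for line in zfs_names.splitlines():
        match = REGION_RE.match(line.strip())
        if match:
            pool, region_id = match.groups()
            regions.setdefault(pool, []).append(region_id)
    return regions


# A few lines copied from the listing above.
SAMPLE = """\
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions/439a26ff-e6dc-4794-a8c8-2ace6d175616
rpool/data/crucible
rpool/zone/oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
"""

if __name__ == "__main__":
    for pool, ids in leftover_regions(SAMPLE).items():
        print(pool, ids)
```

Only datasets strictly below crucible/regions/ are reported, so the crucible and crucible/regions parents and the rpool zone datasets are skipped.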

After ./tools/create_virtual_hardware.sh is run, the crucible zpools are created:

alan@sock:omicron$ zpool list -o name,size,alloc,free
NAME                                       SIZE  ALLOC   FREE
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b  49.5G  9.79G  39.7G
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
rpool                                      952G   452G   500G

However, if ./tools/destroy_virtual_hardware.sh is not run, any regions (disks) created inside these pools will remain, even across install/uninstall of omicron.

I believe this is the desired behavior (to allow upgrades), but if you are doing development and testing, be aware that old regions may come alive even if there is no upstairs for them to connect to. This can result in crucible-downstairs processes being started, and any disk space these regions are using will not be available.

leftwo commented 1 year ago

I believe the issue described here is fixed, but there is one remaining issue I want to resolve before I'm ready to close this one: https://github.com/oxidecomputer/crucible/issues/542

There is a small leak in the crucible agent that needs fixing. It takes some 4000 disk add/deletes before it uses up the ramdisk on sn21, but it's still a leak that needs to be fixed.

leftwo commented 1 year ago

Confirmed that regions are deleted when a disk is deleted.

Here is before:

gimlet-sn21 # zfs list | grep regions/
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77  86.6M  2.81T     86.6M  /data/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77
oxp_1bdae8d1-acde-4f44-bc9c-5b657e6f01d3/crucible/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc  86.6M  2.82T     86.6M  /data/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc
oxp_2ec1c158-3535-43c1-aca3-6d186487bbbc/crucible/regions/7bddd111-c6af-466f-9427-5e9a0151e76c  4.84G  2.81T     4.84G  /data/regions/7bddd111-c6af-466f-9427-5e9a0151e76c
oxp_4a2245f9-4f54-4a3d-86ae-103ae196959a/crucible/regions/20f4c642-85aa-455d-9033-aa881f495a94  34.3G  2.78T     34.3G  /data/regions/20f4c642-85aa-455d-9033-aa881f495a94
oxp_627cda87-085b-44af-a70e-d067599c3fe2/crucible/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61  34.3G  2.78T     34.3G  /data/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61
oxp_9f5a50c7-08ce-41f7-8efd-5d1323a1f070/crucible/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f  34.3G  2.78T     34.3G  /data/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/405213cc-3e0e-42e7-a078-4e75a6475915  86.6M  2.81T     86.6M  /data/regions/405213cc-3e0e-42e7-a078-4e75a6475915
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f  4.84G  2.81T     4.84G  /data/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f

Then after a delete, the three 86.6M regions are gone:

gimlet-sn21 # zfs list | grep regions/
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
oxp_2ec1c158-3535-43c1-aca3-6d186487bbbc/crucible/regions/7bddd111-c6af-466f-9427-5e9a0151e76c  4.84G  2.81T     4.84G  /data/regions/7bddd111-c6af-466f-9427-5e9a0151e76c
oxp_4a2245f9-4f54-4a3d-86ae-103ae196959a/crucible/regions/20f4c642-85aa-455d-9033-aa881f495a94  34.3G  2.78T     34.3G  /data/regions/20f4c642-85aa-455d-9033-aa881f495a94
oxp_627cda87-085b-44af-a70e-d067599c3fe2/crucible/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61  34.3G  2.78T     34.3G  /data/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61
oxp_9f5a50c7-08ce-41f7-8efd-5d1323a1f070/crucible/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f  34.3G  2.78T     34.3G  /data/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f  4.84G  2.81T     4.84G  /data/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f
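Comparing the two listings by eye is error-prone, so here is a rough sketch of doing it mechanically. It takes the text of the before and after `zfs list` output and reports the region datasets that disappeared. The function names are hypothetical, and it assumes the dataset name is the first whitespace-separated column, as in the output above.

```python
from typing import List, Set


def region_datasets(zfs_list_output: str) -> Set[str]:
    """Return dataset names (first column) that sit under crucible/regions/."""
    names: Set[str] = set()
    for line in zfs_list_output.splitlines():
        fields = line.split()
        if fields and "/crucible/regions/" in fields[0]:
            names.add(fields[0])
    return names


def deleted_regions(before: str, after: str) -> List[str]:
    """Region datasets present in `before` but missing from `after`, sorted."""
    return sorted(region_datasets(before) - region_datasets(after))


# A subset of the rows shown above, kept short for illustration.
BEFORE = """\
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77  86.6M  2.81T     86.6M  /data/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77
oxp_1bdae8d1-acde-4f44-bc9c-5b657e6f01d3/crucible/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc  86.6M  2.82T     86.6M  /data/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc
"""

AFTER = """\
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
"""

if __name__ == "__main__":
    for name in deleted_regions(BEFORE, AFTER):
        print("deleted:", name)
```

Run against the full listings above, the same set difference should leave exactly the datasets that are absent from the second listing.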