oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
244 stars 38 forks source link

Need to review entries in `inv_dataset` table #6615

Open askfongjojo opened 1 day ago

askfongjojo commented 1 day ago

I see that there is an unusally large number of records in this table that has no time_deleted or state fields:

root@[fd00:1122:3344:109::3]:32221/omicron> \d inv_dataset
     column_name    | data_type | is_nullable | column_default | generation_expression |      indices       | is_hidden
--------------------+-----------+-------------+----------------+-----------------------+--------------------+------------
  inv_collection_id | UUID      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  sled_id           | UUID      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  id                | UUID      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  name              | STRING    |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  available         | INT8      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  used              | INT8      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  quota             | INT8      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  reservation       | INT8      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  compression       | STRING    |    false    | NULL           |                       | {inv_dataset_pkey} |   false
(9 rows)

root@[fd00:1122:3344:109::3]:32221/omicron> select count(*) from inv_dataset;
   count
------------
  35641826

root@[fd00:1122:3344:109::3]:32221/omicron> select sled_id, count(*) from inv_dataset group by sled_id;
                sled_id                |  count
---------------------------------------+----------
  f15774c1-b8e5-434f-a493-ec43f96cba06 | 3129834
  bd96ef7c-4941-4729-b6f7-5f47feecbc4b | 2054724
  71def415-55ad-46b4-ba88-3ca55d7fb287 | 3064761
  a2adea92-b56e-44fc-8a0d-7d63b5fd3b93 | 3015194
  dd83e75a-1edf-4aa1-89a0-cd8b2091a7cd | 3221035
  0c7011f7-a4bf-4daf-90cc-1c2410103300 | 2894176
  2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa | 2943800
  7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c | 2948692
  87c2c4fc-b0c7-4fef-a305-78f0ed265bbc | 3009928
  5f6720b8-8a31-45f8-8c94-8e699218f28b | 3131293
  b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88 | 3016263
  db183874-65b5-4263-a1c1-ddb2737ae0e9 | 3208511
(12 rows)

It may be worth confirming if this table is supposed to grow indefinitely. Other inv_ table seems to be pruned and kept to reasonable sizes (maybe related to https://github.com/oxidecomputer/omicron/issues/5305).

davepacheco commented 20 hours ago

Thanks for catching this!!

Yeah it seems pretty likely that this table is missing from the inventory delete code. We can certainly fix that problem but it's a little harder to fix existing systems. We could try something like:

DELETE FROM inv_dataset WHERE inv_collection_id NOT IN (SELECT id FROM inv_collection);

But that's a potentially huge transaction and it'd be safer to do it in batches. It looks like you can do something like this but I haven't tested it:

DELETE FROM inv_dataset WHERE inv_collection_id NOT IN (SELECT id FROM inv_collection) LIMIT 1000;

and we could run that in a loop until it produces no results.

We could:

I'd probably lean towards the omdb command. There's a related question of: is this too urgent to wait for the next upgrade?

askfongjojo commented 20 hours ago

The issue doesn't seem to affect the system negatively at the moment AFAICT (I found it only by chance). We can do the cleanup at the next customer rack update after fixing the inventory delete code.