oxidecomputer / crucible

A storage service.
Mozilla Public License 2.0

get-lr-state.sh and get-ds-state.sh don't handle sessions correctly. #1557

Open leftwo opened 1 day ago

leftwo commented 1 day ago

The get-*-state.sh scripts don't handle multiple sessions per propolis server.

leftwo commented 1 day ago

On a system with a failed downstairs (DS):

root@oxz_switch1:~# pilot host exec -c "zoneadm list | grep propolis | wc -l && /opt/oxide/crucible_dtrace/get-lr-state.sh" 14  
14  BRM42220036        ok: 17
oxz_propolis-server_05467c6c-dfbd-493c-af58-e0474ec07c03  17028 0 0 0 0 0 0
oxz_propolis-server_19e1447b-0598-4d12-bc20-eac525f4afa5  17874 0 0 0 0 0 0
oxz_propolis-server_1bf0a482-9a72-4a5c-9e91-bf3507f3ac48  18640 0 0 0 0 0 0
oxz_propolis-server_a9dbf096-8f86-4cfc-a1a8-3e721ae99ffa  19442 0 0 0 0 0 0
oxz_propolis-server_9dd96d16-cfa1-4fe0-a754-d6687bd97a33  20279 0 0 0 0 0 0
oxz_propolis-server_26ff04d3-1af9-40bd-a0f0-bf09def07f02  22032 0 0 0 0 0 0
oxz_propolis-server_164b1113-ecc0-443c-8b89-ee177795ffc6  26146 0 0 0 0 0 0
oxz_propolis-server_53c6bbf3-3b23-44d4-a7ff-5f1eebfdec57  28766 0 0 0 0 0 0
oxz_propolis-server_217839dc-8f00-4bd9-a084-aeeef45f590f   1144 0 0 0 0 0 0
oxz_propolis-server_6257441b-d947-46f4-8b39-039245baf67b   2459 0 0 0 0 0 0
oxz_propolis-server_1956f249-b23b-491e-a20f-3d8b1f8c9684   3728 0 0 0 0 0 0
oxz_propolis-server_d873529b-7b0e-4363-b4f8-fa2d1c7d6999  13305 0 0 0 0 0 0
oxz_propolis-server_5aa27e0b-cc4d-4424-afc9-3295f4324ad0  14577 0 0 0 0 0 0
oxz_propolis-server_6f0c7ad3-1866-4419-92b8-1fc67dcffcad  18364 0 0 0 0 0 0
oxz_propolis-server_950e15e8-105a-4ed0-ac9b-5c097c7102ba  21797 0 0 0 0 0 0
oxz_propolis-server_f1089ab1-1f74-4f39-a520-6e723b986a54  22847 0 0 0 0 0 0
oxz_propolis-server_33191a03-e3d5-4cb2-864c-d46b5aab34e2  23803 0 0 0 0 0 0

Then, running just the info script on that sled:

BRM42220036 # dtrace -s /opt/oxide/crucible_dtrace/sled_upstairs_info.d | grep -v "0          0      0     0     0      0     0     0      0     0     0      0     0     0      0     0     0"
23803 911fad69            active            active            active     0    92    119248     0          0      0     0     0     92    92    92      0     0     0      0     0     0      0     0     0
21797 961e2713            active            active            active     0    96    122979     0     856064      1     1     1     95    95    95      0     0     0      0     0     0      0     0     0
 1144 9dbc3fec       live_repair            active            active     2  2179    137422     0          0      2     1     1   2177  2178  2178      0     0     0    577     0     0    356     0     0
22847 0d2d0f72            active            active            active     1   279     63119    35   72036352     59    59   138    220   220   141      0     0     0      0     0     0      0     0     0
  PID  SESSION        DS STATE 0        DS STATE 1        DS STATE 2   UPW   DSW  NEXT_JOB BAKPR   WRITE_BO    IP0   IP1   IP2     D0    D1    D2     S0    S1    S2    ER0   ER1   ER2    EC0   EC1   EC2
23803 911fad69            active            active            active     0   112    119742     0     933888      1     1     1    111   111   111      0     0     0      0     0     0      0     0     0
21797 961e2713            active            active            active     0    98    123475     0    1048576      2     2     2     96    96    96      0     0     0      0     0     0      0     0     0
 1144 9dbc3fec       live_repair            active            active     2  2187    137430     0          0      2     1     1   2185  2186  2186      0     0     0    579     0     0    356     0     0
22847 0d2d0f72            active            active            active     0   270     63497     0   49283072     22    46    95    248   224   175      0     0     0      0     0     0      0     0     0
  PID  SESSION        DS STATE 0        DS STATE 1        DS STATE 2   UPW   DSW  NEXT_JOB BAKPR   WRITE_BO    IP0   IP1   IP2     D0    D1    D2     S0    S1    S2    ER0   ER1   ER2    EC0   EC1   EC2
23803 911fad69            active            active            active     0   118    120248     0    1982464      1     3     1    117   115   117      0     0     0      0     0     0      0     0     0
21797 961e2713            active            active            active     0   102    123975     0     856064      1     1     1    101   101   101      0     0     0      0     0     0      0     0     0
 1144 9dbc3fec       live_repair            active            active     2  2195    137438     0          0      2     1     1   2193  2194  2194      0     0     0    581     0     0    356     0     0
22847 0d2d0f72            active            active            active     0   231     63855     2   57356288     11    28   110    220   203   121      0     0     0      0     0     0      0     0     0
  PID  SESSION        DS STATE 0        DS STATE 1        DS STATE 2   UPW   DSW  NEXT_JOB BAKPR   WRITE_BO    IP0   IP1   IP2     D0    D1    D2     S0    S1    S2    ER0   ER1   ER2    EC0   EC1   EC2

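The former does not show that one of the DS is in repair: get-lr-state.sh reports all zeros for PID 1144, while sled_upstairs_info.d shows DS 0 of that PID in live_repair.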
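As a rough illustration of reporting per session rather than per PID, here is a minimal sketch of a streaming filter over the sled_upstairs_info.d output. The function name `non_active_sessions` is hypothetical and is not part of the crucible scripts. It assumes the column layout shown above (PID, SESSION, then the three DS states as single whitespace-free tokens) and a POSIX awk.

```shell
# Hypothetical helper, not part of the crucible_dtrace scripts.
# Reads sled_upstairs_info.d output on stdin and prints PID, SESSION and the
# three DS states for every sample row where any downstairs is not "active".
# Assumes columns 1-5 are: PID, SESSION, DS STATE 0, DS STATE 1, DS STATE 2.
non_active_sessions() {
    awk '
        $1 == "PID" { next }    # skip the repeated header lines
        NF < 5      { next }    # skip blank or truncated lines
        $3 != "active" || $4 != "active" || $5 != "active" {
            # Each row is one (PID, SESSION) pair, so a PID with several
            # sessions yields one line per affected session.
            print $1, $2, $3, $4, $5
        }
    '
}
```

Usage would be something like `dtrace -s /opt/oxide/crucible_dtrace/sled_upstairs_info.d | non_active_sessions`. Because each matching row is printed as it arrives, the filter does not need the dtrace script to exit first.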

leftwo commented 22 hours ago

https://github.com/oxidecomputer/crucible/pull/1560