Open askfongjojo opened 2 months ago
Also took a look at the crucible downstairs around that time and didn't see anything out of the ordinary.
root@oxz_switch1:~# omdb db disks list | grep 8a713b03-e6a2-4659-90f1-d3fce8a071ef
master-027138b2-8f92-4d70-8b31-59a-e9de02 e45db0db-f9d8-47e6-9be1-3aef84da229f 10 GiB attached 8a713b03-e6a2-4659-90f1-d3fce8a071ef
root@oxz_switch1:~# omdb db disks info e45db0db-f9d8-47e6-9be1-3aef84da229f
HOST_SERIAL DISK_NAME INSTANCE_NAME PROPOLIS_ZONE VOLUME_ID DISK_STATE
BRM42220077 master-027138b2-8f92-4d70-8b31-59a-e9de02 master oxz_propolis-server_cdcb2537-a5ee-4e9d-97d8-d8a86b57d2dd 10fac653-2a79-415b-bdf6-51703b502ac9 attached
HOST_SERIAL REGION ZONE PHYSICAL_DISK
BRM42220035 b5b90b8d-53b7-410a-81f2-f99bc9c41c2b oxz_crucible_3e598962-ef8c-4cb6-bdfe-ec8563939d6a 44af7b26-cb4f-4a38-bbbf-700159e4252a
BRM42220013 ee2c2eda-9fc7-4e25-83f9-773fd33d478b oxz_crucible_475140fa-a5dc-4ec1-876d-751c48adfc37 d5345689-fdc8-4009-bef8-978d2492f302
BRM42220089 60bb712a-798b-43e2-87e2-dae6a68f94fa oxz_crucible_de376149-aa45-4660-9ae6-15e8ba4a4233 517b2733-1261-4a3c-813b-f92a7b5eec66
VCR from volume ID 10fac653-2a79-415b-bdf6-51703b502ac9
ID BS SUB_VOLUMES READ_ONLY_PARENT
e45db0db-f9d8-47e6-9be1-3aef84da229f 512 1 false
SUB VOLUME 0
ID BS BPE EC GEN READ_ONLY
e45db0db-f9d8-47e6-9be1-3aef84da229f 512 131072 160 3 false
[fd00:1122:3344:10c::5]:19000
[fd00:1122:3344:111::a]:19011
[fd00:1122:3344:108::4]:19006
BRM42220089 # cat /pool/ext/dbfdc981-1b81-4d7d-9449-9530890b199a/crypt/debug/oxz_crucible_de376149-aa45-4660-9ae6-15e8ba4a4233/oxide-crucible-downstairs:downstairs-60bb712a-798b-43e2-87e2-dae6a68f94fa.log.1723488094 | looker
18:38:53.998Z INFO crucible: upstairs disconnected
id = 0
task = recv
18:38:53.998Z WARN crucible: recv_task sending ConnectionClosed
id = 0
task = recv
18:38:53.998Z INFO crucible: connection closed; disconnection
18:38:53.998Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) removed, 0 jobs left
18:38:53.998Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) was previously active, clearing
18:39:39.434Z INFO crucible: accepted connection from [fd00:1122:3344:116::1:a6]:56076
task = main
18:39:39.434Z INFO crucible: connection ([fd00:1122:3344:116::1:a6]:56076): tasks spawned
id = 1
18:39:39.434Z INFO crucible: Connection request from e45db0db-f9d8-47e6-9be1-3aef84da229f with version 8
18:39:39.434Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } connected, version 8
18:39:39.434Z INFO crucible: UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } is now active (read-write)
18:39:39.443Z INFO crucible: Current flush_numbers [0..12]: [276, 306, 335, 364, 394, 423, 452, 482, 511, 540, 570, 599]
18:41:34.066Z INFO crucible: upstairs disconnected
id = 1
task = recv
18:41:34.066Z WARN crucible: recv_task sending ConnectionClosed
id = 1
task = recv
18:41:34.066Z INFO crucible: connection closed; disconnection
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) removed, 0 jobs left
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) was previously active, clearing
BRM42220013 # cat /pool/ext/a1856609-b49f-4d30-a0e3-2e8154dbcdbf/crypt/debug/oxz_crucible_475140fa-a5dc-4ec1-876d-751c48adfc37/oxide-crucible-downstairs:downstairs-ee2c2eda-9fc7-4e25-83f9-773fd33d478b.log.1723488094 | looker
18:38:53.998Z INFO crucible: upstairs disconnected
id = 0
task = recv
18:38:53.998Z WARN crucible: recv_task sending ConnectionClosed
id = 0
task = recv
18:38:53.998Z INFO crucible: connection closed; disconnection
18:38:53.998Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) removed, 0 jobs left
18:38:53.998Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) was previously active, clearing
18:39:39.434Z INFO crucible: accepted connection from [fd00:1122:3344:116::1:a6]:55290
task = main
18:39:39.434Z INFO crucible: connection ([fd00:1122:3344:116::1:a6]:55290): tasks spawned
id = 1
18:39:39.434Z INFO crucible: Connection request from e45db0db-f9d8-47e6-9be1-3aef84da229f with version 8
18:39:39.434Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } connected, version 8
18:39:39.434Z INFO crucible: UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } is now active (read-write)
18:39:39.486Z INFO crucible: Current flush_numbers [0..12]: [276, 306, 335, 364, 394, 423, 452, 482, 511, 540, 570, 599]
18:41:34.066Z INFO crucible: upstairs disconnected
id = 1
task = recv
18:41:34.066Z WARN crucible: recv_task sending ConnectionClosed
id = 1
task = recv
18:41:34.066Z INFO crucible: connection closed; disconnection
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) removed, 0 jobs left
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) was previously active, clearing
BRM42220035 # cat /pool/ext/95220093-e3b8-4f7f-9f5a-cb32cb75180a/crypt/debug/oxz_crucible_3e598962-ef8c-4cb6-bdfe-ec8563939d6a/oxide-crucible-downstairs:downstairs-b5b90b8d-53b7-410a-81f2-f99bc9c41c2b.log.1723488094 | looker
18:38:54.001Z INFO crucible: upstairs disconnected
id = 0
task = recv
18:38:54.001Z WARN crucible: recv_task sending ConnectionClosed
id = 0
task = recv
18:38:54.001Z INFO crucible: connection closed; disconnection
18:38:54.001Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) removed, 0 jobs left
18:38:54.001Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: 06191f6e-dd09-4041-ae1c-ffb9d1e94b60, gen: 1 } (ConnectionId(0)) was previously active, clearing
18:39:39.434Z INFO crucible: accepted connection from [fd00:1122:3344:116::1:a6]:44735
task = main
18:39:39.434Z INFO crucible: connection ([fd00:1122:3344:116::1:a6]:44735): tasks spawned
id = 1
18:39:39.434Z INFO crucible: Connection request from e45db0db-f9d8-47e6-9be1-3aef84da229f with version 8
18:39:39.434Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } connected, version 8
18:39:39.434Z INFO crucible: UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } is now active (read-write)
18:39:39.444Z INFO crucible: Current flush_numbers [0..12]: [276, 306, 335, 364, 394, 423, 452, 482, 511, 540, 570, 599]
18:41:34.066Z INFO crucible: upstairs disconnected
id = 1
task = recv
18:41:34.066Z WARN crucible: recv_task sending ConnectionClosed
id = 1
task = recv
18:41:34.066Z INFO crucible: connection closed; disconnection
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) removed, 0 jobs left
18:41:34.066Z INFO crucible: upstairs UpstairsConnection { upstairs_id: e45db0db-f9d8-47e6-9be1-3aef84da229f, session_id: ef66202d-9830-4a91-9bf4-c22be271f29d, gen: 2 } (ConnectionId(1)) was previously active, clearing
Looking at the opcode(s), it would seem that the guest vCPU jumped off into space. This lends more credence to the necessity of #335.
@augustuswm found a VM that had a propolis server panic. Here are the propolis log lines at the time:
Both the core file and propolis log can be found in
/staff/rack3/BRM42220077/2024-08-29
(system-illumos-propolis-server:default.log.1723488297
andcore.oxz_propolis-server_cdcb2537-a5ee-4e9d-97d8-d8a86b57d2dd.propolis-server.23389.1723488093
.@leftwo got the stack from the core file: