oxidecomputer / omicron

Omicron: Oxide control plane

CI should use a nextest binary that contains symbols #5179

Open: jclulow opened this issue 6 months ago

jclulow commented 6 months ago

While trying to debug a CI issue, I was looking at a stuck cargo-nextest process, and to my horror it did not have any symbols for the Rust program text:

root@ip-10-150-1-69:~# pstack 3879 | demangle
3879:   /home/build/.cargo/bin/cargo-nextest nextest run --profile ci --locked
--------------------- thread# 1 / lwp# 1 ---------------------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2c13c10, 2ddf970, 0) + 55
 fffffc7fef16d1ea __cond_wait (2c13c10, 2ddf970) + ba
 fffffc7fef16d22e cond_wait (2c13c10, 2ddf970) + 2e
 fffffc7fef16d275 pthread_cond_wait (2c13c10, 2ddf970) + 15
 0000000000a96108 ???????? ()
 0000000000a95f17 ???????? ()
 0000000000a9682c ???????? ()
 00000000004c4f2c ???????? ()
 00000000004b988d ???????? ()
 000000000049be4e ???????? ()
 000000000048e9d4 ???????? ()
 000000000048d80a ???????? ()
 0000000000439000 ???????? ()
 0000000000434916 ???????? ()
 0000000000434be5 ???????? ()
 0000000000430af3 ???????? ()
 0000000000430a58 ???????? ()
-------- thread# 16 / lwp# 16 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2dde070, 2d789a0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2dde070, 2d789a0) + ba
 fffffc7fef16d22e cond_wait (2dde070, 2d789a0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2dde070, 2d789a0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a2240) + 77
 fffffc7fef173730 _lwp_start ()
-------------- thread# 3 / lwp# 3 [umem_update] --------------
 fffffc7fef173777 lwp_park (0, fffffc7fee5fee60, 0)
 fffffc7fef16cb85 cond_wait_queue (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fee60) + 55
 fffffc7fef16cfb5 cond_wait_common (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fee60) + 1b5
 fffffc7fef16d2f9 __cond_timedwait (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fef50) + 69
 fffffc7fef16d3cc cond_timedwait (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fef50) + 3c
 fffffc7fef299c06 umem_update_thread (0) + 1d6
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a0a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 19 / lwp# 19 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2de2f50, 2ddffa0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2de2f50, 2ddffa0) + ba
 fffffc7fef16d22e cond_wait (2de2f50, 2ddffa0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2de2f50, 2ddffa0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a1a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 14 / lwp# 14 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2b9b310, 2ba89a0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2b9b310, 2ba89a0) + ba
 fffffc7fef16d22e cond_wait (2b9b310, 2ba89a0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2b9b310, 2ba89a0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a1240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 17 / lwp# 17 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2ced830, 2dd9970, 0) + 55
 fffffc7fef16d1ea __cond_wait (2ced830, 2dd9970) + ba
 fffffc7fef16d22e cond_wait (2ced830, 2dd9970) + 2e
 fffffc7fef16d275 pthread_cond_wait (2ced830, 2dd9970) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a0240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 13 / lwp# 13 [tokio-runtime-worker] ---------
 fffffc7fef17ab2a write    (2, 11a8c10, 85)
 0000000000a506db ???????? ()
 0000000000814135 ???????? ()
 0000000000807e57 ???????? ()
 00000000004f4433 ???????? ()
 00000000004f23d9 ???????? ()
 00000000004f0919 ???????? ()
 0000000000a9ea9b ???????? ()
 0000000000a9c2c6 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a3240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 20 / lwp# 20 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (16450d0, 28b21c0, 0) + 55
 fffffc7fef16d1ea __cond_wait (16450d0, 28b21c0) + ba
 fffffc7fef16d22e cond_wait (16450d0, 28b21c0) + 2e
 fffffc7fef16d275 pthread_cond_wait (16450d0, 28b21c0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a2a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 15 / lwp# 15 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2de2db0, 2cf4130, 0) + 55
 fffffc7fef16d1ea __cond_wait (2de2db0, 2cf4130) + ba
 fffffc7fef16d22e cond_wait (2de2db0, 2cf4130) + 2e
 fffffc7fef16d275 pthread_cond_wait (2de2db0, 2cf4130) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a4240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 18 / lwp# 18 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (1149670, 114c850, 0) + 55
 fffffc7fef16d1ea __cond_wait (1149670, 114c850) + ba
 fffffc7fef16d22e cond_wait (1149670, 114c850) + 2e
 fffffc7fef16d275 pthread_cond_wait (1149670, 114c850) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a3a40) + 77
 fffffc7fef173730 _lwp_start ()
-------------------- thread# 21 / lwp# 21 --------------------
 0000000000a5e230 ????????(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **

It would appear that the binary has been ruthlessly stripped:

root@ip-10-150-1-69:~# file /home/build/.cargo/bin/cargo-nextest
/home/build/.cargo/bin/cargo-nextest:   ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, stripped
root@ip-10-150-1-69:~# ls -lh /home/build/.cargo/bin/cargo-nextest
-rwxr-xr-x   1 build    build      10.7M Jan  9 20:35 /home/build/.cargo/bin/cargo-nextest
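
A check like the one above could be automated so that CI fails early when a tool binary arrives stripped. The following is only a minimal sketch: the function names `is_stripped_desc` and `check_has_symbols` are hypothetical, and it assumes file(1) reports "stripped" or "not stripped" in its description, as in the output above.

```shell
#!/bin/sh
# Hypothetical CI guard (not part of omicron): decide from a file(1)
# description whether an ELF binary has had its symbols stripped.

# Succeeds (returns 0) only if the description says the binary is stripped.
# "not stripped" must be tested first, because it also contains "stripped".
is_stripped_desc() {
    case "$1" in
        *"not stripped"*) return 1 ;;
        *stripped*)       return 0 ;;
        *)                return 1 ;;
    esac
}

# Fails with a message on stderr if the binary at path $1 is stripped.
# Returns 2 if file(1) itself could not be run on the path.
check_has_symbols() {
    desc=$(file "$1") || return 2
    if is_stripped_desc "$desc"; then
        echo "error: $1 is stripped; stack traces will lack symbols" >&2
        return 1
    fi
    return 0
}
```

A build script might then call `check_has_symbols /home/build/.cargo/bin/cargo-nextest` right after installing the binary and abort on a non-zero exit status.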

I assumed we were getting this via cargo install, but it seems like we actually fetch it from somewhere on the Internet:

https://github.com/oxidecomputer/omicron/blob/f6efad4126986f72d3bedfdc04cb4ed30a926f0b/.github/buildomat/build-and-test.sh#L16

Regardless of where we get the binary, it absolutely needs to have symbols, if not all of the debuginfo!

jclulow commented 6 months ago

Upstream issue filed: nextest-rs/nextest#1345