Open swift-ci opened 7 years ago
Is that from a debug build of the Swift stdlib? The debug build enables assertions that might catch something before that crash.
Can you attach the output of these gdb commands at the crash? I might be able to figure out exactly what data was wrong.
`disassemble 'swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1> >::allocateSideTable()'`
`info registers`
Having said that, it's likely that the failure here is a side effect of one of the bugs that the other tests caught. The failures in ParameterPassing are scary, for one.
Comment by Umberto Raimondi (JIRA)
OK, I'll try building it with debug enabled. In the meantime:
(gdb) disassemble 'swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1> >::allocateSideTable()'
Dump of assembler code for function _ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv:
0x76edfd50 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x76edfd54 <+4>: add r11, sp, #28
0x76edfd58 <+8>: sub sp, sp, #28
0x76edfd5c <+12>: mov r5, r0
0x76edfd60 <+16>: ldr r7, [r5]
0x76edfd64 <+20>: cmp r7, #0
0x76edfd68 <+24>: blt 0x76edfe4c <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+252>
0x76edfd6c <+28>: mov r10, #0
0x76edfd70 <+32>: tst r7, #256 ; 0x100
0x76edfd74 <+36>: bne 0x76edfe58 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+264>
0x76edfd78 <+40>: mov r0, #48 ; 0x30
0x76edfd7c <+44>: bl 0x76af8c80 <_Znwj@plt>
0x76edfd80 <+48>: mov r10, r0
0x76edfd84 <+52>: add r8, sp, #8
0x76edfd88 <+56>: vmov.i32 q8, #0 ; 0x00000000
0x76edfd8c <+60>: mov r1, r10
0x76edfd90 <+64>: sub r0, r5, #4
0x76edfd94 <+68>: str r0, [r1], #32
0x76edfd98 <+72>: add r6, r10, #16
0x76edfd9c <+76>: add r4, r8, #8
0x76edfda0 <+80>: mov r0, #-1073741824 ; 0xc0000000
0x76edfda4 <+84>: vst1.64 {d16-d17}, [r1 :128]
=> 0x76edfda8 <+88>: orr r0, r0, r10, lsr #2
0x76edfdac <+92>: vst1.64 {d16-d17}, [r6 :128]
0x76edfdb0 <+96>: str r0, [sp, #4]
0x76edfdb4 <+100>: tst r7, #256 ; 0x100
0x76edfdb8 <+104>: bne 0x76edfe54 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+260>
0x76edfdbc <+108>: uxtb r0, r7
0x76edfdc0 <+112>: and r1, r7, #-2147483648 ; 0x80000000
0x76edfdc4 <+116>: str r0, [sp, #8]
0x76edfdc8 <+120>: movw r0, #65534 ; 0xfffe
0x76edfdcc <+124>: movt r0, #127 ; 0x7f
0x76edfdd0 <+128>: ubfx r2, r7, #8, #1
0x76edfdd4 <+132>: orr r1, r2, r1
0x76edfdd8 <+136>: and r0, r0, r7, lsr #8
0x76edfddc <+140>: orr r0, r1, r0
0x76edfde0 <+144>: str r0, [sp, #12]
0x76edfde4 <+148>: mov r1, #1
0x76edfde8 <+152>: mov r0, #0
0x76edfdec <+156>: str r1, [r4]
0x76edfdf0 <+160>: mov r1, r6
0x76edfdf4 <+164>: str r0, [r4, #4]
0x76edfdf8 <+168>: mov r0, #16
0x76edfdfc <+172>: mov r2, r8
0x76edfe00 <+176>: mov r3, #0
0x76edfe04 <+180>: bl 0x76af8d34 <__atomic_store@plt>
0x76edfe08 <+184>: ldrex r9, [r5]
0x76edfe0c <+188>: cmp r9, r7
0x76edfe10 <+192>: bne 0x76edfe2c <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+220>
0x76edfe14 <+196>: dmb ish
0x76edfe18 <+200>: ldr r1, [sp, #4]
0x76edfe1c <+204>: strex r0, r1, [r5]
0x76edfe20 <+208>: cmp r0, #0
0x76edfe24 <+212>: bne 0x76edfe30 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+224>
0x76edfe28 <+216>: b 0x76edfe58 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+264>
0x76edfe2c <+220>: clrex
0x76edfe30 <+224>: cmp r9, #0
0x76edfe34 <+228>: mov r7, r9
0x76edfe38 <+232>: bge 0x76edfdb4 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+100>
0x76edfe3c <+236>: mov r0, r10
0x76edfe40 <+240>: bl 0x76af8c14 <_ZdlPv@plt>
0x76edfe44 <+244>: lsl r10, r9, #2
0x76edfe48 <+248>: b 0x76edfe58 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+264>
0x76edfe4c <+252>: lsl r10, r7, #2
0x76edfe50 <+256>: b 0x76edfe58 <_ZN5swift9RefCountsINS_13RefCountBitsTILNS_19RefCountInlinednessE1EEEE17allocateSideTableEv+264>
0x76edfe54 <+260>: mov r10, #0
0x76edfe58 <+264>: mov r0, r10
0x76edfe5c <+268>: sub sp, r11, #28
0x76edfe60 <+272>: pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
End of assembler dump.
(gdb)
(gdb)
(gdb) info registers
r0 0xc0000000 3221225472
r1 0x741016b8 1947211448
r2 0x40000000 1073741824
r3 0x74a478f8 1956935928
r4 0x74a46810 1956931600
r5 0x7410160c 1947211276
r6 0x741016a8 1947211432
r7 0x202 514
r8 0x74a46808 1956931592
r9 0x74a46980 1956931968
r10 0x74101698 1947211416
r11 0x74a46838 1956931640
r12 0x768dc930 1989003568
sp 0x74a46800 0x74a46800
lr 0x7669845f 1986626655
pc 0x76edfda8 0x76edfda8 <swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1> >::allocateSideTable()+88>
cpsr 0x600f0010 1611595792
(gdb)
The orr instruction at +88 can't crash. It's likely that the crashing instruction is the previous instruction at +84.
The vst1.64 instruction at +84 can crash because it is storing to memory. I think that instruction is trying to store 128 bits to a 128-bit aligned address, but the address (in r1) is only 64-bit aligned. That misalignment might be fatal.
That vst1.64 instruction looks wrong. I think the vst's at +84 and +92 are zero-filling the first 32 bytes of the newly-allocated HeapObjectSideTableEntry, but that can't be right because the store at +68 already set HeapObjectSideTableEntry->object and we still need that value. I suspect that either I'm mis-reading the assembly code or else the compiler is generating bad code for HeapObjectSideTableEntry or SideTableRefCounts.
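The register dump can be used to test the alignment theory. The following is a minimal sketch in Python, using only values copied from the `info registers` output and assuming that `str r0, [r1], #32` at +68 is a post-indexed store, so r1 ends up at r10 + 32. The helper name `alignment_of` is made up for illustration.

```python
# Register values copied from the `info registers` output in this thread.
r1 = 0x741016B8   # destination of the vst1.64 at +84
r6 = 0x741016A8   # destination of the vst1.64 at +92
r10 = 0x74101698  # pointer returned by the operator new call at +44 (48 bytes)


def alignment_of(address: int) -> int:
    """Return the largest power of two (in bytes) that divides address."""
    return address & -address


# Offsets of the two vector stores inside the 48-byte allocation.
offset_84 = r1 - r10
offset_92 = r6 - r10

print(f"block base   {r10:#x} is {alignment_of(r10)}-byte aligned")
print(f"store at +84 {r1:#x} (offset {offset_84}) is {alignment_of(r1)}-byte aligned")
print(f"store at +92 {r6:#x} (offset {offset_92}) is {alignment_of(r6)}-byte aligned")

# The `:128` qualifier on vst1.64 asks for a 16-byte aligned address.
print("satisfies :128 qualifier:", alignment_of(r1) >= 16)
```

Under those assumptions the allocation itself is only 8-byte aligned, so both store targets (offsets 16 and 32) are 8-byte aligned as well and neither meets the `:128` requirement. If this reading is right, the two stores cover bytes 16 through 47 of the block rather than the first 32 bytes, which would leave the `object` pointer written at +68 intact.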
Comment by Umberto Raimondi (JIRA)
It could really be a matter of alignment, since running the same process through strace I get:
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0x16ddfe8} ---
+++ killed by SIGBUS +++
Bus error
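The faulting address in that signal line can be checked for alignment directly. This is a small sketch that assumes the line format shown above. `parse_si_addr` is a hypothetical helper for this example, not part of strace.

```python
import re

# Signal line copied from the strace output above.
line = "--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0x16ddfe8} ---"


def parse_si_addr(signal_line: str) -> int:
    """Extract the si_addr field from a strace signal line as an integer."""
    match = re.search(r"si_addr=(0x[0-9a-fA-F]+)", signal_line)
    if match is None:
        raise ValueError("no si_addr field found")
    return int(match.group(1), 16)


addr = parse_si_addr(line)
print(f"{addr:#x} mod 8  = {addr % 8}")
print(f"{addr:#x} mod 16 = {addr % 16}")
```

The address is a multiple of 8 but not of 16. That is consistent with, though not proof of, a store that expects 16-byte alignment hitting an address that is only 8-byte aligned.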
After multiple attempts, I'm still trying to build Swift with the stdlib assertions enabled, but the binaries are just too big and gold can't allocate the memory it needs (regardless of swap size, and the common Linux/gold workarounds didn't help). I'll try stripping some of the debug info to reduce the binary size.
Try the ReleaseAssert configuration, maybe? It might be smaller.
Comment by Paul Nettle (JIRA)
Has there been any new information on this? I'm seeing this issue as well (SIGBUS error, likely a memory alignment issue). Specifically, it looked to me like a race condition related to the [weak self] reference in Basic/Thread.swift, which is how I landed here.
I was originally just trying to work around the issue for my local build (submitting a PR if that led to the discovery of the underlying problem) but it appears as though the problem could be more systemic.
I'm going to continue to poke at it when I can but if anybody has any new ideas, please do share.
Comment by Marco Chini (JIRA)
Environment

Linux (Ubuntu Mate 16.04LTS) on RaspberryPi2 (armv7).

Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 11 |
|Component/s | Compiler |
|Labels | Bug, RunTimeCrash, Runtime, Swift4, arm, armv7 |
|Assignee | None |
|Priority | Medium |

md5: fda8e0d9115a4de24351b7ca14e2c26a

Issue Description:
I've noticed this issue while trying to build SPM with swift from the swift-4.0-branch branch on a RaspberryPi2 (swift-build-stage1 can't build the self hosted swift-build, crashing in the same way as weak-reference-racetests does). This is just one of the 16 failing tests listed in SR-5845.
This is the output of gdb when I re-run the test manually:
The test crashes here.
I've seen a few comments in the code hinting that further modifications (or improvements) to the layout of HeapObject could be needed on 32-bit platforms. Could this issue be related to that?
The offset used in getHeapObject() makes sense, so I guess the real culprit is somewhere else.
@gparker42 what would you suggest looking into to debug this further?