potential code generation issue with clang on macos

While developing in the https://github.com/nanovms/nanos/tree/kernlock branch on macos/clang (commit https://github.com/nanovms/nanos/commit/6ce5efc05f6bd0a96d80a95458a62b7259ae6992), I get the following crash soon after the runloop is started and interrupts are enabled:

qemu-system-x86_64 -m 2G -display none -serial stdio -drive if=none,id=hd0,format=raw,file=/Users/wjhun/src/nanos/kernlock2/output/image/disk.raw -device virtio-scsi-pci,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=hd0 -device isa-debug-exit -no-reboot -smp 16 -d int -D int.log -device virtio-net,netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 || exit $(($?>>1))
SMP test: 15 APs online

no fault handler for frame 0000000100a02200 (misc frame)

       cpu: 0000000000000000
 interrupt: 000000000000000e (Page fault)
     frame: 0000000100a02200
error code: 0000000000000000
   address: 0000000fa881644c

  rax: 0000000000000000
  rbx: 000000001999999a
  rcx: 000000007f261d60 (cpuinfos + 0000000000000000/0000000000000480)
  rdx: 0000000fa8816428
  rsi: 0000002af3258c66
  rdi: 0000000000000028
  rbp: 000000010001ff80
  rsp: 000000007701ffc8
   r8: 0000000000000028
   r9: 0000000100c00800
  r10: 0000000000000009
  r11: 0000000000000001
  r12: 0000000100000000
  r13: 0000020000000000
  r14: 000000000000a7c6
  r15: 000000007f061a20 (bootstrap_region + 0000000000000000/0000000000200000)
  rip: 000000007f0002c0
flags: 0000000000000046
   ss: 0000000000000000
   cs: 0000000000000008
   ds: 0000000000000000
   es: 0000000000000000
   fs: 0000000000000000
   gs: 0000000000000000

frame trace:
000000007f03d382        (runloop + 00000000000000e2/0000000000000118)
000000007f03ddbf        (init_service_new_stack + 000000000000055f/000000000000056b)
000000007f03d45b

stack trace:
000000007f03a676        (kernel_sleep + 0000000000000006/0000000000000009)
0000000000000008
0000000000000246
000000010001ff80
0000000000000010
0000000000000000
0000000000000000
0000000000000000
[...]

The qemu interrupt log shows an attempted delivery of vector 0x24, immediately followed by a page fault with RIP at 0x7f0002c0, which is one byte below the correct handler address for vector 0x24 (0x7f0002c1):

    15: v=24 e=0000 i=0 cpl=0 IP=0008:000000007f03a676 pc=000000007f03a676 SP=0010:000000010001ff80 env->regs[R_EAX]=0000000000000000
RAX=0000000000000000 RBX=000000001999999a RCX=000000007f261d60 RDX=0000000fa8816428
RSI=0000002af3258c66 RDI=0000000000000028 RBP=000000010001ff80 RSP=000000010001ff80
R8 =0000000000000028 R9 =0000000100c00800 R10=0000000000000009 R11=0000000000000001
R12=0000000100000000 R13=0000020000000000 R14=000000000000a7c6 R15=000000007f061a20
RIP=000000007f03a676 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0008 0000000000000000 00000000 00209a00 DPL=0 CS64 [-R-]
SS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
DS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0010 000000007f261d60 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0028 000000007f0005f0 00000068 00008900 DPL=0 TSS64-avl
GDT=     000000007f0004b0 00000127
IDT=     0000000077208000 000002ff
CR0=80000013 CR2=0000000000000000 CR3=000000007efff000 CR4=00000620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=000000010001ff70 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0xffffffff new 0xe
    16: v=0e e=0000 i=0 cpl=0 IP=0008:000000007f0002c0 pc=000000007f0002c0 SP=0000:000000007701ffc8 CR2=0000000fa881644c
RAX=0000000000000000 RBX=000000001999999a RCX=000000007f261d60 RDX=0000000fa8816428
RSI=0000002af3258c66 RDI=0000000000000028 RBP=000000010001ff80 RSP=000000007701ffc8
R8 =0000000000000028 R9 =0000000100c00800 R10=0000000000000009 R11=0000000000000001
R12=0000000100000000 R13=0000020000000000 R14=000000000000a7c6 R15=000000007f061a20
RIP=000000007f0002c0 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0008 0000000000000000 00000000 00209a00 DPL=0 CS64 [-R-]
SS =0000 0000000000000000 00000000 00000000
DS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0010 000000007f261d60 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0028 000000007f0005f0 00000068 00008900 DPL=0 TSS64-avl
GDT=     000000007f0004b0 00000127
IDT=     0000000077208000 000002ff
CR0=80000013 CR2=0000000fa881644c CR3=000000007efff000 CR4=00000620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=000000010001ff70 CCO=EFLAGS
EFER=0000000000000d00

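Comparing the computed IDT contents of a working build against the crashing build, some (not all) of the entries hold a handler offset that is 1 less than the correct value. In the diff below these are the even-numbered vectors from 0x20 through 0x2e: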

$ diff working/idts crashes/idts
33c33
< 20: 0x7f008e01000802a5  0x0
---
> 20: 0x7f008e01000802a4  0x0
35c35
< 22: 0x7f008e01000802b3  0x0
---
> 22: 0x7f008e01000802b2  0x0
37c37
< 24: 0x7f008e01000802c1  0x0
---
> 24: 0x7f008e01000802c0  0x0
39c39
< 26: 0x7f008e01000802cf  0x0
---
> 26: 0x7f008e01000802ce  0x0
41c41
< 28: 0x7f008e01000802dd  0x0
---
> 28: 0x7f008e01000802dc  0x0
43c43
< 2a: 0x7f008e01000802eb  0x0
---
> 2a: 0x7f008e01000802ea  0x0
45c45
< 2c: 0x7f008e01000802f9  0x0
---
> 2c: 0x7f008e01000802f8  0x0
47c47
< 2e: 0x7f008e0100080307  0x0
---
> 2e: 0x7f008e0100080306  0x0
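
To make the diff easier to read, the low quadword of each entry can be unpacked back into its fields. The sketch below does that for the two vector 0x24 values; the helper names are hypothetical (not from the nanos source), and the layout assumed is the usual x86-64 gate layout: offset bits 15:0 in bits 15:0, selector in 31:16, IST in 34:32, type/attributes in 47:40, offset bits 31:16 in 63:48.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for reading the dump; not part of nanos. */

/* Recover bits 31:0 of the handler offset from the low quadword of a gate. */
uint32_t gate_offset_low32(uint64_t lo)
{
    uint32_t bits_15_0  = (uint32_t)(lo & 0xffff);
    uint32_t bits_31_16 = (uint32_t)((lo >> 48) & 0xffff);
    return (bits_31_16 << 16) | bits_15_0;
}

uint16_t gate_selector(uint64_t lo)  { return (uint16_t)((lo >> 16) & 0xffff); }
uint8_t  gate_ist(uint64_t lo)       { return (uint8_t)((lo >> 32) & 0x7); }
uint8_t  gate_type_attr(uint64_t lo) { return (uint8_t)((lo >> 40) & 0xff); }

/* Decode the two vector 0x24 values taken from the diff. */
void check_vector_0x24(void)
{
    uint64_t working = 0x7f008e01000802c1ull;
    uint64_t crashes = 0x7f008e01000802c0ull;

    assert(gate_offset_low32(working) == 0x7f0002c1u);
    assert(gate_offset_low32(crashes) == 0x7f0002c0u);

    /* Everything except the offset is identical in both builds. */
    assert(gate_selector(working) == gate_selector(crashes));
    assert(gate_ist(working) == gate_ist(crashes));
    assert(gate_type_attr(working) == gate_type_attr(crashes));
}
```

Only the offset field differs between the two builds. The working offsets for 0x20, 0x22 and 0x24 decode to 0x7f0002a5, 0x7f0002b3 and 0x7f0002c1, which are 14 bytes apart per two vectors, so the per-vector stub size appears to be 7 bytes.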

Note that the IDT entry for vector 0x24 is computed as 0x7f008e01000802c0 instead of the correct value of 0x7f008e01000802c1. So let's look at the disassembly of start_interrupts, specifically the portion that builds the table. write_idt is being inlined here, which shouldn't be an issue in itself. There are some unusual opcodes (marked with ***) that binutils objdump decodes as nops with operands:

    /* IDT setup */
    idt = allocate(pages, pages->pagesize);
    7f03accb:   49 8b 77 40             mov    rsi,QWORD PTR [r15+0x40]
    7f03accf:   4c 89 ff                mov    rdi,r15
    7f03acd2:   41 ff 57 28             call   QWORD PTR [r15+0x28]
    7f03acd6:   48 89 05 4b 6b 02 00    mov    QWORD PTR [rip+0x26b4b],rax        # 7f061828 <idt>
    7f03acdd:   31 c9                   xor    ecx,ecx
    7f03acdf:   44 8b 2d db 54 fc ff    mov    r13d,DWORD PTR [rip+0xfffffffffffc54db]        # 7f0001c1 <interrupt_vector_size>
    7f03ace6:   49 b8 00 00 08 00 00    movabs r8,0x8e0000080000
    7f03aced:   8e 00 00
    7f03acf0:   31 ff                   xor    edi,edi
*** 7f03acf2:   66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
*** 7f03acf9:   00 00 00
*** 7f03acfc:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]

    u64 vector_base = u64_from_pointer(&interrupt_vectors);
    for (int i = 0; i < INTERRUPT_VECTOR_START; i++)
        write_idt(i, vector_base + i * interrupt_vector_size, i == 0xe ? IST_PAGEFAULT : 0);
    7f03ad00:   89 cb                   mov    ebx,ecx
    7f03ad02:   48 8d 93 c5 01 00 7f    lea    rdx,[rbx+0x7f0001c5]
    7f03ad09:   31 f6                   xor    esi,esi
    7f03ad0b:   48 81 ff e0 00 00 00    cmp    rdi,0xe0
    7f03ad12:   40 0f 94 c6             sete   sil
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ad16:   44 0f b7 ca             movzx  r9d,dx
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) | (ist << 32)); /* 63 - 32 */
    7f03ad1a:   48 89 d3                mov    rbx,rdx
    7f03ad1d:   48 81 e3 00 00 ff ff    and    rbx,0xffffffffffff0000
    7f03ad24:   48 c1 e3 20             shl    rbx,0x20
    7f03ad28:   48 c1 e6 21             shl    rsi,0x21
    7f03ad2c:   4c 09 ce                or     rsi,r9
    7f03ad2f:   48 09 de                or     rsi,rbx
    7f03ad32:   4c 09 c6                or     rsi,r8
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ad35:   48 89 34 38             mov    QWORD PTR [rax+rdi*1],rsi
    target[1] = offset >> 32;   /*  95 - 64 */
    7f03ad39:   48 c1 ea 20             shr    rdx,0x20
    7f03ad3d:   48 89 54 38 08          mov    QWORD PTR [rax+rdi*1+0x8],rdx
    for (int i = 0; i < INTERRUPT_VECTOR_START; i++)
    7f03ad42:   48 83 c7 10             add    rdi,0x10
    7f03ad46:   44 01 e9                add    ecx,r13d
    7f03ad49:   48 81 ff 00 02 00 00    cmp    rdi,0x200
    7f03ad50:   75 ae                   jne    7f03ad00 <start_interrupts+0xe0>

    for (int i = INTERRUPT_VECTOR_START; i < n_interrupt_vectors; i++)
    7f03ad52:   44 8b 15 64 54 fc ff    mov    r10d,DWORD PTR [rip+0xfffffffffffc5464]        # 7f0001bd <n_interrupt_vectors>
    7f03ad59:   49 63 fa                movsxd rdi,r10d
    7f03ad5c:   49 83 fa 21             cmp    r10,0x21
    7f03ad60:   0f 82 0d 01 00 00       jb     7f03ae73 <start_interrupts+0x253>
    7f03ad66:   48 89 7d c0             mov    QWORD PTR [rbp-0x40],rdi
    7f03ad6a:   4c 89 75 b8             mov    QWORD PTR [rbp-0x48],r14
    7f03ad6e:   49 bb 00 00 08 00 01    movabs r11,0x8e0100080000
    7f03ad75:   8e 00 00
    7f03ad78:   44 89 d1                mov    ecx,r10d
    7f03ad7b:   83 e1 01                and    ecx,0x1
    7f03ad7e:   48 89 4d d0             mov    QWORD PTR [rbp-0x30],rcx
    7f03ad82:   41 b9 20 00 00 00       mov    r9d,0x20
    7f03ad88:   41 83 fa 21             cmp    r10d,0x21
    7f03ad8c:   0f 84 a0 00 00 00       je     7f03ae32 <start_interrupts+0x212>
    7f03ad92:   44 89 e9                mov    ecx,r13d
    7f03ad95:   c1 e1 05                shl    ecx,0x5
    7f03ad98:   42 8d 14 29             lea    edx,[rcx+r13*1]
    7f03ad9c:   47 8d 7c 2d 00          lea    r15d,[r13+r13*1+0x0]
    7f03ada1:   48 89 c7                mov    rdi,rax
    7f03ada4:   48 81 c7 10 02 00 00    add    rdi,0x210
    7f03adab:   4d 89 d4                mov    r12,r10
    7f03adae:   4c 2b 65 d0             sub    r12,QWORD PTR [rbp-0x30]
    7f03adb2:   41 b9 20 00 00 00       mov    r9d,0x20
    7f03adb8:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
    7f03adbf:   00
        write_idt(i, vector_base + i * interrupt_vector_size, IST_INTERRUPT);
    7f03adc0:   89 ce                   mov    esi,ecx
    7f03adc2:   4c 8d b6 c5 01 00 7f    lea    r14,[rsi+0x7f0001c5]
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03adc9:   45 89 f0                mov    r8d,r14d
    7f03adcc:   41 81 e0 fe ff 00 00    and    r8d,0xfffe
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) | (ist << 32)); /* 63 - 32 */
    7f03add3:   4c 89 f6                mov    rsi,r14
    7f03add6:   48 81 e6 00 00 ff ff    and    rsi,0xffffffffffff0000
    7f03addd:   48 c1 e6 20             shl    rsi,0x20
    7f03ade1:   4c 09 c6                or     rsi,r8
    7f03ade4:   4c 09 de                or     rsi,r11
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ade7:   48 89 77 f0             mov    QWORD PTR [rdi-0x10],rsi
    target[1] = offset >> 32;   /*  95 - 64 */
    7f03adeb:   49 c1 ee 20             shr    r14,0x20
    7f03adef:   4c 89 77 f8             mov    QWORD PTR [rdi-0x8],r14
        write_idt(i, vector_base + i * interrupt_vector_size, IST_INTERRUPT);
    7f03adf3:   89 d6                   mov    esi,edx
    7f03adf5:   48 8d 9e c5 01 00 7f    lea    rbx,[rsi+0x7f0001c5]
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03adfc:   44 0f b7 c3             movzx  r8d,bx
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) | (ist << 32)); /* 63 - 32 */
    7f03ae00:   48 89 de                mov    rsi,rbx
    7f03ae03:   48 81 e6 00 00 ff ff    and    rsi,0xffffffffffff0000
    7f03ae0a:   48 c1 e6 20             shl    rsi,0x20
    7f03ae0e:   4c 09 c6                or     rsi,r8
    7f03ae11:   4c 09 de                or     rsi,r11
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ae14:   48 89 37                mov    QWORD PTR [rdi],rsi
    target[1] = offset >> 32;   /*  95 - 64 */
    7f03ae17:   48 c1 eb 20             shr    rbx,0x20
    7f03ae1b:   48 89 5f 08             mov    QWORD PTR [rdi+0x8],rbx
    for (int i = INTERRUPT_VECTOR_START; i < n_interrupt_vectors; i++)
    7f03ae1f:   49 83 c1 02             add    r9,0x2
    7f03ae23:   44 01 fa                add    edx,r15d
    7f03ae26:   48 83 c7 20             add    rdi,0x20
    7f03ae2a:   44 01 f9                add    ecx,r15d
    7f03ae2d:   4d 39 cc                cmp    r12,r9
    7f03ae30:   75 8e                   jne    7f03adc0 <start_interrupts+0x1a0>
    7f03ae32:   83 7d d0 00             cmp    DWORD PTR [rbp-0x30],0x0
    7f03ae36:   4c 8b 75 b8             mov    r14,QWORD PTR [rbp-0x48]
    7f03ae3a:   48 8b 7d c0             mov    rdi,QWORD PTR [rbp-0x40]
    7f03ae3e:   74 33                   je     7f03ae73 <start_interrupts+0x253>
        write_idt(i, vector_base + i * interrupt_vector_size, IST_INTERRUPT);
    7f03ae40:   45 0f af e9             imul   r13d,r9d
    7f03ae44:   49 8d 8d c5 01 00 7f    lea    rcx,[r13+0x7f0001c5]
    return pointer_from_u64((u64_from_pointer(idt) + 2 * sizeof(u64) * interrupt));
    7f03ae4b:   49 c1 e1 04             shl    r9,0x4
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ae4f:   0f b7 d1                movzx  edx,cx
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) | (ist << 32)); /* 63 - 32 */
    7f03ae52:   48 89 ce                mov    rsi,rcx
    7f03ae55:   48 81 e6 00 00 ff ff    and    rsi,0xffffffffffff0000
    7f03ae5c:   48 c1 e6 20             shl    rsi,0x20
    7f03ae60:   48 09 d6                or     rsi,rdx
    7f03ae63:   4c 09 de                or     rsi,r11
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ae66:   4a 89 34 08             mov    QWORD PTR [rax+r9*1],rsi
    target[1] = offset >> 32;   /*  95 - 64 */
    7f03ae6a:   48 c1 e9 20             shr    rcx,0x20
    7f03ae6e:   4a 89 4c 08 08          mov    QWORD PTR [rax+r9*1+0x8],rcx
    return pointer_from_u64((u64_from_pointer(idt) + 2 * sizeof(u64) * interrupt));
    7f03ae73:   48 c1 e7 04             shl    rdi,0x4

    void *idt_desc = idt_from_interrupt(n_interrupt_vectors); /* placed after last entry */
    *(u16*)idt_desc = 2 * sizeof(u64) * n_interrupt_vectors - 1;
    7f03ae77:   41 c1 e2 04             shl    r10d,0x4
    7f03ae7b:   41 83 c2 ff             add    r10d,0xffffffff
    7f03ae7f:   66 44 89 14 38          mov    WORD PTR [rax+rdi*1],r10w
    *(u64*)(idt_desc + sizeof(u16)) = u64_from_pointer(idt);
    7f03ae84:   48 89 44 38 02          mov    QWORD PTR [rax+rdi*1+0x2],rax
    asm("lidt %0": : "m"(*(u64*)idt_desc));
    7f03ae89:   0f 01 1c 38             lidt   [rax+rdi*1]
    return allocate_u64(interrupt_vector_heap, 1);
    7f03ae8d:   48 8b 3d 8c 69 02 00    mov    rdi,QWORD PTR [rip+0x2698c]        # 7f061820 <interrupt_vector_heap>
    7f03ae94:   be 01 00 00 00          mov    esi,0x1
    7f03ae99:   ff 57 28                call   QWORD PTR [rdi+0x28]
[...]
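
One detail in the listing above may be worth checking while single-stepping: the first copy of the unrolled loop body masks the offset with `and r8d,0xfffe` (at 7f03adcc), whereas the second copy and the remainder path use `movzx`, i.e. a full 16-bit mask. The sketch below is only an arithmetic check, not the nanos code. It assumes a vector base of 0x7f0001c5 (the constant in the `lea`) and a per-vector size of 7 bytes (inferred from the spacing in the diff), and compares a 0xffff mask against a 0xfffe mask. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BASE 0x7f0001c5ull  /* assumption: constant seen in the lea */
#define VECTOR_SIZE 7ull           /* assumption: inferred from the idts diff */

/* Low quadword of a gate, with the low-offset mask as a parameter.
 * Selector 0x8 and type 0x8e match the movabs constants in the listing. */
uint64_t gate_low(uint64_t offset, uint64_t low_mask, uint64_t ist)
{
    uint64_t selector = 0x8, type_attr = 0x8e;
    return (selector << 16) | (offset & low_mask) |
           (((offset >> 16) & 0xffff) << 48) | (type_attr << 40) | (ist << 32);
}

/* Handler address for vector i under the assumptions above. */
uint64_t vector_offset(unsigned i)
{
    return VECTOR_BASE + i * VECTOR_SIZE;
}

void check_mask_effect(void)
{
    /* Vector 0x24: full mask gives the working value, 0xfffe the crashing one. */
    assert(gate_low(vector_offset(0x24), 0xffff, 1) == 0x7f008e01000802c1ull);
    assert(gate_low(vector_offset(0x24), 0xfffe, 1) == 0x7f008e01000802c0ull);

    /* Same for the last entry in the diff, vector 0x2e. */
    assert(gate_low(vector_offset(0x2e), 0xffff, 1) == 0x7f008e0100080307ull);
    assert(gate_low(vector_offset(0x2e), 0xfffe, 1) == 0x7f008e0100080306ull);
}
```

Under these assumptions the 0xfffe mask reproduces the crashes/idts values for the even vectors, which is where that copy of the loop body stores (`[rdi-0x10]`, starting at entry 0x20). Odd vectors land on even addresses here, so they would look the same with either mask. This only narrows down where the cleared bit could come from; why the compiler would emit that mask is not established.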

Given that SSE instructions are involved, I subsequently added assertions that the stack was 16-byte aligned on function entry and again just before the table is built. Both assertions held.
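
A check of that kind could look like the following. This is a user-space sketch rather than the kernel code: `is_aligned` and `frame_is_16_byte_aligned` are hypothetical names, and `__builtin_frame_address` is a GCC/Clang builtin used here as a stand-in for reading rsp directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if value is a multiple of alignment,
 * where alignment must be a power of two. */
int is_aligned(uint64_t value, uint64_t alignment)
{
    return (value & (alignment - 1)) == 0;
}

/* With the usual x86-64 prologue (push rbp; mov rbp,rsp), the frame
 * address is 16-byte aligned whenever the caller kept the stack
 * 16-byte aligned at the call instruction. */
int frame_is_16_byte_aligned(void)
{
    return is_aligned((uint64_t)(uintptr_t)__builtin_frame_address(0), 16);
}
```

Calling `assert(frame_is_16_byte_aligned())` at the top of a function and again before the loop gives the two checkpoints described above. As plain arithmetic, the RSP at the time vector 0x24 was delivered (0x10001ff80) passes `is_aligned(..., 16)`.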

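Granted, this routine runs without crashing, albeit producing incorrect values. Adding __attribute__((noinline)) to write_idt() produces correct entries and resolves the crash. The resulting start_interrupts() and stand-alone write_idt() are shown below. Note that they still contain multi-byte nops of the same form (for example at 7f03acd5 and 7f03ae58), so those opcodes are present in the working build as well: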

    /* IDT setup */
    idt = allocate(pages, pages->pagesize);
    7f03acc1:   49 8b 77 40             mov    rsi,QWORD PTR [r15+0x40]
    7f03acc5:   4c 89 ff                mov    rdi,r15
    7f03acc8:   41 ff 57 28             call   QWORD PTR [r15+0x28]
    7f03accc:   48 89 05 55 6b 02 00    mov    QWORD PTR [rip+0x26b55],rax        # 7f061828 <idt>
    7f03acd3:   31 db                   xor    ebx,ebx
    7f03acd5:   66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
    7f03acdc:   00 00 00
    7f03acdf:   90                      nop
    7f03ace0:   8b 05 db 54 fc ff       mov    eax,DWORD PTR [rip+0xfffffffffffc54db]        # 7f0001c1 <interrupt_vector_size>

    u64 vector_base = u64_from_pointer(&interrupt_vectors);
    for (int i = 0; i < INTERRUPT_VECTOR_START; i++)
        write_idt(i, vector_base + i * interrupt_vector_size, i == 0xe ? IST_PAGEFAULT : 0);
    7f03ace6:   0f af c3                imul   eax,ebx
    7f03ace9:   48 8d b0 c5 01 00 7f    lea    rsi,[rax+0x7f0001c5]
    7f03acf0:   31 d2                   xor    edx,edx
    7f03acf2:   83 fb 0e                cmp    ebx,0xe
    7f03acf5:   0f 94 c2                sete   dl
    7f03acf8:   48 01 d2                add    rdx,rdx
    7f03acfb:   89 df                   mov    edi,ebx
    7f03acfd:   e8 0e 01 00 00          call   7f03ae10 <write_idt>
    for (int i = 0; i < INTERRUPT_VECTOR_START; i++)
    7f03ad02:   83 c3 01                add    ebx,0x1
    7f03ad05:   83 fb 20                cmp    ebx,0x20
    7f03ad08:   75 d6                   jne    7f03ace0 <start_interrupts+0xc0>

    for (int i = INTERRUPT_VECTOR_START; i < n_interrupt_vectors; i++)
    7f03ad0a:   8b 05 ad 54 fc ff       mov    eax,DWORD PTR [rip+0xfffffffffffc54ad]        # 7f0001bd <n_interrupt_vectors>
    7f03ad10:   83 f8 21                cmp    eax,0x21
    7f03ad13:   72 34                   jb     7f03ad49 <start_interrupts+0x129>
    7f03ad15:   bb 20 00 00 00          mov    ebx,0x20
    7f03ad1a:   66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
    7f03ad20:   8b 05 9b 54 fc ff       mov    eax,DWORD PTR [rip+0xfffffffffffc549b]        # 7f0001c1 <interrupt_vector_size>
        write_idt(i, vector_base + i * interrupt_vector_size, IST_INTERRUPT);
    7f03ad26:   0f af c3                imul   eax,ebx
    7f03ad29:   48 8d b0 c5 01 00 7f    lea    rsi,[rax+0x7f0001c5]
    7f03ad30:   ba 01 00 00 00          mov    edx,0x1
    7f03ad35:   89 df                   mov    edi,ebx
    7f03ad37:   e8 d4 00 00 00          call   7f03ae10 <write_idt>
    for (int i = INTERRUPT_VECTOR_START; i < n_interrupt_vectors; i++)
    7f03ad3c:   83 c3 01                add    ebx,0x1
    7f03ad3f:   8b 05 78 54 fc ff       mov    eax,DWORD PTR [rip+0xfffffffffffc5478]        # 7f0001bd <n_interrupt_vectors>
    7f03ad45:   39 c3                   cmp    ebx,eax
    7f03ad47:   72 d7                   jb     7f03ad20 <start_interrupts+0x100>
    return pointer_from_u64((u64_from_pointer(idt) + 2 * sizeof(u64) * interrupt));
    7f03ad49:   48 8b 0d d8 6a 02 00    mov    rcx,QWORD PTR [rip+0x26ad8]        # 7f061828 <idt>
    7f03ad50:   48 98                   cdqe
    7f03ad52:   48 89 c2                mov    rdx,rax
    7f03ad55:   48 c1 e2 04             shl    rdx,0x4

    void *idt_desc = idt_from_interrupt(n_interrupt_vectors); /* placed after last entry */
    *(u16*)idt_desc = 2 * sizeof(u64) * n_interrupt_vectors - 1;
    7f03ad59:   c1 e0 04                shl    eax,0x4
    7f03ad5c:   83 c0 ff                add    eax,0xffffffff
    7f03ad5f:   66 89 04 11             mov    WORD PTR [rcx+rdx*1],ax
    *(u64*)(idt_desc + sizeof(u16)) = u64_from_pointer(idt);
    7f03ad63:   48 89 4c 11 02          mov    QWORD PTR [rcx+rdx*1+0x2],rcx
    asm("lidt %0": : "m"(*(u64*)idt_desc));
    7f03ad68:   0f 01 1c 11             lidt   [rcx+rdx*1]
    return allocate_u64(interrupt_vector_heap, 1);
    7f03ad6c:   48 8b 3d ad 6a 02 00    mov    rdi,QWORD PTR [rip+0x26aad]        # 7f061820 <interrupt_vector_heap>
    7f03ad73:   be 01 00 00 00          mov    esi,0x1
    7f03ad78:   ff 57 28                call   QWORD PTR [rdi+0x28]
[...]

and

000000007f03ae10 <write_idt>:
{
    7f03ae10:   55                      push   rbp
    7f03ae11:   48 89 e5                mov    rbp,rsp
    return pointer_from_u64((u64_from_pointer(idt) + 2 * sizeof(u64) * interrupt));
    7f03ae14:   4c 8b 05 0d 6a 02 00    mov    r8,QWORD PTR [rip+0x26a0d]        # 7f061828 <idt>
    7f03ae1b:   48 63 cf                movsxd rcx,edi
    7f03ae1e:   48 c1 e1 04             shl    rcx,0x4
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ae22:   0f b7 fe                movzx  edi,si
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) | (ist << 32)); /* 63 - 32 */
    7f03ae25:   48 89 f0                mov    rax,rsi
    7f03ae28:   48 25 00 00 ff ff       and    rax,0xffffffffffff0000
    7f03ae2e:   48 c1 e0 20             shl    rax,0x20
    7f03ae32:   48 09 f8                or     rax,rdi
    7f03ae35:   48 c1 e2 20             shl    rdx,0x20
    7f03ae39:   48 bf 00 00 08 00 00    movabs rdi,0x8e0000080000
    7f03ae40:   8e 00 00
    7f03ae43:   48 09 d7                or     rdi,rdx
    7f03ae46:   48 09 c7                or     rdi,rax
    target[0] = ((selector << 16) | (offset & MASK(16)) | /* 31 - 0 */
    7f03ae49:   49 89 3c 08             mov    QWORD PTR [r8+rcx*1],rdi
    target[1] = offset >> 32;   /*  95 - 64 */
    7f03ae4d:   48 c1 ee 20             shr    rsi,0x20
    7f03ae51:   49 89 74 08 08          mov    QWORD PTR [r8+rcx*1+0x8],rsi
}
    7f03ae56:   5d                      pop    rbp
    7f03ae57:   c3                      ret
    7f03ae58:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
    7f03ae5f:   00
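
For reference, the workaround has roughly the following shape. This is a user-space reconstruction from the source lines interleaved in the listings, not the nanos file itself: the `u64` typedef, the `MASK` definition, the static `idt_table` backing store, the `idt_word` accessor and the hard-coded selector and type values are stand-ins assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
#define MASK(bits) ((1ull << (bits)) - 1)   /* assumed definition */

static u64 idt_table[2 * 256];              /* stand-in for the allocated IDT page */

/* Each entry is 16 bytes, i.e. two u64 words. */
static u64 *idt_from_interrupt(int interrupt)
{
    return &idt_table[2 * interrupt];
}

/* noinline keeps the descriptor packing out of the caller's unrolled loop. */
__attribute__((noinline))
void write_idt(int interrupt, u64 offset, u64 ist)
{
    u64 selector = 0x08;    /* matches the movabs constant in the listing */
    u64 type_attr = 0x8e;
    u64 *target = idt_from_interrupt(interrupt);

    target[0] = ((selector << 16) | (offset & MASK(16)) |                    /* 31 - 0 */
                 (((offset >> 16) & MASK(16)) << 48) | (type_attr << 40) |
                 (ist << 32));                                               /* 63 - 32 */
    target[1] = offset >> 32;                                                /* 95 - 64 */
}

/* Hypothetical accessor so the result can be inspected. */
u64 idt_word(int interrupt, int word)
{
    return idt_from_interrupt(interrupt)[word];
}

void check_write_idt(void)
{
    write_idt(0x24, 0x7f0002c1ull, 1);
    assert(idt_word(0x24, 0) == 0x7f008e01000802c1ull);
    assert(idt_word(0x24, 1) == 0);
}
```

With the handler address 0x7f0002c1 and IST index 1, this packs to the value seen in working/idts for vector 0x24.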

I also made the lidt inline asm volatile and added memory barriers liberally, but to no avail. The crash occurs the same way with hvf acceleration and with run-noaccel (TCG).

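I can't see the purpose of those "nop"s, unless they are simply alignment padding: each run ends immediately before a 16-byte-aligned loop head (7f03ad00 and 7f03adc0 are both jump targets). Though far-fetched, I also thought perhaps they were multi-byte nops that enclosed a table of some kind, yet I don't see any reference to those locations in the code.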

I haven't isolated or hypothesized any cause, and I still need to single-step through the inlined table build. Opening an issue now to keep track of progress.

Built with:

Apple clang version 11.0.0 (clang-1100.0.33.16)
Target: x86_64-apple-darwin19.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Xcode 11.3
Build version 11C29

nanovms / ops

potential code generation issue with clang on macos #445