calling convention on builtins is broken

a1k0n commented 12 years ago

If I call memset(a, b, c), clang always generates a call to memset with C = 0 (C being the register that should hold c). If I call memset2 instead, it works. Observe:

This is especially annoying because clang does enough inference to figure out when I'm doing a memset, and autogenerates a call to memset in those instances.
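For context, the kind of code meant here is a plain fill loop like the one below (`clear_ints` is a hypothetical name). At -O, clang's loop-idiom recognition may replace such a loop with a call to memset, so the broken call can show up even in code that never calls memset directly. Whether the transformation fires depends on the target and on the value being stored.

```c
#include <stddef.h>

/* Illustrative only: a fill loop of the kind clang may turn into a
 * memset call at -O. Semantically it just zeroes n ints. */
void clear_ints(int *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}
```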

$ cat memset1.c
void* memset(void* buf, int c, unsigned siz);

extern int buf[128];
int main()
{
  memset(buf, 1, sizeof(buf));
}

$ bin/clang -target dcpu16 -S memset1.c -O -o -
        ; .file "memset1.c"
        .text
        .globl  main
        ; .align        1
:main
        SET     PUSH, J
        SET     J, SP
        SET     A, SP
        SET     [A], 0x80
        SET     A, buf
        SET     B, 0x1
        SET     C, 0x0 ; <--- ?!  C is supposed to be 0x80.
        JSR     memset
        SET     A, 0x0
        SET     J, POP
        SET     PC, POP

Now, if I just rename memset to memset2:

$ cat memset2.c
void* memset2(void* buf, int c, unsigned siz);

extern int buf[128];
int main()
{
  memset2(buf, 1, sizeof(buf));
}

$ bin/clang -target dcpu16 -S memset2.c -O -o -
        ; .file "memset2.c"
        .text
        .globl  main
        ; .align        1
:main
        SET     PUSH, J
        SET     J, SP
        SUB     SP, 0xfffe
        SET     A, buf
        SET     B, 0x1
        SET     C, 0x80     ; <--- correct now as it's just another call
        JSR     memset2
        SET     A, 0x0
        ADD     SP, 0xfffe
        SET     J, POP
        SET     PC, POP

a1k0n commented 12 years ago

I think this is because memset is declared with a 32-bit length? If I make siz an unsigned long long, I see the same behavior: the call is generated with C=0 instead of the length, and the 32-bit value isn't pushed onto the stack at all. So I think we need to add lowerings for i32 as well, and fix intrinsics like memset so they don't use 32-bit sizes.

a1k0n commented 12 years ago

It "pushed" the argument onto the stack, but without adjusting SP, so that's not right either: an interrupt could clobber it.

ghost commented 12 years ago

Just for reference: how did we fix this? With the LLVM patch from Blei (8675f9174f35bb539082709a65b83cf8b1a376b8)?

a1k0n commented 12 years ago

Oh, that's annoying. I thought hitting "close" would submit my comment and then close the issue. Instead, GitHub discarded my explanation.

My last comment was wrong. This was always working as intended; it's just that LLVM forces the size_t length argument of memset to be 32 bits no matter what, and I'm not sure why. That could be its own bug, which we could fix by implementing the builtins our own way. Because the length is 32 bits, its high word goes into C and its low word goes onto the stack as the first stack argument.
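To make the split concrete, here is a small C sketch that models how a 32-bit length is divided between register C and the stack, assuming the high-word-in-C, low-word-on-stack order described above. The type and function names are hypothetical and not part of llvm-dcpu16. With a length of 0x80 it yields C = 0 and a stack word of 0x80, which is exactly what the first listing shows.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit length passed as two 16-bit words. */
typedef struct {
    uint16_t c_reg;   /* high word, passed in register C */
    uint16_t stack0;  /* low word, first stack argument (PICK 1 in the callee) */
} split_len;

/* Split a 32-bit length the way the observed calls do. */
split_len split_length(uint32_t len)
{
    split_len s;
    s.c_reg = (uint16_t)(len >> 16);
    s.stack0 = (uint16_t)(len & 0xFFFFu);
    return s;
}

/* Reassemble the original 32-bit length from the two words. */
uint32_t join_length(split_len s)
{
    return ((uint32_t)s.c_reg << 16) | s.stack0;
}
```

A callee that only ever handles lengths below 0x10000, like the memset below, can ignore C and read just the stack word.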

The following implementation of memset works correctly:

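  .globl memset
:memset
; memset(A = buf, B = value, 32-bit length)
; C holds the high word of the length; the low word (the real length)
; is the first stack argument, just above the return address.
  SET C, PICK 1
  SET PUSH, I
  SET PUSH, J
  SET J, A   ; J = destination pointer
  SET I, C   ; I = length
  ADD C, A   ; C is now the end address
  AND I, 7
  MUL I, -1
  ADD I, 8   ; I = 8 - (length & 7)
  ADD PC, I  ; skip I stores, jumping into the table below
  STI [J], B ; STI stores B at [J], then increments I and J
  STI [J], B
  STI [J], B
  STI [J], B
  STI [J], B
  STI [J], B
  STI [J], B
  STI [J], B
  IFN J, C
    SUB PC, 10 ; PC-10 is the first STI of the table above
  SET J, POP
  SET I, POP
  SET PC, POP ; return; A still holds buf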
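The jump into the unrolled table is essentially Duff's device: the first pass performs length & 7 stores (or a full 8 when the length is a nonzero multiple of 8), and every later pass performs 8 until J reaches the end address. The C model below is a sketch of that control flow, not a drop-in replacement; the name `memset16_model` is hypothetical, and it uses 16-bit words to mirror DCPU-16 memory.

```c
#include <stdint.h>

/* Hypothetical C model of the DCPU-16 memset above: fills len 16-bit
 * words at buf with value using an 8-way unrolled loop that is entered
 * part-way through on the first pass (Duff's device). */
uint16_t *memset16_model(uint16_t *buf, uint16_t value, uint16_t len)
{
    uint16_t *j = buf;          /* destination pointer, like register J */
    uint16_t *end = buf + len;  /* end address, like C after ADD C, A */
    if (j == end)               /* zero length: the asm skips every store */
        return buf;
    switch (len & 7) {          /* same effect as skipping 8 - (len & 7) stores */
    case 0: do { *j++ = value;
    case 7:      *j++ = value;
    case 6:      *j++ = value;
    case 5:      *j++ = value;
    case 4:      *j++ = value;
    case 3:      *j++ = value;
    case 2:      *j++ = value;
    case 1:      *j++ = value;
            } while (j != end); /* IFN J, C / SUB PC, 10 */
    }
    return buf;                 /* like the asm, return buf (A is untouched) */
}
```

Unrolling by 8 keeps the loop test (IFN plus SUB) to one check per eight stores, at the cost of the computed jump on entry.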
