justincormack / ljsyscall

LuaJIT Unix syscall FFI
http://www.myriabit.com/ljsyscall/
Other
440 stars 53 forks source link

Misoptimization on lseek system call #223

Closed wingo closed 6 years ago

wingo commented 6 years ago

I have a weird bug :)

Consider the following test script:

local S = require 'syscall'
local ffi = require('ffi')

local buf = ffi.new('uint8_t[1024]')

local wr = assert(S.open('/tmp/test.txt', 'wronly,creat,trunc', 'rusr,wusr'))
assert(wr:write(buf, 1024) == 1024)
wr:close()

local rd = assert(S.open('/tmp/test.txt', 'rdonly'))

while true do
   local did_read, err = rd:read(buf, 1024)
   if did_read then
      assert(did_read > 0)
      print('read '..did_read..'; now seeking by '..(-did_read))
      local pos, err = rd:lseek(-did_read, 'cur')
      if pos == nil then
         print('lseek failed: '..tostring(err))
         print('current position: '..rd:lseek(0, 'cur'))
         os.exit(1)
      end
   else
      print(tostring(err))
   end
end

It creates a 1024-byte file, then repeatedly reads up to 1024 bytes, then seeks backwards by the number of bytes read.

When run as part of Snabb, I get:

$ ./snabb snsh ../test.lua 
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
lseek failed: Invalid argument
current position: 1024

This is of course really weird. If I run with -jv I see that a trace was emitted right before the failure:

$ ./snabb snsh -jv ../test.lua 
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
[TRACE   4 helpers.lua:64 return]
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
[TRACE --- c.lua:27 -- leaving loop in root trace at c.lua:514]
read 1024; now seeking by -1024
read 1024; now seeking by -1024
[TRACE --- c.lua:174 -- leaving loop in root trace at syscalls.lua:125]
[TRACE --- syscalls.lua:42 -- leaving loop in root trace at test.lua:19]
read 1024; now seeking by -1024
read 1024; now seeking by -1024
read 1024; now seeking by -1024
[TRACE   5 test.lua:13 stitch print]
read 1024; now seeking by -1024
read 1024; now seeking by -1024
[TRACE   6 (5/stitch) test.lua:18 -> 5]
read 1024; now seeking by -1024
lseek failed: Invalid argument
current position: 1024

Therefore I am almost certain that there is a bug in LuaJIT here. However -- looking at ljsyscall, I see this for Lua on 64-bit Linux:

  function C.lseek(fd, offset, whence) return syscall_off(sys.lseek, int(fd), ulong(offset), int(whence)) end

Note that it's ulong, whereas offset is signed. If I modify ljsyscall to replace that ulong with long, I get no error (!!!!!). I can't see how this would make a difference as far as the various casting rules, but there you go.

So my question for ljsyscall is, should we change this ulong for long?

Also, what's the deal with the underlying bug?

wingo commented 6 years ago

For what it's worth, if I take out the print from the loop, then the loop trace's loop body looks like this:

0105 ------ LOOP ------------
0106    int XLOAD  0020  
0110    i64 CALLXS [0x7f69544d7120](0045 0106 0041 +1024) ctype +950
0111    num CONV   0110  num.i64
0112 >  num NE     0111  -1  
0113 >  num GT     0111  +0  
0114    num NEG    0111  0062
0115    int XLOAD  0020  
0116 }  cdt CNEWI  +9    0115
0117    u64 CONV   0114  u64.num
0118 }  cdt CNEWI  +12   0117
0122    i64 CALLXS [0x7f69544d7120](0096 0115 0117 0094) ctype +950
0123 }  cdt CNEWI  +11   0122
0124 }  cdt CNEWI  +12   0122
0125 >  u64 NE     0122  -1  
---- TRACE 5 mcode 1088

The corresponding machine code looks like this:

->LOOP:
2a4ff660  mov edi, [rsp+0x20]
2a4ff664  mov rdx, [rsp+0x18]
2a4ff669  mov esi, [rbp+0x8]
2a4ff66c  mov ecx, 0x400
2a4ff671  mov al, 0x0
2a4ff673  call rbx
2a4ff675  mov edi, [rsp+0x14]
2a4ff679  mov ecx, [rsp+0x10]
2a4ff67d  xorps xmm7, xmm7
2a4ff680  cvtsi2sd xmm7, rax
2a4ff685  movsd [rsp+0x8], xmm7
2a4ff68b  ucomisd xmm7, [0x41b2d508]
2a4ff694  jpe 0x2a4ff69c
2a4ff696  jz 0x2a4f0030 ->8
2a4ff69c  ucomisd xmm7, [0x41b2d4d8]
2a4ff6a5  jbe 0x2a4f0034    ->9
2a4ff6ab  xorps xmm7, [0x41619660]
2a4ff6b3  mov esi, [rbp+0x8]
2a4ff6b6  cvttsd2si rdx, xmm7
2a4ff6bb  test rdx, rdx
2a4ff6be  jns 0x2a4ff6ce
2a4ff6c0  addsd xmm7, [0x41619680]
2a4ff6c9  cvttsd2si rdx, xmm7
2a4ff6ce  mov al, 0x0
2a4ff6d0  call rbx
2a4ff6d2  cmp rax, -0x01
2a4ff6d6  jnz 0x2a4ff660    ->LOOP
2a4ff6d8  jmp 0x2a4f0040    ->12
---- TRACE 5 stop -> loop

The loop fails immediately after this optimized trace is installed.

wingo commented 6 years ago

Working through the assembly...

First we load the arguments to syscall into registers

->LOOP:
2a4ff660  mov edi, [rsp+0x20] ;; the syscall number for "read"
2a4ff664  mov rdx, [rsp+0x18] ;; the buffer
2a4ff669  mov esi, [rbp+0x8] ;; the fd
2a4ff66c  mov ecx, 0x400 ;; the count
2a4ff671  mov al, 0x0 ;; why is this zeroed? not sure

Now we call syscall. Parameters in RDI, RSI, RDX, and RCX; return value to RAX

2a4ff673  call rbx ;; syscall

OK the call is done. Now we start loading constants for arguments to the next syscall, lseek(fd, offset, whence):

2a4ff675  mov edi, [rsp+0x14] ;; syscall number for "lseek"
2a4ff679  mov ecx, [rsp+0x10] ;; value for SEEK_CUR

Now we start dealing with the return value from read. First, we load the return from read into a floating-point register:

2a4ff67d  xorps xmm7, xmm7 ;; zero whole register to prevent false
                           ;; dependencies on value of high quadword
2a4ff680  cvtsi2sd xmm7, rax ;; convert i64 to double in xmm7
2a4ff685  movsd [rsp+0x8], xmm7 ;; save that double on the stack

OK. Now we have two checks to do on the return value. Firstly we have to check if it's -1, in which case we bail to a side trace for the case in which read() failed:

2a4ff68b  ucomisd xmm7, [0x41b2d508] ;; compare to stored value for -1
2a4ff694  jpe 0x2a4ff69c ;; if xmm7 is NaN, it's not -1
2a4ff696  jz 0x2a4f0030 ->8

OK, we're on the success path. Now compare against 0, for the assert(did_read > 0), and if that fails, bail out:

2a4ff69c  ucomisd xmm7, [0x41b2d4d8]
2a4ff6a5  jbe 0x2a4f0034    ->9

I think the reason this bails to a trace instead of the interpreter is due to sunk allocations. Not really sure though.

Finally, we prepare the rd:lseek(-did_read, 'cur') call. Note that in this loop I took out the print.

2a4ff6ab  xorps xmm7, [0x41619660] ;; This is how LuaJIT negates a double.
2a4ff6b3  mov esi, [rbp+0x8] ;; Load the fd.
2a4ff6b6  cvttsd2si rdx, xmm7 ;; Convert double offset to int64 for 3rd arg to syscall.
2a4ff6bb  test rdx, rdx ;; Check if offset has sign bit
2a4ff6be  jns 0x2a4ff6ce ;; If not (offset is non-negative), proceed to syscall (L1).
2a4ff6c0  addsd xmm7, [0x41619680] ;; Otherwise add something to the offset.
2a4ff6c9  cvttsd2si rdx, xmm7 ;; and convert again.
L1:
2a4ff6ce  mov al, 0x0
2a4ff6d0  call rbx
2a4ff6d2  cmp rax, -0x01
2a4ff6d6  jnz 0x2a4ff660    ->LOOP
2a4ff6d8  jmp 0x2a4f0040    ->12
---- TRACE 5 stop -> loop

To me it's clear that the "adding something weird to xmm7 but only if it's negative" is the problem. I think it's an artifact of the nan-tagging scheme but I'm not sure. More in a followup.

wingo commented 6 years ago

Here is the code that emits floating-point-to-integer conversions:

  } else if (stfp) {  /* FP to integer conversion. */
    if (irt_isguard(ir->t)) {
      /* Checked conversions are only supported from number to int. */
      lua_assert(irt_isint(ir->t) && st == IRT_NUM);
      asm_tointg(as, ir, ra_alloc1(as, lref, RSET_FPR));
    } else {
      Reg dest = ra_dest(as, ir, RSET_GPR);
      x86Op op = st == IRT_NUM ? XO_CVTTSD2SI : XO_CVTTSS2SI;
      if (LJ_64 ? irt_isu64(ir->t) : irt_isu32(ir->t)) {
    /* LJ_64: For inputs >= 2^63 add -2^64, convert again. */
    /* LJ_32: For inputs >= 2^31 add -2^31, convert again and add 2^31. */
    Reg tmp = ra_noreg(IR(lref)->r) ? ra_alloc1(as, lref, RSET_FPR) :
                      ra_scratch(as, RSET_FPR);
    MCLabel l_end = emit_label(as);
    if (LJ_32)
      emit_gri(as, XG_ARITHi(XOg_ADD), dest, (int32_t)0x80000000);
    emit_rr(as, op, dest|REX_64, tmp);
    if (st == IRT_NUM)
      emit_rma(as, XO_ADDSD, tmp, &as->J->k64[LJ_K64_M2P64_31]);
    else
      emit_rma(as, XO_ADDSS, tmp, &as->J->k32[LJ_K32_M2P64_31]);
    emit_sjcc(as, CC_NS, l_end);
    emit_rr(as, XO_TEST, dest|REX_64, dest);  /* Check if dest negative. */
    emit_rr(as, op, dest|REX_64, tmp);
    ra_left(as, tmp, lref);
      } else {
    if (LJ_64 && irt_isu32(ir->t))
      emit_rr(as, XO_MOV, dest, dest);  /* Zero hiword. */
    emit_mrm(as, op,
         dest|((LJ_64 &&
            (irt_is64(ir->t) || irt_isu32(ir->t))) ? REX_64 : 0),
         asm_fuseload(as, lref, RSET_FPR));
      }
    }

Upstream link in Snabb here: https://github.com/snabbco/snabb/blob/master/lib/luajit/src/lj_asm_x86.h#L846

Note that this function emits assembly in reverse order, so you kinda have to read it from bottom up. Anyway it's clear that negative floating point numbers are tagged with some kind of bias, and the intention of this code is to remove the bias. But in our case the bias was never added to xmm7! Weird stuff.

wingo commented 6 years ago

I am a bit at a loss, but perhaps @corsix has an idea? Otherwise I think next week I will reduce this to something luajit-only and file upstream. Personally I don't see why these values are going through Lua doubles at all!

wingo commented 6 years ago

(Note that on this system I am running LuaJIT from Snabb, on amd64, no GC64. This is close to the upstream 2.1 branch.)

justincormack commented 6 years ago

This is weird. Technically it shouldnt matter if we pass the values to syscall as ulong or long I think; but the syscall function has them defined as ulong so I tend to cast them to that I think, presumably the long to ulong cast as a noop, but adding a cast to long here fixes this bug. I would be surprised if there werent other places affected by this though.

read returns a Lua number or else its kind of inconvenient, and it will always be within the 32 bit range as Linux does not actually support more than that in a single read/write.

corsix commented 6 years ago
2a4ff671  mov al, 0x0 ;; why is this zeroed? not sure

Because syscall is defined as a vararg function, and the Linux x64 ABI (page 20) dictates that al is used in vararg calls: For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing ellipsis (...) in the declaration) %al is used as hidden argument to specify the number of vector registers used.


Anyway it's clear that negative floating point numbers are tagged with some kind of bias, and the intention of this code is to remove the bias.

I don't think that is clear, nor is it the intention. Based on the slightly cryptic comment ("For inputs >= 2^63 add -2^64, convert again.") and the semantics of the cvttsd2si instruction, I believe that the intention here is:


The "doing ... something" looks like a LuaJIT bug. In particular, the JIT semantics here are not the same as what your C compiler does for a double -> uint64_t conversion. This is a problem, as the interpreter gives you C semantics, and the JIT should match the interpreter.

corsix commented 6 years ago

With the caveat that double to uint64_t conversion is undefined in C for inputs which are outside of uint64_t range, a test-case is:

local cast = require"ffi".cast
local tonumber = tonumber
local t = {}
for i = 1, 70 do
  t[i] = tonumber(cast("int64_t", cast("uint64_t", -(i+0.2))))
end
for i = 1, 70 do
  assert(t[i] == -i) -- Technically undefined, but seems to match gcc/clang behaviour
end

for i = 1, 70 do
  t[i] = (tonumber(cast("uint64_t", (2^63 + 2^50 * i))) - 2^63) / 2^50
end
for i = 1, 70 do
  assert(t[i] == i)
end

I've not given it much thought (so consider the following with caution), but the following makes the test case pass:

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index af54dc7f11..ebb31a2a62 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -865,8 +865,8 @@ static void asm_conv(ASMState *as, IRIns *ir)
          emit_rma(as, XO_ADDSD, tmp, &as->J->k64[LJ_K64_M2P64_31]);
        else
          emit_rma(as, XO_ADDSS, tmp, &as->J->k32[LJ_K32_M2P64_31]);
-       emit_sjcc(as, CC_NS, l_end);
-       emit_rr(as, XO_TEST, dest|REX_64, dest);  /* Check if dest negative. */
+       emit_sjcc(as, CC_NO, l_end);
+       emit_gri(as, XG_ARITHi(XOg_CMP), dest|REX_64, 1);  /* Is INT_MIN? */
        emit_rr(as, op, dest|REX_64, tmp);
        ra_left(as, tmp, lref);
       } else {
wingo commented 6 years ago

@corsix very nice sleuthing, I owe you one!

I agree with @justincormack that this is not an ljsyscall bug. At first I was wondering why we were even going through floating-point numbers, but I see the syscall code does call tonumber on the result of the syscall after checking that it's not -1, knowing that the return value is in the range [0,2^32). I'll open a PR on luajit shortly.

Here is a little test program to see what the C compiler does in practice, just for information's sake:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf (stderr, "usage: %s DOUBLE\n", argv[0]);
      return 1;
    }

  double d;
  char *end = NULL;
  d = strtod (argv[1], &end);
  if (end != argv[1] + strlen(argv[1]))
    {
      fprintf (stderr, "error: failed to parse double: %s\n", argv[1]);
      return 1;
    }

  uint64_t u64 = d;
  fprintf (stdout, "double %f -> uint64_t %lu (0x%lx)\n", d, u64, u64);

  return 0;
}

Here are some interesting examples:

$ /tmp/test -1
double -1.000000 -> uint64_t 18446744073709551615 (0xffffffffffffffff)
$ /tmp/test 18446744073709551615
double 18446744073709551616.000000 -> uint64_t 0 (0x0)
$ /tmp/test 0x100
double 256.000000 -> uint64_t 256 (0x100)
$ /tmp/test 0x8000000000000
double 2251799813685248.000000 -> uint64_t 2251799813685248 (0x8000000000000)
$ /tmp/test 0xffffffffffffffff
double 18446744073709551616.000000 -> uint64_t 0 (0x0)
$ /tmp/test -0x8000000000000
double -2251799813685248.000000 -> uint64_t 18444492273895866368 (0xfff8000000000000)

Here's the assembly for the conversion (GCC):

  40068d:   f2 0f 10 0d 13 02 00    movsd  xmm1,[rip+0x213]    ;; 2^63 or 9.2233720368547758e+18
  400694:   00 
  400695:   66 0f 2e c1             ucomisd xmm0,xmm1
  400699:   73 25                   jae    4006c0 <main+0xb0>
  40069b:   f2 48 0f 2c d0          cvttsd2si rdx,xmm0
;; ... non-interesting call to printf elided ...

  4006c0:   66 0f 28 d0             movapd xmm2,xmm0
  4006c4:   48 b8 00 00 00 00 00    movabs rax,0x8000000000000000
  4006cb:   00 00 80 
  4006ce:   f2 0f 5c d1             subsd  xmm2,xmm1
  4006d2:   f2 48 0f 2c d2          cvttsd2si rdx,xmm2
  4006d7:   48 31 c2                xor    rdx,rax
  4006da:   eb c4                   jmp    4006a0 <main+0x90>

I have to think we should be doing this like GCC does, if we aim to match the interpreter, which converts double to u64 via:

static LJ_AINLINE uint64_t lj_num2u64(lua_Number n)
{
#ifdef _MSC_VER
  if (n >= 9223372036854775808.0)  /* They think it's a feature. */
    return (uint64_t)(int64_t)(n - 18446744073709551616.0);
  else
#endif
    return (uint64_t)n;
}

I want to have a crack at this. I'll prep a PR!

corsix commented 6 years ago

Here's the assembly for the conversion (GCC):

FWIW, my interpretation of what gcc and clang do is the following:

static LJ_AINLINE uint64_t lj_num2u64(lua_Number n) {
  return n >= 2^63 ? cvttsd2si(n - 2^63) + 2^63 : cvttsd2si(n);
}

On master, I think the JIT behaviour is:

static LJ_AINLINE uint64_t lj_num2u64(lua_Number n) {
  return (n <= -0.5 || n >= 2^63) ? cvttsd2si(n - 2^64) + 2^64 : cvttsd2si(n);
}

With my patch above, the JIT behaviour ends up being:

static LJ_AINLINE uint64_t lj_num2u64(lua_Number n) {
  return n >= 2^63 ? cvttsd2si(n - 2^64) + 2^64 : cvttsd2si(n);
}

I have to think we should be doing this like GCC does, if we aim to match the interpreter

I don't feel great about trying to match undefined behaviour, as - being undefined - it is subject to change. I'd be inclined to instead change the interpreter to match the (patched) JIT behaviour (if need be, by writing lj_num2u64 in dasm/assembly rather than C).

wingo commented 6 years ago

Nice summary. Given that we'd be changing our behavior it seems easiest though to match the JIT to the current compiled behavior of the interpreter, and only if needed in the future rewrite that bit of the interpreter... dunno. It would match C programmers' current understanding of things.

I think the specifics of how GCC works is slightly better than your version as it avoids a double-conversion in the case of a large argument, and gets the check earlier (operating on float rather than the integer result). WDYT?

wingo commented 6 years ago

Actually for a double -> signed int64_t conversion, I see GCC doing this:

   0x0000000000400673 <+99>:    ucomisd 0x21d(%rip),%xmm0        # 2^63
   0x000000000040067b <+107>:   jbe    0x400685 <main+117>
   0x000000000040067d <+109>:   subsd  0x21b(%rip),%xmm0        # 2^64
   0x0000000000400685 <+117>:   cvttsd2si %xmm0,%rdx
corsix commented 6 years ago

I think the specifics of how GCC works is slightly better than your version as it avoids a double-conversion in the case of a large argument, and gets the check earlier (operating on float rather than the integer result). WDYT?

I'd be motivated by something which makes the JIT's life simpler, i.e. minimise number of temporary registers required, minimise number of distinct magic constants required, and minimise branching. The current JIT formulation of two dependent cvttsd2si instructions is sad from a cycle-counting perspective, but good on the temporary registers front, and the distinct magic constants front, and fairly good on the branching front (noting that x; if (c) { y; } is nicer than if (c) { x; } else { y; }). My formulation of lj_num2u64 would be something like the following, which still has two cvttsd2si instructions, but they can execute in parallel (at the cost of extra temporaries, which is expensive in the JIT, but fine in the interpreter):

cvttsd2si rax, xmm0
movapd xmm1, xmm0
sub xmm1, 2^64
cvttsd2si rcx, xmm1
cmp rax, 1
cmovo rax, rcx
ret
wingo commented 6 years ago

I think your proposal does not handle an input of INT_MIN correctly but I could be wrong.

I see the attraction of making the interpreter's semantics correspond exactly to the JIT, and there doesn't seem to be a way to do that for inputs > 2^64 without assembly, without adding more checks to the JIT at least. Unsatisfying though :/

Relatedly I am not sure why I never get INT_MIN as a result when converting on my machine. E.g. take my "test.c" and run with an input of 1e30 (out of u64 range). The cvtsd2si docs suggest INT_MIN should be returned, but I don't get that:

$ /tmp/test 1e30
double 1000000000000000019884624838656.000000 -> uint64_t 0 (0x0)

If we could rely on that then lj_num2u64 could simply be something that's easy to verify, as it's not speed-sensitive...

uint64_t
lj_num2u64 (double n)
{
  if (n < -2^63) then return INT_MIN;
  if (n < 2^63) then return n;
  if (n < 2^64) then return n - 2^64;
  return INT_MIN;
}
corsix commented 6 years ago

I think your proposal does not handle an input of INT_MIN correctly but I could be wrong.

The same thought crossed my mind, but I think it is correct: if cvttsd2si legitimately returns INT_MIN, then subtracting anything from its input will give an out-of-range input, and the result of cvttsd2si on that out-of-range input will be INT_MIN.

Relatedly I am not sure why I never get INT_MIN as a result when converting on my machine.

Because, if the gcc semantics are what I wrote, it adds 2^63 to the result of cvttsd2si, which changes INT_MIN to 0.

wingo commented 6 years ago

Very good points, thank you for walking through it with me!

wingo commented 6 years ago

OK just to finish off here with what GCC does --

  uint64_t u64 = d;
  fprintf (stdout, "double %f -> uint64_t %lu (0x%lx)\n", d, u64, u64);

  int64_t i64 = d;
  fprintf (stdout, "double %f -> int64_t %ld (0x%lx)\n", d, i64, i64);

  uint32_t u32 = d;
  fprintf (stdout, "double %f -> uint32_t %u (0x%x)\n", d, u32, u32);

  int32_t i32 = d;
  fprintf (stdout, "double %f -> int32_t %d (0x%x)\n", d, i32, i32);
$ /tmp/test 1e30
double 1000000000000000019884624838656.000000 -> uint64_t 0 (0x0)
double 1000000000000000019884624838656.000000 -> iint64_t -9223372036854775808 (0x8000000000000000)
double 1000000000000000019884624838656.000000 -> uint32_t 0 (0x0)
double 1000000000000000019884624838656.000000 -> int32_t -2147483648 (0x80000000)

So your proposal is that in LuaJIT we (edited)

This last one is a little out-of-place, to my mind, but I guess there's nothing perfect.

Would you propose that we behave analogously for conversions to int32_t and uint32_t ?

wingo commented 6 years ago

Sorry! accidentally closed while realizing that your proposal correctly handled the whole range. I must need more coffee!

wingo commented 6 years ago

Yay I have LuaJIT emitting your code, @corsix --

2a50ffb3  cvttsd2si rbx, xmm7
2a50ffb8  movaps xmm6, xmm7
2a50ffbb  subsd xmm6, [0x41c52680]
2a50ffc4  cvttsd2si rdi, xmm6
2a50ffc9  cmp rbx, +0x01
2a50ffcd  cmovo rbx, rdi

PR coming!