Closed (clementfarabet closed this issue 9 years ago)
I'm not sure I see a problem here. Lua numbers are double-precision 64-bit floating-point numbers, so anything beyond 2^48 should only be approximately accurate, not exact. That would be the max capacity of the significand, right?
On Wednesday, July 1, 2015, Clement Farabet notifications@github.com wrote:
Hey guys,
we recently noticed weird edge effects when using large numbers in functions like torch.random, torch.uniform, ... After quite a bit of debugging, we discovered that `lua_tostring` has a side effect when the input is a (large) number. This is pretty scary, as `lua_tostring` is used automatically by all the API calls generated by cwrap (so all functions in TensorMath).
Here's how to reproduce it:
```c
#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>

static int test(lua_State *L) {
  long nb1 = (long)lua_tonumber(L, 1);
  long nb2 = (long)lua_tonumber(L, 2);
  printf("a,b = 0x%lx,0x%lx\n", nb1, nb2);

  lua_tostring(L, 2);

  nb1 = lua_tonumber(L, 1);
  nb2 = lua_tonumber(L, 2);
  printf("a,b = 0x%lx,0x%lx\n", nb1, nb2);

  return 0;
}

static const struct luaL_reg routines[] = {
  {"test", test},
  {NULL, NULL}
};

extern int luaopen_libtest(lua_State *L) {
  luaL_openlib(L, "libtest", routines, 0);
  return 1;
}
```
Save this to `test.c` and run this (with the right paths to your Lua):
```sh
#!/usr/bin/env bash
gcc -o libtest.dylib -shared test.c -I ~/local/include/ -undefined dynamic_lookup
~/local/bin/th -e "\
local lib = require './libtest'
print('') print('ok:') lib.test(128,256)
print('') print('ok:') lib.test(2^32, 2^34)
print('') print('wrong:') lib.test(2^49, 2^49)
print('') print('wrong:') lib.test(2^52, 2^52)"
```
With Lua 5.1 it gives this:
```
ok:
a,b = 0x80,0x100
a,b = 0x80,0x100

ok:
a,b = 0x100000000,0x400000000
a,b = 0x100000000,0x400000000

wrong:
a,b = 0x2000000000000,0x2000000000000
a,b = 0x2000000000000,0x1fffffffffffe

wrong:
a,b = 0x10000000000000,0x10000000000000
a,b = 0x10000000000000,0x10000000000004
```
Can someone confirm that they're also seeing this problem?
Does anyone know of a limit on Lua numbers where, beyond ~2^46, things just break?
/cc @andresy @koraykv @soumith
Lua 5.3 has a native 64-bit integer type; with it, you won't see such side effects.
No, it should be perfectly clean up to 2^53; that's what doubles give you. The problem I'm showing here is that `lua_tostring` has a side effect on the input number. It flips a bit or something.
Clément
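For reference, the 2^53 boundary is easy to check in a few lines of standalone C (a minimal sketch, independent of the Lua code above):

```c
#include <stdio.h>

int main(void) {
  double p53 = 9007199254740992.0;      /* 2^53 */
  /* every integer up to 2^53 is exactly representable in a double... */
  printf("%.0f\n", (p53 - 1.0) + 1.0);  /* 9007199254740992, exact */
  /* ...but 2^53 + 1 is not: it rounds back down to 2^53 */
  printf("%d\n", p53 + 1.0 == p53);     /* prints 1 */
  return 0;
}
```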
Isn't it related to the precision used by the format specifier (see `LUA_NUMBER_FMT`):
```c
/*
@@ LUA_NUMBER_SCAN is the format for reading numbers.
@@ LUA_NUMBER_FMT is the format for writing numbers.
@@ lua_number2str converts a number to a string.
@@ LUAI_MAXNUMBER2STR is maximum size of previous conversion.
*/
#define LUA_NUMBER_SCAN "%lf"
#define LUA_NUMBER_FMT "%.14g"
#define lua_number2str(s,n) sprintf((s), LUA_NUMBER_FMT, (n))
#define LUAI_MAXNUMBER2STR 32 /* 16 digits, sign, point, and \0 */
```
Here 2^49 = 562949953421312 has 15 significant digits. So if you increase the precision you get:
```lua
> x = 2^49
> y = string.format("%.15g", x)
> z = tonumber(y)
> = x == z
true
> print(string.format("0x%x", x))
0x2000000000000
> print(string.format("0x%x", z))
0x2000000000000
```
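The same loss can be reproduced at the C level with plain sprintf/strtod, mirroring what `lua_number2str` and `tonumber` do (a minimal sketch, not Torch code):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double x = 562949953421312.0;  /* 2^49, exactly representable in a double */
  char buf[32];
  sprintf(buf, "%.14g", x);      /* the format Lua 5.1 uses to write numbers */
  double z = strtod(buf, NULL);  /* read it back, as tonumber would */
  printf("%s -> %.0f (x - z = %g)\n", buf, z, x - z);
  /* prints: 5.6294995342131e+14 -> 562949953421310 (x - z = 2) */
  return 0;
}
```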
@deltheil looks like you're right, since `lua_number2str` is called by `lua_tostring`, and the definition in luaconf.h is:

```c
#define lua_number2str(s,n) sprintf((s), LUA_NUMBER_FMT, (n))
```
But why would lua_tostring modify the value that's on the stack? If I understand Clement correctly, that's what he's observing.
`lua_tolstring` is actually supposed to change the actual value in the stack to a string. Then `lua_tonumber` converts it back to a lua_Number. I guess the following lines are where it happens:
```c
int luaV_tostring (lua_State *L, StkId obj) {
  if (!ttisnumber(obj))
    return 0;
  else {
    char s[LUAI_MAXNUMBER2STR];
    lua_Number n = nvalue(obj);
    lua_number2str(s, n);
    setsvalue2s(L, obj, luaS_new(L, s));
    return 1;
  }
}
```
Yes, and this is confirmed by the Lua 5.1 manual:

> If the value is a number, then `lua_tolstring` also changes the actual value in the stack to a string.
I didn't know that. I thought it'd just return a string but leave the value on the stack unchanged. That doesn't seem very intuitive but at least it's documented.
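For callers that can't avoid `lua_tostring`, one defensive idiom in the C API is to convert a pushed copy, so the original slot keeps its number type. A sketch (the helper name and buffer-copy convention here are hypothetical, not from this thread or from Torch):

```c
#include <string.h>
#include <lua.h>

/* Hypothetical helper: stringify the value at idx without mutating it.
   The result is copied into buf because the Lua-owned string may be
   collected once its stack copy is popped. */
static const char *tostring_nondestructive(lua_State *L, int idx,
                                           char *buf, size_t bufsz) {
  lua_pushvalue(L, idx);                /* duplicate the stack slot */
  const char *s = lua_tostring(L, -1);  /* only the copy is converted */
  if (s != NULL) {
    strncpy(buf, s, bufsz - 1);
    buf[bufsz - 1] = '\0';
  }
  lua_pop(L, 1);                        /* original value left untouched */
  return (s != NULL) ? buf : NULL;
}
```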
To verify the full stack, I changed `LUA_NUMBER_FMT` to `"%.15g"` in luaconf.h and reinstalled Lua. Now Clement's original test works as expected. Does that look like a sustainable solution to you guys?
Even with increased tostring precision this will still be problematic for larger numbers, right? Essentially, any number > 10^15 will be approximated once it goes through the cwrap checks (which a lot of torch functions do).
It would be enough for doubles, not for unsigned longs. You would have to go as far as `"%.19g"` to handle this case, I guess.
So unsigned longs can't work anyway because lua numbers are doubles (at least before lua 5.3).
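A quick illustration of why unsigned longs are out of reach for doubles (standard C, nothing Torch-specific):

```c
#include <stdio.h>

int main(void) {
  long long big = 0x7fffffffffffffffLL;  /* 2^63 - 1 */
  double d = (double)big;                /* doubles carry only 53 significand bits */
  printf("%lld\n", big);                 /* 9223372036854775807 */
  printf("%.0f\n", d);                   /* 9223372036854775808: rounded to 2^63 */
  return 0;
}
```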
Why do you think they decided to render numbers with `%.14g`, then?
Yeah I was thinking about lua 5.3 :) No idea what they intended though, we should ask them I guess!
Well, there is this (long) thread on the mailing list, with in particular these interesting answers (the last one by Roberto).
@andresy please check the PR https://github.com/torch/torch7/pull/282 – this one is pretty critical.
Thanks @deltheil ! Also worth mentioning another answer in this thread:
"The key sentence in Roberto's e-mail is 'Well, we do not.', which was in response to the answer "Yes, I think we do." in reply to the question 'Do we really want "tostring(0.1)" to return "0.10000000000000001"?'
I.e. the Lua team is not going to change the default output format, no matter how long this thread goes on.
Signing off, Dirk "
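For context, 17 significant digits is the smallest precision that round-trips every IEEE-754 double, and it is exactly what produces the 0.10000000000000001 output Roberto objected to. A minimal C check:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double x = 0.1;
  char buf[40];
  sprintf(buf, "%.17g", x);                /* 17 digits round-trip any double */
  printf("%s\n", buf);                     /* 0.10000000000000001 */
  printf("%d\n", strtod(buf, NULL) == x);  /* prints 1: exact round trip */
  return 0;
}
```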
So, basically we have to find another solution (cf. https://github.com/torch/torch7/pull/282).
https://github.com/torch/torch7/pull/282 merged – issue resolved.