nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.67k stars 3.13k forks source link

Review all uses of lua numbers to determine if they ought to be integers #3219

Open pjsg opened 4 years ago

pjsg commented 4 years ago

As Philip has correctly pointed out, we have a soft break introduced in Lua 5.3 in the we've moved our default build to sint32 / float32 sub types so that a TValue can fit into 8 rather than 12 bytes. The standard Lua C API has a set of calls relating to number and integer handling that are largely preserved.

A lot of our modules adopt rather sloppy use of the API for example int val = lua_tonumber(L,1). In Lua 5.1 the API call returns a double which is then converted inline to a type int or unsigned int. This is lossless if the underlying Lua value contained this type.

This issue largely subsumes the sub issues: #3213, #3214, #3215, #3216, #3217 and #3218, I suggest that we handle general number type issues here. See my further analysis below.


[Philip's original lede] It turns out that it isn't just lua_pushnumber, but the reading functions can have problems as well.

rtcmem.write32 has this problem (from code inspection). I think that the json decoder will have problems....

KT819GM commented 4 years ago

Is this behavior because of same reasons?

Example code:

data = {
mixed_data = node.chipid()..'/data',
mixed_status = node.chipid()..'/status',
integer = 500,
integer2 = 1000,
boolean_f = false,
boolean_t = true,
}

print('Integer: '..tostring(data.integer)..' boolean: '..tostring(data.boolean_f))

sjson.encode(data) 
ok, json = pcall(sjson.encode, data)
json_obj = sjson.decode(json)
print(type(json_obj))
for k, v in pairs(json_obj) do     print(k..' = '..tostring(v)..',') end

Result:


Integer: 500 boolean: false

table

boolean_t = true,
integer2 = 1000.0,
boolean_f = false,
mixed_data = 1111111/data,
integer = 500.0,
mixed_status = 1111111/status,

I'm using tostring because of mixed table, as on example I've provided. It could be seen, that before conversion tostring behaves as expected, but after sjson I'm getting .0. Also, result came in different order, possibly using sjson wrongly.

Edit: forgot to write that it's on dev 5.3

pjsg commented 4 years ago

sjson is definitely a module that needs some work to get the integer/float handling correct. I'm going to be looking at it over the weekend.

pjsg commented 4 years ago

@KT819GM Take a look at https://github.com/nodemcu/nodemcu-firmware/pull/3222 -- this should fix your problem.

KT819GM commented 4 years ago

@KT819GM Take a look at #3222 -- this should fix your problem.

Yes, now numbers are represented correctly. Not sure why string order is changing, not a problem for me (for config file) but possibly could be issue for someone who will be parsing json string with fixed position (like parsing object in Node-Red and adding values to array renaming them). Thank you

pjsg commented 4 years ago

The order of elements in a Json object are not defined. Some systems have ways of forcing alphabetical ordering, but not nodemcu.

On Sun, Jul 19, 2020, 09:51 Modestas Bunokas notifications@github.com wrote:

@KT819GM https://github.com/KT819GM Take a look at #3222 https://github.com/nodemcu/nodemcu-firmware/pull/3222 -- this should fix your problem.

Yes, now numbers are represented correctly. Not sure why string order is changing, not a problem for me (for config file) but possibly could be issue for someone who will be parsing json string with fixed position (like parsing object in Node-Red and adding values to array renaming them). Thank you

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nodemcu/nodemcu-firmware/issues/3219#issuecomment-660646555, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALQLTMFUWDN7QV4I2T5DLLR4L25PANCNFSM4O5KW77A .

KT819GM commented 4 years ago

Couldn't be more clear. Thank you

TerryE commented 4 years ago

The C API defines the following number and integer functions. The former use type LUA_NUMBER and the latter LUA_INT for numeric arguments and returns. These are double and int on 5.1 and float and int on 5.3, respectively. The declarative types are lua_Number and lua_Integer.

Number form Integer Form Comments
lua_isnumber(L,n) lua_isinteger(L,n)
lua_pushnumber(L,i) lua_pushinteger(L,i)
lua_tonumber(L,n) lua_tointeger(L,n)
luaL_checknumber(L,n) luaL_checkinteger(L,n)
luaL_optnumber(L,n,def) luaL_optinteger(L,n,def)
lua_stringtonumber(L,n,s) converts to number/int

Note that:

I recommend that:

I will post a fully code analysis next, but I have to be sociable to visitors for the next few hours.

TerryE commented 4 years ago

Having gone through the code, I don't want to break the legacy 5.1 integer mode builds and the issue we have for these is that lua_Number is type int and using lua_Number for a default float type would break these for 5.1 integer builds, so on reflection it is often just simpler leaving the internal float temporary variable as double for both builds. I have also added a lua_Float which is a double on 5.1 builds and a synonym for lua_Number on 5.3; when we just want FP, but only need 6 d.p. we can use lua_Float and this will also work fine on both Lua variants.

Other reviews changes.

Slight change of subject

When I 've been going through this code, I find myself slipping into #1028, in that the way that we've coded up much of the parameter validation is just so sloppy; this can be done faster, generated less code, and also tighter. A typical coding pattern is something along the lines of

    lua_getfield(L, 1, "start_ch");
    if (!lua_isnil(L, -1)){
      if(lua_isnumber(L, -1)){
        start_ch = (uint8)luaL_checkinteger(L, -1);
        luaL_argcheck(L, (start_ch >= 1 && start_ch <= 14), 1, "start_ch: Range:1-14");
        cfg.schan = start_ch;
      }
      else
        luaL_argerror(L, 1, "start_ch: must be a number");
    }
    else
      cfg.schan = 1;

"Some parameter is an integer in the range x to y; either with no default or with a specific default" is a really common coding pattern in our modules, and this is coded "long hand" in about 4 different ways. So in this case omitting start_ch defaults to 1 but what happens with start_ch = 1.0 or start_ch = 2.1 even? In some coding cases the latter fails to an invalid parameter error); some cases you get a "number has no integer representation_" error; others 0; others the "_startch: must be a number" error. It's basically a mess and I feel that we should have a consistent coding pattern that we should adopt. A common mistake is to do an API call such as luaL_checkinteger() which throws an error if the stack entry isn't an integer but then do extra validation where the error paths will never be taken. There are also nice macro extensions such as luaL_optinteger(L, ndx, default) which greatly simplify default handing.

#define luaL_opt(L,f,n,d)   (lua_isnoneornil(L,(n)) ? (d) : f(L,(n)))

Lua itself give some good templates, for example in ltablib.c for the argument to table.remove:

  lua_Integer pos = luaL_optinteger(L, 2, size);
  if (pos != size) 
    luaL_argcheck(L, 1 <= pos && pos <= size + 1, 1, "position out of bounds");

Our coded equivalents might be a dozen lines instead of 3 and not be as robust. Pull my hair out time. In my bones I would like to sort out this, but being realistic with every edit there is a small chance of making a breaking change, and so this sort of code change really needs testing. So on reflection and however tempting, this isn't something that we should realistically as part of this lua numbers review.

Defer the API cleanup to a second pass, starting with node, file, net and wifi as separate issues / PRs , Perhaps we start with one of these and get a couple of the other committers (@nwf, @HHHartmann, ... or whoever wants to volunteer) to go through the changes in details so we can establish an agreed best-practice pattern, and then we can divide the rest out between those committers than would like to help execute the cleanup.

Thoughts? N + G + @jmattsson @pjsg @galjonsfigur ... ?

jmattsson commented 4 years ago

Definitely defer this, I'd say. While I absolutely agree it's something that should get done, let's get all the big LVM stuff landed and settled. I'd like to get to the point where we can resume merging the 8266 and esp32 branches, but it'd be madness to have two major tickets running concurrently :D

galjonsfigur commented 4 years ago

I agree with @jmattsson - It's true that many of C modules are interacting with Lua C API in sloppy way, but they are proven to work. For example, there were no big changes of Lua interface for ta least 2 years in wifi C module (aside from some code replacements with luaL_ and lua_ helper functions). Deferring this to be done after merging esp8266 and esp32 branches would give us another benefit of being able to cleanup C modules one by one, but for both variants, additionally verifying API compatibility between them.

TerryE commented 4 years ago

See latest push to #3221.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

TerryE commented 3 years ago

Not safe to close this one.