mpx / lua-cjson

Lua CJSON is a fast JSON encoding/parsing module for Lua
https://kyne.au/~mark/software/lua-cjson.php
MIT License
936 stars 478 forks source link

Allow differentiation of arrays and objects for proper empty-object serialization #11

Open kikito opened 11 years ago

kikito commented 11 years ago

This issue is related to #6, in which a global switch is suggested in order to force empty objects to "always be arrays" or "always be objects".

I would argue that while that will solve some specific issues, it will not resolve the more general one, which is differentiating between empty arrays and empty objects in general. Consider the following string:

local json_str = '{"items":[],"properties":{}}'

Ideally cjson should be able to decode that string to a lua object, that when encoded back to json returns the same string.

assert(cjson.encode(cjson.decode(json_str))

This means that cjson needs a way to "mark" tables that are arrays. This could be used by the decoder itself, but would be useful if exposed, so people could give cjson "hints" about what must be an array, even when empty.

I'm writing a possible Lua implementation here. In my case I'm using metatables, but maybe someone with more C knowledge can think of other ideas.

function cjson.mark_as_array = function(t)
  local mt = getmetatable(t) or {}
  mt.__is_cjson_array = true
  return setmetatable(t, mt)
end

function cjson.is_marked_as_array(t)
  local mt = getmetatable(t)
  return mt and mt.__is_cjson_array
end

cjson.encode should mark all arrays it finds using the cjson.mark_as_array.

On the other hand, when cjson.decode encountered an empty table, it should use cjson.is_marked_as_array to decide whether to return "{}" or "[]". Notice that this would only apply to empty tables. A lua table like {1,2, a = "foo"} should be encoded as an object, even when marked as an array.

I would gladly provide a pull request for this, but I don't do C. Please consider this a feature request.

sirupsen commented 9 years ago

I'd love to see a workaround for this as well.

thibaultcha commented 9 years ago

+1, this makes the library hard to use. I am surprised that it can handle sparse arrays but not empty ones? Or am I missing something?

@kikito's solution seems to be a good one. Would you consider a PR on it if someone has the time and knowledge to implement it?

thibaultcha commented 9 years ago

@kikito Since it's been a while, may I ask if you have found any solution to this problem?

brimworks commented 9 years ago

I also just ran into this problem. I like @kikito 's idea for implementation, I'll see if I can implement this.

thibaultcha commented 9 years ago

The project seems dormant so I would consider that before jumping on it...

brimworks commented 9 years ago

Well, I'm actively using it, so maybe I'll have to take it over?

brimworks commented 9 years ago

Here is my implementation:

https://github.com/brimworks/lua-cjson/commit/3499130c852a993e4afc741aa2aa902959775b30

kikito commented 9 years ago

@brimworks looks good! I would suggest adding some tests on the test folder

thibaultcha commented 9 years ago

Yay nice! Good job @brimworks! I hope this will be available.

brimworks commented 9 years ago

After using this a bit, I've come to the conclusion that it is better to use the Lua 5.3 idiom of using the __name field of the metatable. This change is two fold:

https://github.com/brimworks/lua-cjson/commit/95110717a823c2030fcdb7f9c375a11e322097b3

If other libraries (like lyaml) use a similar implementation, then "null" and "array" can be more universal across libraries rather than the previous __is_cjson_array which is very specific to this library.

If this is interesting to others I can issue a pull request. Just let me know.

Thanks, -Brian

mpx commented 8 years ago

Yes, Lua CJSON should use a metatable to record array vs object to make encoding deterministic. It would also allow code to check whether a table decoded from JSON is an array or object.

It appears each library uses a different method via a metatable. Eg, dkjson uses __jsontype == "object" or "array". It's not clear to me which option would be best, but one of them needs to be chosen.

brimworks commented 8 years ago

FWIW, I'd suggest using __name field of metatable to determine "object" vs "array" vs "null" vs "number". This aligns with Lua 5.3 luaL_newmetatable(L, name) method which records the name in the __name field of the metatable:

http://www.lua.org/manual/5.3/manual.html#luaL_newmetatable

This is the methodology I've used in my fork of CJSON.

Thanks, -Brian

identicalsnowflake commented 7 years ago

Any activity here? I'm coming from Redis, which has cjson builtin to its lua scripting package, so I do not have the option of simply using a fork.

This issue is especially problematic in a Redis context, because data may be subsequently accessed by a language which treats {} and [] differently, resulting in decoding errors.

penguinol commented 7 years ago

Using redis too, have the same problem. Is there any schedule for resolving this issue?

brimworks commented 7 years ago

@mpx do plan to follow up on this? If you would prefer to no longer maintain the lua-cjson code, I can take over ownership.

Thanks, -Brian

bjornbytes commented 7 years ago

LÖVR is exposing lua-cjson now and would love to see this implemented.

thibaultcha commented 7 years ago

You might want to check out the OpenResty fork which implements this and a bunch of other fixes/improvements:

https://github.com/openresty/lua-cjson

ghost commented 7 years ago

@kikito, this untested version doesn't touch object's metatable, implemented via weak cache:

local array_marks = setmetatable({ }, { __mode = 'k' }) -- { [weak t] = true }

function cjson.mark_as_array = function(t)
  array_marks[t] = true
end

function cjson.is_marked_as_array(t)
  return array_marks[t]
end

-- or even simpler

cjson.is_array = setmetatable({ }, { __mode = 'k' }) -- { [weak t] = true }
cjson.is_array[t] = true
print(cjson.is_array[t]) -- true

C variant:

static int array_marks_key; // value is not used, just unique address

static void
get_array_marks(lua_State *L)
{
  // = registry[array_marks_key]
  lua_pushlightuserdata(L, &array_marks_key);
  lua_gettable(L, LUA_REGISTRYINDEX);

  if (lua_isnil(L, -1)) {
    lua_pop(L, 1);

    // arrays = setmetatable({ }, { __mode = 'k' })
    lua_createtable(L, 0, 0); // arrays
    lua_createtable(L, 0, 1); // mt
    lua_pushstring(L, "k");
    lua_setfield(L, -2, "__mode");
    lua_setmetatable(L, -2);

    // registry[array_marks_key] = arrays
    lua_pushlightuserdata(L, &array_marks_key);
    lua_pushvalue(L, -2);
    lua_settable(L, LUA_REGISTRYINDEX);
  }

  return; // array_marks at the top
}

static int
mark_as_array(lua_State *L)
{
  // array_marks[t] = true
  get_array_marks(L);
  lua_pushvalue(L, 1); // t
  lua_pushboolean(L, 1); // true
  lua_settable(L, -3);

  return 0;
}

static int
is_marked_as_array(lua_State *L)
{
  // x = array_marks[t]
  get_array_marks(L);
  lua_pushvalue(L, 1); // t
  lua_gettable(L, -2);

  return 1; // x
}

Usage:

t = { { }, { } }
cjson.mark_as_array(t[2]) -- or cjson.is_array[t[2]] = true

print(cjson.encode(t)) -- expecting [ { }, [ ] ]

I'm into C, but not into pull requests :(