lcsmuller / JSCON

(WIP) JSCON is a C library for effectively serializing and deserializing JSON data.
MIT License

deserialize nested json objects with json_scanf #1

Open stensalweb opened 3 years ago

stensalweb commented 3 years ago

jscon_scanf is very clever. But if I have a nested JSON object, can I still use jscon_scanf?

For example, given the following string:

{
   "project-configurations":
   [
        { "name": "p1", "build": "b1" },
        { "name": "p2", "build": "b2" }
   ]
}

Can I still use jscon_scanf to deserialize it?

stensalweb commented 3 years ago

Does the order of fields matter?

{ "build": "b1", "name": "p1" }
{ "name": "p1", "build": "b1" }

Can jscon_scanf(s, "%s[name]%s[build]", name, build) parse both of the strings above?

lcsmuller commented 3 years ago

Hello, thank you very much for the feedback! Unfortunately, jscon_scanf can't deserialize nested objects yet, but I plan to make that possible in the future. What you can do instead is deserialize "project-configurations" with the %ji specifier and then access its nested elements with getter functions, such as:

jscon_item_t *project_configurations;
// %ji stores the whole array as a generic jscon item
jscon_scanf(str, "%ji[project-configurations]", &project_configurations);
// fetch the array elements by position
jscon_item_t *obj1 = jscon_get_byindex(project_configurations, 0); // { "name": "p1", "build": "b1" }
jscon_item_t *obj2 = jscon_get_byindex(project_configurations, 1); // { "name": "p2", "build": "b2" }

lcsmuller commented 3 years ago

Absolutely! The order of the fields in the JSON string doesn't matter; all that matters is that the arguments are given in the same order as the keys in the format string.
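The order independence follows if each specifier looks up its key by name instead of walking the fields positionally. Below is a minimal sketch of that idea in plain C, not jscon's actual implementation: find_key() and get_string() are hypothetical helpers, and the sketch assumes plain ASCII keys and string values without escape sequences.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of lookup-by-name; not jscon's implementation. */

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        ++p;
    return p;
}

/* p points at an opening quote; returns a pointer just past the closing quote. */
static const char *skip_string(const char *p)
{
    for (++p; *p && *p != '"'; ++p)
        if (*p == '\\' && p[1])
            ++p;
    return *p ? p + 1 : p;
}

/* Skips one JSON value (string, object, array or scalar). */
static const char *skip_value(const char *p)
{
    int depth = 0;
    p = skip_ws(p);
    if (*p == '"')
        return skip_string(p);
    if (*p != '{' && *p != '[') {
        while (*p && !strchr(",}] \t\r\n", *p))
            ++p;
        return p;
    }
    while (*p) {
        if (*p == '"') {
            p = skip_string(p);
            continue;
        }
        if (*p == '{' || *p == '[')
            ++depth;
        else if ((*p == '}' || *p == ']') && --depth == 0)
            return p + 1;
        ++p;
    }
    return p;
}

/* Returns a pointer to the value stored under `key` in the object at `obj`,
   or NULL if the key is absent. The whole object is searched, so the
   position of the key inside the object does not matter. */
static const char *find_key(const char *obj, const char *key)
{
    size_t klen = strlen(key);
    const char *p = skip_ws(obj);
    if (*p != '{')
        return NULL;
    for (p = skip_ws(p + 1); *p == '"';) {
        const char *k = p + 1, *kend = skip_string(p) - 1;
        p = skip_ws(kend + 1);
        if (*p != ':')
            return NULL;
        p = skip_ws(p + 1);
        if ((size_t)(kend - k) == klen && strncmp(k, key, klen) == 0)
            return p;
        p = skip_ws(skip_value(p));
        if (*p == ',')
            p = skip_ws(p + 1);
    }
    return NULL;
}

/* Copies the string value of `key` into out; returns 1 on success, 0 otherwise. */
static int get_string(const char *json, const char *key, char *out, size_t size)
{
    const char *v = find_key(json, key);
    size_t n = 0;
    if (!v || *v != '"' || size == 0)
        return 0;
    for (++v; *v && *v != '"' && n + 1 < size; ++v)
        out[n++] = *v;
    out[n] = '\0';
    return 1;
}
```

Calling get_string() for "name" and "build" on either string from the question yields "p1" and "b1", because each lookup scans the whole object for its key.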

stensalweb commented 3 years ago

It would be really great if jscon_scanf could handle nested JSON objects. Another question: how is the absence of a field represented in your deserialized structures?

stensalweb commented 3 years ago

jsmn (https://zserge.com/jsmn) can be used to preprocess the input string buffer and find the start and end of each nested object, but jscon_scanf would need to be extended to accept buffer_start and buffer_end.
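The key step in that preprocessing is locating where a nested object or array ends. jsmn exposes this as start and end offsets on each token; the sketch below computes the same span directly with a depth counter that ignores brackets inside string literals. match_close() is a hypothetical helper, and a real version would also validate the input.

```c
#include <stddef.h>

/* Hypothetical helper: given p pointing at '{' or '[', returns a pointer just
   past the matching '}' or ']', or NULL if the input ends first. Brackets
   inside string literals (including escaped quotes) are ignored. */
static const char *match_close(const char *p)
{
    int depth = 0;
    int in_string = 0;

    for (; *p != '\0'; ++p) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0')
                ++p;                    /* skip the escaped character */
            else if (*p == '"')
                in_string = 0;
        } else if (*p == '"') {
            in_string = 1;
        } else if (*p == '{' || *p == '[') {
            ++depth;
        } else if ((*p == '}' || *p == ']') && --depth == 0) {
            return p + 1;
        }
    }
    return NULL;
}
```

The pair (start, match_close(start)) is the kind of buffer_start/buffer_end range an extended jscon_scanf could accept.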

lcsmuller commented 3 years ago

> It would be really great if jscon_scanf could handle nested JSON objects. Another question: how is the absence of a field represented in your deserialized structures?

Not sure if that answers your question, but if a field is absent from the string being parsed, it is simply ignored, and nothing else happens.

Maybe it would be better if jscon_scanf returned an integer with the number of absent fields, but the user would still have to check each argument to figure out where the absence occurred. Any suggestions are appreciated!
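One way to address that concern is to return a bitmask alongside the count, so the caller can test which fields were missing instead of inspecting every argument. A minimal sketch, assuming at most 32 keys; count_missing() and has_key() are hypothetical, and has_key() is a deliberately naive text search that ignores nesting and string contents.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical, naive presence test for illustration only: looks for "key"
   followed by ':' anywhere in the text, ignoring nesting and string contents. */
static int has_key(const char *json, const char *key)
{
    size_t klen = strlen(key);
    for (const char *p = strchr(json, '"'); p; p = strchr(p + 1, '"')) {
        if (strncmp(p + 1, key, klen) == 0 && p[1 + klen] == '"') {
            const char *r = p + klen + 2;
            while (isspace((unsigned char)*r))
                ++r;
            if (*r == ':')
                return 1;
        }
    }
    return 0;
}

/* Hypothetical diagnostic: returns how many of keys[0..n) are absent and sets
   bit i of *missing_mask when keys[i] is absent (n is capped at 32). */
static int count_missing(const char *json, const char *const keys[], int n,
                         unsigned long *missing_mask)
{
    int missing = 0;
    *missing_mask = 0;
    for (int i = 0; i < n && i < 32; ++i) {
        if (!has_key(json, keys[i])) {
            *missing_mask |= 1UL << i;
            ++missing;
        }
    }
    return missing;
}
```

With keys = {"field_a", "field_b"} and the input { "field_b": "s" }, the call returns 1 and sets bit 0 of the mask, pointing directly at field_a.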

lcsmuller commented 3 years ago

> jsmn (https://zserge.com/jsmn) can be used to preprocess the input string buffer and find the start and end of each nested object, but jscon_scanf would need to be extended to accept buffer_start and buffer_end.

This is a really interesting approach, I'll look further into it!

stensalweb commented 3 years ago

My question is: "If a field is absent from a JSON string, how do I know whether it is absent in the corresponding deserialized instance?" For example, take these two JSON strings:

{  "field_a":  false, "field_b":  "this is a string"  }
{                     "field_b": "this is another string"  }

Both are deserialized into instances a and b.

When I access a->field_a and b->field_a, both might be false, but b doesn't actually have field_a. In general, the absence of a field is different from the field's default value. In JavaScript, typeof a.field_a is "boolean" and typeof b.field_a is "undefined", so I can detect that field_a is absent in b. How do I detect that in an instance deserialized by jscon_scanf?

lcsmuller commented 3 years ago

Oh, I see. In your example, jscon_scanf() wouldn't be able to turn the JSON strings into instances a and b; for that purpose, use jscon_parse(). The library equivalent is as follows:

jscon_item_t *a, *b;
a = jscon_parse(str1);
b = jscon_parse(str2);

jscon_item_t *item;
item = jscon_get_branch(a, "field_a"); //analogous to a.field_a

jscon_get_type(item); //will return JSCON_BOOLEAN

item = jscon_get_branch(b, "field_a"); //analogous to b.field_a

jscon_get_type(item); //will return JSCON_UNDEFINED

Under the hood, b.field_a is a NULL pointer, since jscon_get_branch() couldn't find it, and (as of the latest commits) jscon_get_type() interprets a NULL pointer as JSCON_UNDEFINED.
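The NULL-as-undefined convention can be illustrated with simplified stand-in types. Everything below (struct item, item_get_type(), item_get_boolean(), the ITEM_* constants) is hypothetical and only mirrors the behavior described above: a failed lookup yields NULL, and asking for its type reports "undefined" instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for jscon's item type and getters. */
enum item_type { ITEM_UNDEFINED, ITEM_NULL, ITEM_BOOLEAN, ITEM_NUMBER, ITEM_STRING };

struct item {
    enum item_type type;
    bool boolean;          /* meaningful only when type == ITEM_BOOLEAN */
};

/* A failed lookup yields NULL; reporting ITEM_UNDEFINED for it lets callers
   ask for the type without a separate NULL check. */
static enum item_type item_get_type(const struct item *it)
{
    return it != NULL ? it->type : ITEM_UNDEFINED;
}

/* Returns the stored boolean, or false for a missing or non-boolean item. */
static bool item_get_boolean(const struct item *it)
{
    return item_get_type(it) == ITEM_BOOLEAN && it->boolean;
}
```

Keeping ITEM_NULL separate from ITEM_UNDEFINED also distinguishes a key explicitly set to null from a key that was never present, which is the same distinction JavaScript draws between null and undefined.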

Now, since your question was about jscon_scanf(), I'll consider two possible scenarios.

First scenario: the JSON objects are members of a root object:

{ "obj": {  "field_a":  false, "field_b": "this is a string"  }}

{ "obj": {  "field_b": "this is another string"  }}

Then you can fetch obj with jscon_scanf() using the %ji specifier and get the same results as in the example above.

Second scenario: you want to fetch field_a directly from the string, in which case we stick with your original strings.

Here we use the %b specifier to parse the value of field_a directly into a bool variable.

bool boolean = false; // initialize it: %b leaves it untouched when the key is missing
jscon_scanf(str1, "%b[field_a]", &boolean);
// boolean now holds the value of field_a from str1

jscon_scanf(str2, "%b[field_a]", &boolean);
// boolean is not updated, since str2 has no field_a

/* the above is not ideal if you
need to check whether the operation
succeeded; an alternative is the %ji
specifier, though it can sometimes
be overkill */

jscon_item_t *item;

jscon_scanf(str1, "%ji[field_a]", &item);

jscon_get_type(item); // returns JSCON_BOOLEAN
jscon_get_boolean(item); // returns the same value as in the JSON string

item = NULL; // reset first, in case %ji leaves item untouched when the key is missing
jscon_scanf(str2, "%ji[field_a]", &item);

jscon_get_type(item); // returns JSCON_UNDEFINED
jscon_get_boolean(item); // returns false

stensalweb commented 3 years ago

> Oh, I see. In your example, jscon_scanf() wouldn't be able to turn the JSON strings into instances a and b; for that purpose, use jscon_parse().

The appeal of using jscon_scanf to deserialize JSON strings into C struct instances is that JSON fields become C struct fields, which are first-class citizens in C. IDEs can autocomplete C struct fields. Using jscon_parse cannot achieve the same goal.

For the two-string example:

str1

 {  "field_a":  false, "field_b": "this is a string"  }

Can I do jscon_scanf(str1, "%b[field_a]%s[field_b]", &obj1->field_a, obj1->field_b)?

str2

 {  "field_b": "this is a string"  }

Can I do jscon_scanf(str2, "%b[field_a]%s[field_b]", &obj2->field_a, obj2->field_b)?

If both are correct, what will the value of obj2->field_a be? How could jscon_scanf be improved to record in obj2 that field_a is absent from str2?
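A common C pattern for this is to pair each optional field with a presence flag that the deserializer sets only when the key actually appears. The sketch below assumes that pattern; struct obj, obj_scan() and naive_value() are hypothetical, and the key search is naive (no nesting or escape handling).

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical target struct: each field carries a presence flag. */
struct obj {
    bool field_a;
    bool has_field_a;   /* true only if "field_a" appeared in the JSON */
    char field_b[32];
    bool has_field_b;   /* true only if "field_b" appeared in the JSON */
};

/* Hypothetical, naive lookup: returns a pointer to the value after "key":
   or NULL; it does not understand nesting or escaped quotes. */
static const char *naive_value(const char *json, const char *key)
{
    size_t klen = strlen(key);
    for (const char *p = strchr(json, '"'); p; p = strchr(p + 1, '"')) {
        const char *r;
        if (strncmp(p + 1, key, klen) != 0 || p[1 + klen] != '"')
            continue;
        r = p + klen + 2;
        while (isspace((unsigned char)*r))
            ++r;
        if (*r != ':')
            continue;
        for (++r; isspace((unsigned char)*r); ++r)
            ;
        return r;
    }
    return NULL;
}

/* Fills *o from json, recording which keys were actually present. */
static void obj_scan(struct obj *o, const char *json)
{
    const char *v;
    memset(o, 0, sizeof *o);
    if ((v = naive_value(json, "field_a")) != NULL) {
        if (strncmp(v, "true", 4) == 0)
            o->field_a = o->has_field_a = true;
        else if (strncmp(v, "false", 5) == 0)
            o->has_field_a = true;          /* field_a stays false */
    }
    if ((v = naive_value(json, "field_b")) != NULL && *v == '"') {
        size_t n = 0;
        for (++v; *v && *v != '"' && n + 1 < sizeof o->field_b; ++v)
            o->field_b[n++] = *v;
        o->field_b[n] = '\0';
        o->has_field_b = true;
    }
}
```

For str2, o2.field_a keeps its zero-initialized value (false), but o2.has_field_a is false as well, so the absence is no longer ambiguous while the fields stay plain struct members that an IDE can autocomplete.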

lcsmuller commented 3 years ago

Ah, I get it now, my apologies. That appeal was exactly my motivation for creating that function in the first place! :)

Both examples will work (obj2->field_a simply keeps whatever value it had before the call), and you are right that jscon_scanf currently lacks a way to give the user useful diagnostics about absent fields. Simply ignoring the field isn't a solution and can lead to confusion, at best. Perhaps a new, optional parameter for jscon_scanf() that receives a char** and fills it with the missing keys might do the trick? Though I'm not a fan of that solution.

Anyhow, I'm definitely making this a top priority, and I really appreciate you bringing it up.

stensalweb commented 3 years ago

Two missing features can unlock the full potential of jscon_scanf:

  1. handling nested objects
  2. easily detecting the absence of a key

I will poke around for the first one.

stensalweb commented 3 years ago

A proposal to encode missing keys/properties/fields

struct a {
    int key_i;
    float key_f;
    char * key_str;
    void * missing[3]; // 3 is the number of fields. 
};

struct a  a = {0};
a.key_str = malloc(10);
const char *str = "{ \"key_f\": 1.0, \"key_str\": \"hello\" }";
jscon_scanf(str,  a.missing, "%d[key_i]%f[key_f]%s[key_str]", &a.key_i, &a.key_f, a.key_str);

Inside jscon_scanf, the missing key_i would be recorded as follows:

a.missing[0]  = &a.key_i;

This proposed solution introduces no extra dynamic memory allocation; it relies on the number of missing keys never exceeding the number of fields, so an array sized to the field count is always enough. It's similar to what you proposed, but instead of recording the missing keys, we capture the address of each field that jscon_scanf did not initialize.
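The bookkeeping half of this proposal can be sketched independently of the format-string parsing. record_missing(), is_missing() and has_key() are hypothetical helpers, the values themselves are not parsed, and has_key() is a naive text search meant for illustration only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct a {
    int key_i;
    float key_f;
    char *key_str;
    void *missing[3]; /* one slot per field: enough even if every key is absent */
};

/* Hypothetical, naive presence test (ignores nesting and string contents). */
static int has_key(const char *json, const char *key)
{
    size_t klen = strlen(key);
    for (const char *p = strchr(json, '"'); p; p = strchr(p + 1, '"')) {
        if (strncmp(p + 1, key, klen) == 0 && p[1 + klen] == '"') {
            const char *r = p + klen + 2;
            while (isspace((unsigned char)*r))
                ++r;
            if (*r == ':')
                return 1;
        }
    }
    return 0;
}

/* Hypothetical bookkeeping step: for each key absent from json, append the
   address of the corresponding field to missing[]; unused slots are NULL.
   Parsing the values themselves is omitted here. */
static void record_missing(const char *json, const char *const keys[],
                           void *const fields[], int n, void *missing[])
{
    int m = 0;
    for (int i = 0; i < n; ++i)
        missing[i] = NULL;
    for (int i = 0; i < n; ++i)
        if (!has_key(json, keys[i]))
            missing[m++] = fields[i];
}

/* Hypothetical query: returns 1 if the field at addr was recorded as missing. */
static int is_missing(void *const missing[], int n, const void *addr)
{
    for (int i = 0; i < n && missing[i] != NULL; ++i)
        if (missing[i] == addr)
            return 1;
    return 0;
}
```

Because missing has one slot per field it cannot overflow, and a caller can check a field with is_missing(a.missing, 3, &a.key_i) without any extra allocation.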

stensalweb commented 3 years ago

jscon_item_t *item;
char string[64]; // %s needs a writable buffer, not an uninitialized pointer
bool boolean;
int nested_number;

char buffer[] = "{\"alpha\":[1,2,3,4], \"beta\":\"This is a string.\", \"gamma\":true, \"omega\":{\"nest\":1}}";
/* order of arguments doesn't have to be the same as the json string */
jscon_scanf(buffer, "%s[beta] %b[gamma] %ji[alpha] %d[omega][nest]", string, &boolean, &item, &nested_number);

Given the above example, it would be better if the nested object were deserialized as well; the following example would be more useful:

jscon_item_t *item;
char string[64];
bool boolean;
struct omega { int nest; } o;

struct {
   char *fmt_str;
   int number_of_args;
   void *addresses[8]; // hypothetical capacity: one slot per nested argument
} of;

of.fmt_str = "%d[nest]";
of.number_of_args = 1;
of.addresses[0] = &o.nest;

char buffer[] = "{\"alpha\":[1,2,3,4], \"beta\":\"This is a string.\", \"gamma\":true, \"omega\":{\"nest\":1}}";
/* order of arguments doesn't have to be the same as the json string */
jscon_scanf(buffer, "%s[beta] %b[gamma] %ji[alpha] %o[omega]", string, &boolean, &item, &of);

Inside jscon_scanf, %o[omega] would invoke:

     jscon_scanf(start_buffer_of_omega, of->fmt_str, of->addresses[0], of->addresses[1], ...);   // this has to be done with va_list

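Since the addresses for the nested object already live in an array, the recursive call does not need to rebuild a va_list (portable C offers no way to construct one from an array anyway). A common arrangement is an array-based core that both a variadic entry point and the %o handler call. The sketch below assumes that design; struct nested_fmt, scan_args() and the helpers are hypothetical, and only %d and %o are supported.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for a nested object, mirroring the `of` struct in the proposal. */
struct nested_fmt {
    const char *fmt_str;
    void **addresses;
};

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p)) ++p;
    return p;
}

static const char *skip_string(const char *p) /* p points at an opening quote */
{
    for (++p; *p && *p != '"'; ++p)
        if (*p == '\\' && p[1]) ++p;
    return *p ? p + 1 : p;
}

static const char *skip_value(const char *p) /* skips one JSON value */
{
    int depth = 0;
    p = skip_ws(p);
    if (*p == '"') return skip_string(p);
    if (*p != '{' && *p != '[') {
        while (*p && !strchr(",}] \t\r\n", *p)) ++p;
        return p;
    }
    while (*p) {
        if (*p == '"') { p = skip_string(p); continue; }
        if (*p == '{' || *p == '[') ++depth;
        else if ((*p == '}' || *p == ']') && --depth == 0) return p + 1;
        ++p;
    }
    return p;
}

/* Returns a pointer to the value of key[0..klen) in the object at obj, or NULL. */
static const char *find_key(const char *obj, const char *key, size_t klen)
{
    const char *p = skip_ws(obj);
    if (*p != '{') return NULL;
    for (p = skip_ws(p + 1); *p == '"';) {
        const char *k = p + 1, *kend = skip_string(p) - 1;
        p = skip_ws(kend + 1);
        if (*p != ':') return NULL;
        p = skip_ws(p + 1);
        if ((size_t)(kend - k) == klen && strncmp(k, key, klen) == 0) return p;
        p = skip_ws(skip_value(p));
        if (*p == ',') p = skip_ws(p + 1);
    }
    return NULL;
}

/* Hypothetical array-based core: supports "%d[key]" and "%o[key]" and returns
   the number of values stored. A missing key leaves its target untouched. */
static int scan_args(const char *obj, const char *fmt, void *const args[])
{
    int stored = 0, i = 0;
    while ((fmt = strchr(fmt, '%')) != NULL) {
        const char *key, *close, *val;
        void *arg;
        if (fmt[1] == '\0' || fmt[2] != '[') break;   /* unsupported format */
        key = fmt + 3;
        if ((close = strchr(key, ']')) == NULL) break;
        val = find_key(obj, key, (size_t)(close - key));
        arg = args[i++];
        if (val && fmt[1] == 'd' && (*val == '-' || isdigit((unsigned char)*val))) {
            *(int *)arg = (int)strtol(val, NULL, 10);
            ++stored;
        } else if (val && fmt[1] == 'o' && *val == '{') {
            const struct nested_fmt *nf = arg;
            stored += scan_args(val, nf->fmt_str, nf->addresses); /* recurse into the nested object */
        }
        fmt = close + 1;
    }
    return stored;
}
```

A variadic front end would then only need to collect its arguments into an array, fetching each with the type its specifier implies, before calling the same core.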
lcsmuller commented 3 years ago

This made me realize another issue. Sometimes the useful information is inside arrays, such as:

[{"num": 1},{"num": 2},"string"]

Your %o suggestion works for objects, but how would we parse array members without first parsing the array into a jscon_item_t?
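One way to handle arrays without building a jscon_item_t is to walk the elements in place, skipping each value with a bracket-aware scanner and looking inside only the elements that are objects. A minimal sketch, assuming integer values; collect_ints() and its helpers are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p)) ++p;
    return p;
}

static const char *skip_string(const char *p) /* p points at an opening quote */
{
    for (++p; *p && *p != '"'; ++p)
        if (*p == '\\' && p[1]) ++p;
    return *p ? p + 1 : p;
}

static const char *skip_value(const char *p) /* skips one JSON value */
{
    int depth = 0;
    p = skip_ws(p);
    if (*p == '"') return skip_string(p);
    if (*p != '{' && *p != '[') {
        while (*p && !strchr(",}] \t\r\n", *p)) ++p;
        return p;
    }
    while (*p) {
        if (*p == '"') { p = skip_string(p); continue; }
        if (*p == '{' || *p == '[') ++depth;
        else if ((*p == '}' || *p == ']') && --depth == 0) return p + 1;
        ++p;
    }
    return p;
}

/* Returns a pointer to the value of `key` in the object at obj, or NULL. */
static const char *find_key(const char *obj, const char *key)
{
    size_t klen = strlen(key);
    const char *p = skip_ws(obj);
    if (*p != '{') return NULL;
    for (p = skip_ws(p + 1); *p == '"';) {
        const char *k = p + 1, *kend = skip_string(p) - 1;
        p = skip_ws(kend + 1);
        if (*p != ':') return NULL;
        p = skip_ws(p + 1);
        if ((size_t)(kend - k) == klen && strncmp(k, key, klen) == 0) return p;
        p = skip_ws(skip_value(p));
        if (*p == ',') p = skip_ws(p + 1);
    }
    return NULL;
}

/* Hypothetical: stores the integer under `key` from each object element of the
   array at arr into out[]; other elements are skipped. Returns the count. */
static int collect_ints(const char *arr, const char *key, int out[], int max)
{
    int n = 0;
    const char *p = skip_ws(arr);
    if (*p != '[') return 0;
    for (p = skip_ws(p + 1); *p && *p != ']' && n < max;) {
        const char *v = (*p == '{') ? find_key(p, key) : NULL;
        const char *next;
        if (v && (*v == '-' || isdigit((unsigned char)*v)))
            out[n++] = (int)strtol(v, NULL, 10);
        next = skip_ws(skip_value(p));
        if (next == p) break;          /* malformed input: stop rather than loop forever */
        p = (*next == ',') ? skip_ws(next + 1) : next;
    }
    return n;
}
```

On [{"num": 1},{"num": 2},"string"], collect_ints(json, "num", out, 4) returns 2 with out holding 1 and 2; the string element is simply skipped.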

lcsmuller commented 3 years ago

When dealing with deeper nesting, would we need N nested %o invocations to reach an element N levels deep? How would we fetch 'a' and 'b' from the following example?

{
    "alpha": {
        "beta": {
            "a": 1,
            "gamma": {
                "b": 2
            }
        }
    }
}
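The chained-bracket form already used in %d[omega][nest] suggests treating the brackets as a key path resolved one level at a time, so an element N levels deep costs N lookups but needs no per-level descriptor. A minimal sketch of that resolution, assuming integer leaves; get_int_path() and its helpers are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p)) ++p;
    return p;
}

static const char *skip_string(const char *p) /* p points at an opening quote */
{
    for (++p; *p && *p != '"'; ++p)
        if (*p == '\\' && p[1]) ++p;
    return *p ? p + 1 : p;
}

static const char *skip_value(const char *p) /* skips one JSON value */
{
    int depth = 0;
    p = skip_ws(p);
    if (*p == '"') return skip_string(p);
    if (*p != '{' && *p != '[') {
        while (*p && !strchr(",}] \t\r\n", *p)) ++p;
        return p;
    }
    while (*p) {
        if (*p == '"') { p = skip_string(p); continue; }
        if (*p == '{' || *p == '[') ++depth;
        else if ((*p == '}' || *p == ']') && --depth == 0) return p + 1;
        ++p;
    }
    return p;
}

/* Returns a pointer to the value of `key` in the object at obj, or NULL. */
static const char *find_key(const char *obj, const char *key)
{
    size_t klen = strlen(key);
    const char *p = skip_ws(obj);
    if (*p != '{') return NULL;
    for (p = skip_ws(p + 1); *p == '"';) {
        const char *k = p + 1, *kend = skip_string(p) - 1;
        p = skip_ws(kend + 1);
        if (*p != ':') return NULL;
        p = skip_ws(p + 1);
        if ((size_t)(kend - k) == klen && strncmp(k, key, klen) == 0) return p;
        p = skip_ws(skip_value(p));
        if (*p == ',') p = skip_ws(p + 1);
    }
    return NULL;
}

/* Hypothetical: follows path[0..depth) one object level at a time and parses
   the final value as an int. Returns 0 if any level is absent. */
static int get_int_path(const char *json, const char *const path[], int depth, int *out)
{
    const char *v = json;
    for (int i = 0; i < depth; ++i)
        if ((v = find_key(v, path[i])) == NULL)
            return 0;
    if (*v != '-' && !isdigit((unsigned char)*v))
        return 0;
    *out = (int)strtol(v, NULL, 10);
    return 1;
}
```

Under this assumed scheme, 'a' corresponds to the path alpha, beta, a (a format string such as %d[alpha][beta][a]) and 'b' to alpha, beta, gamma, b.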