stephenberry / glaze

Extremely fast, in memory, JSON and interface library for modern C++
MIT License
1.13k stars 115 forks source link

Extra/Unknown fields handling (was Any Setter/Getter) #528

Closed anaelorlinski closed 10 months ago

anaelorlinski commented 10 months ago

Hello,

I made a proof of concept for handling unknown keys : https://github.com/anaelorlinski/glaze/tree/json_any This feature exists in Java in the Jackson parser with annotations @JsonAnyGetter and @JsonAnySetter, the idea is to provide the same kind of feature.

Here this is how it works: With this method on your class

void my_set_unkwown_key(const glz::sv& key, const glz::raw_json& value) {
     // whatever, for example 
     // extra[key] = value;
  };

add to the meta class static constexpr auto json_any_setter = glz::json_any_setter<&T::my_set_unkwown_key>;

Same idea for json_any_getter to write back the keys.

In the proof of concept glz::opts{.error_on_unknown_keys = false} must be provided.

Let me know if this sounds a good addition, I can fix the todos and propose a PR

justend29 commented 10 months ago

Can you make a draft pull request so we can compare the changes, please? It doesn't have to be ready to merge - just a draft to better consider how it would integrate.

justend29 commented 10 months ago

Also, are you aware of the existing glz::merge?

stephenberry commented 10 months ago

This is a really smart idea, and I appreciate how it is just another compile time field of glz::meta. I do have some thoughts.

my_set_unkwown_key and my_get_unkwown_keys feel like an unnecessary indirections. I think we should just be able to remove them and register the extra map. See updated code below:

struct unknown_fields
{
  std::string a;
  std::string missing;
  std::string end;
  std::map<glz::sv, glz::raw_json> extra;
};

template <>
struct glz::meta<unknown_fields>
{
   using T = unknown_fields;
   static constexpr auto value = object(
      "a", &T::a,
      "missing", &T::missing,
      "end", &T::end
   );
   static constexpr auto json_any_setter{&T::extra};
   static constexpr auto json_any_getter{&T::extra};
};

We can also add support for lambda functions, etc.

Because this if for handling unknown fields, I feel like it is useful across the binary BEVE and JSON. So, I would change the name of the fields to unknown_read and unknown_write.

The updated glz::meta would look like:

template <>
struct glz::meta<unknown_fields>
{
   using T = unknown_fields;
   static constexpr auto value = object(
      "a", &T::a,
      "missing", &T::missing,
      "end", &T::end
   );
   static constexpr auto unknown_write{&T::extra};
   static constexpr auto unknown_read{&T::extra};
};

Because we are dealing with keys, this obviously implies that unknown_write and unknown_read need to provide object types (often known as maps in C++).

Currently your code parses into a glz::raw_json value and then calls the helper function to parse the value in the extra object. I think we can remove this double parsing, making things more straightforward and faster.

Let's say the user registers a std::map<std::string, std::vector<double>> for extra values. We can just grab the entry and parse directly:

// ...assume we are in an object parsing context and we encounter and unknown key
// first we parse the key
// use the key in the "extra" map, and directly parse into that value:
read<json>::op<Opts>(extra[key], ctx, it, end);

This way we don't end up parsing the array for std::vector<double> twice.

anaelorlinski commented 10 months ago

Of course my_set_unkwown_key and my_get_unkwown_key can look like straightforward indirection but also can be used to implement custom user code like logging. Registering the extra map directly should be possible too, as well as using lambdas. This is why my code has proof of concept quality :)

The main purpose I expect to work is the pattern decode/modify/encode where only some fields are known and all the others kept as is, hence saving raw_json. Any further enhancement could be useful in other usecases.

stephenberry commented 10 months ago

Totally! You should be able to register custom functions as well, I just want to make sure we can do the simple things without performance overhead. So, work on removing allocating the glz::raw_json in the default case. The user can always make a map that contains glz::raw_json if they need to, but most of the time using this is a design flaw.

jbbjarnason commented 10 months ago

Just a thought, do we need a custom setter/getter for such a feature? Is one set of setter/getter enough for one object?

What I am thinking is moving the logic to the glz::object,

struct unknown_fields
{
  std::string a;
  std::string missing;
  std::string end;
  std::map<glz::sv, glz::raw_json> extra;
};

template <>
struct glz::meta<unknown_fields>
{
   using T = unknown_fields;
   static constexpr auto value = object(
      "a", &T::a,
      "missing", &T::missing,
      "end", &T::end,
      glz::merge{ &T::extra }
   );
};

This would also solve other issues, like the one in json schema file.

justend29 commented 10 months ago

@jbbjarnason If that supports both reading and writing, I like your suggestion. It fulfills the original requirement in a simpler manner without more members.

anaelorlinski commented 10 months ago

Implemented in #529