taocpp / json

C++ header-only JSON library
MIT License

Customize destination type for double's #44

Closed whiterabbit963 closed 5 years ago

whiterabbit963 commented 5 years ago

I would like to be able to identify a floating point number and, instead of converting it to a double, convert it to a custom type, e.g.

namespace mp = boost::multiprecision;
mp::number<mp::cpp_dec_float_50>

At first I attempted to use type traits but these solutions all convert the string to a double first. The following line is what I imagine would need to be abstracted. https://github.com/taocpp/json/blob/f837f0a7b718811377685ecb4f8b4f21b3e2e8c5/include/tao/json/internal/number_state.hpp#L77 Normally, I would make this a lambda, have a default, pass in my custom callback, and call it a day, but I do not know if that plays well with all of the templates flying around.
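The "lambda with a default" idea can be sketched without any taocpp/json code. This is only an illustration of the shape of such a hook; `to_double` and `convert_number` are hypothetical names, not part of the library, and the real number_state.hpp works on pre-digested digits rather than on a plain string view.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical default: roughly what a double-based parser ends up doing.
inline double to_double(std::string_view text) {
    // strtod needs a null-terminated buffer, so copy the view first.
    return std::strtod(std::string(text).c_str(), nullptr);
}

// Hypothetical hook: the caller decides how the number text is converted.
// Without a second argument the text becomes a double, as before.
template <typename Convert = double (*)(std::string_view)>
auto convert_number(std::string_view text, Convert convert = to_double) {
    return convert(text);
}
```

Calling `convert_number("0.75")` yields a double, while passing a callable that builds a custom type from the text bypasses the double entirely.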

I may try to hack something together, but I wanted to know if you had any plans to support this kind of feature.

Thanks!

ColinH commented 5 years ago

The ability to change the numeric data type goes beyond our current design space, so unfortunately there's no way to do this right now. We will think about it and see what we can do; at the least, we can help you hack together a custom solution.

(We are trying very hard to feature freeze this library for a 1.0 release, and changing data types was something that we wanted to re-visit later for version 2.0.)

ColinH commented 5 years ago

Can you tell us a bit more about your use case and its data flow? Do you need the multi-precision numbers for the Events, do you want to store them in custom data types, or do you also need them as an alternative numeric data type in tao::json::value?

whiterabbit963 commented 5 years ago

My current solution parses a file stream using jaxn::from_stream into a json::basic_value. I then access the fields I want from the resulting json object. I need the string value to be converted directly to my mp type instead of a double.

Everything you mentioned would be useful to me, but the following pseudo-code describes what would require the fewest number of changes to my workflow.

tao::json::basic_value json = tao::json::jaxn::from_stream(s);
tao::json::basic_value &item = json.at("num_tag");
if(!item.is_number())
    return "error";
auto number = item.as<mp_type>();

I did template my mp_type as a trait float_trait and it worked really well, except for the unwanted string -> double conversion before being converted to my mp_type.
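To make the cost of the intermediate string -> double step concrete, here is a small stdlib-only sketch. `via_double` is a hypothetical helper written for this illustration; it parses decimal text into a double and prints it back with 20 fractional digits.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Round-trip decimal text through a double and format it with
// 20 digits after the decimal point.
std::string via_double(const std::string& text) {
    const double d = std::strtod(text.c_str(), nullptr);
    char buffer[64];
    // snprintf truncates safely if the value does not fit the buffer.
    std::snprintf(buffer, sizeof buffer, "%.20f", d);
    return buffer;
}
```

A value such as 0.75 is exactly representable in binary and survives the round trip, but 0.1 is not, so the digits a 50-digit decimal type would receive from the double already differ from the digits in the file.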

If there was a way to pass a trait to the call in number_state.hpp to decode the string as an mp_type and set it using an additional numeric type handler, e.g. Consumer::number(const mp_type v), I think that would work well.

ColinH commented 5 years ago

As mentioned, there is currently no way to change the handling of numeric data types: the Events interface, the Value class, and all the parsers are built around the int64/uint64/double trio.

The questions regarding your use case are because with taocpp/json you can either parse JSON into a Value and then extract the things that you need from it, or parse JSON directly into a custom data structure.

Adding support for different numeric types might be relatively easy, whereas making the types user configurable for the Value class might be beyond what we can do for 1.0.

To be clear, the two approaches I'm talking about are:

struct foo
{
   int i;
   std::string s;
};

const auto j = tao::json::parse_file( "file.json" );
const foo f = j.as< foo >();  // With appropriate traits for foo.
// Without traits for foo:
foo f;
f.i = j.as< int >( "i" );
f.s = j.as< std::string >( "s" );
// With appropriate traits and no intermediate tao::json::value,
// i.e. parsing directly from JSON into the foo instance:
const auto f = tao::json::consume_file< foo >( "file.json" );

whiterabbit963 commented 5 years ago

Ok, let me see if I am understanding you correctly. With the approach I have taken, extracting the data from a basic_value, I am at the mercy of the built-in type detection. If I instead convert directly to my own data structure via traits, I can have a JSON double parsed as a string by giving mp_type the string traits, and let mp_type itself do the string conversion via its built-in constructor.

mp_type(const string &v);

// { "num" : 0.75 }
template<> struct traits< mp_type > : traits< std::string > {};
struct foo
{
    mp_type num; // <-- string("0.75")
};

I will have to play around with it some more as I don't think I quite have it right. I have not used traits much yet, but it looks like I can accomplish what I want and that is exciting! I will check back later today after I get a chance to work with it some more.

ColinH commented 5 years ago

The "converting to your own data structure" can be done via traits< my_type >::as() which converts from a tao::json::basic_value< traits >, or via traits< my_type >::consume() which parses directly from the JSON (or CBOR or UBJSON etc.) representation using the appropriate parts_parser.

It should be relatively easy to write traits< mp_type > that use the tao::pegtl::memory_input from the parts_parser for a consume() function, whereas everything else is constrained by the Events interface and/or the data types that we use for numeric data.

Unless of course we add another function like void number( std::string_view ) to the Events interface, but we're trying to avoid this for now.
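For illustration only, this is roughly what such an extra overload could look like from the consumer's side. Everything here is a stand-alone sketch: `double_consumer`, `text_consumer`, `accepts_text`, and `emit_number` are hypothetical names, and the number( std::string_view ) overload does not exist in the Events interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Consumer limited to the int64/uint64/double trio.
struct double_consumer {
    double last = 0.0;
    void number(const double v) { last = v; }
    void number(const std::int64_t v) { last = static_cast<double>(v); }
    void number(const std::uint64_t v) { last = static_cast<double>(v); }
};

// Consumer with the HYPOTHETICAL overload that receives the raw text.
struct text_consumer {
    std::string last;
    void number(const double) { last = "<double>"; }
    void number(const std::string_view raw) { last = std::string(raw); }
};

// Detects whether a consumer can take the raw number text.
template <typename C, typename = void>
struct accepts_text : std::false_type {};

template <typename C>
struct accepts_text<C, std::void_t<decltype(std::declval<C&>().number(
                           std::declval<std::string_view>()))>> : std::true_type {};

// Hypothetical producer side: hand over the text when the consumer wants it,
// otherwise fall back to a double.
template <typename Consumer>
void emit_number(Consumer& consumer, const std::string_view raw) {
    if constexpr (accepts_text<Consumer>::value) {
        consumer.number(raw);
    } else {
        consumer.number(std::strtod(std::string(raw).c_str(), nullptr));
    }
}
```

The sketch also shows why this is not a small change: every producer of events would need the fallback logic, which is presumably part of the reason to avoid it for now.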

whiterabbit963 commented 5 years ago

Yeah, I was digging into the parts_parser and the only thing that was making sense was to add some grammar to parse a double and instead of sending it to the number_state to redirect it to my own custom type. I also ran into a couple of other issues.

1) The current double-grammar parses the data into a format that is easily digested by your current string -> double converter instead of a string_view from the original raw buffer.
2) I was inheriting parts_parser to add a new parsing type for use in my traits::consume() and m_input is private.

Is it possible to modify the grammar rules and controls for a double without modifying the parts_parser itself?

This has been an interesting problem so far.

ColinH commented 5 years ago

Yes, it's always interesting when you go beyond the original design space...

First, I just made the input protected in the parts_parser classes so that you can at least get at it.

Now you probably don't need to change the actual grammar, just the actions, since the grammar only decides on what to match, and the actions are what is tightly coupled to the double parser.

However the grammar in taocpp/json is highly optimised to the point of not being very flexible, I think the fastest way to get this to work is something like this:

Inherit from the JSON parts_parser and add a new member function mp_type number_mp(). It uses the number grammar rule from the PEGTL JSON grammar, which you can find in the PEGTL embedded in taocpp/json in <tao/json/external/pegtl/contrib/json.hpp>, together with a PEGTL-style action (rule and action are passed to tao::json_pegtl::parse()). The action takes the matched part of the input, gives it to whatever function you have that can create an mp_type from a string or string view, and stores it in a local PEGTL-style state (like the other number functions); number_mp() then returns the result.

That's a very hand-wavy, high-level description; I can go into more detail if needed.
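As a rough picture of that data flow, here is a stdlib-only sketch that does not use the PEGTL at all: a hand-written matcher stands in for the number grammar rule, and a callback stands in for the action that turns the matched text into the target type. `match_json_number` and `number_as` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Length of the JSON number at the start of `text`, or 0 if there is none.
// Shape: optional '-', integer part, optional fraction, optional exponent.
inline std::size_t match_json_number(std::string_view text) {
    std::size_t i = 0;
    const auto digit = [&](std::size_t k) {
        return k < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[k])) != 0;
    };
    if (i < text.size() && text[i] == '-') ++i;
    if (!digit(i)) return 0;
    if (text[i] == '0') {
        ++i;  // a leading zero stands alone
    } else {
        while (digit(i)) ++i;
    }
    if (i < text.size() && text[i] == '.' && digit(i + 1)) {
        i += 2;
        while (digit(i)) ++i;
    }
    if (i < text.size() && (text[i] == 'e' || text[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < text.size() && (text[j] == '+' || text[j] == '-')) ++j;
        if (digit(j)) {
            while (digit(j)) ++j;
            i = j;  // accept the exponent only if it has digits
        }
    }
    return i;
}

// Match a number, pass the matched text to `from_text`, store the result,
// and consume the matched part of the input.
template <typename T, typename FromText>
bool number_as(std::string_view& input, T& out, FromText from_text) {
    const std::size_t n = match_json_number(input);
    if (n == 0) return false;
    out = from_text(input.substr(0, n));
    input.remove_prefix(n);
    return true;
}
```

With a multi-precision type, `from_text` would be whatever constructs it from a string; the double never enters the picture.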

ColinH commented 5 years ago

I forgot, if you then implement traits< mp_type >::consume() to use the new parts_parser::number_mp() method, you can use the "binding" facilities to generate the traits for a struct that contains mp_type members.

(The generated traits will of course only support consume(), not as(), since the traits for mp_type only support consume() and not as().)

whiterabbit963 commented 5 years ago

Got it working! Having the json grammar separate from the contrib directory was very helpful. Once I figured out that I had to template the action with the pegtl_json::json::number rule, everything I had so far fell into place.

I noticed that the jaxn format does not have a parts_parser. Is this in progress, were you not going to do it, or am I misunderstanding how to use it?

I started templating a custom type in the parser, and it didn't seem too difficult to accomplish. Not sure if it would save much in time for the end-user though. Considering that the alternative cases like mine should be rare, it is probably fine the way it is. I would rather see the parts_parser for the jaxn format anyway.

ColinH commented 5 years ago

Good to hear, hope it wasn't too much work to learn how to use the PEGTL, too!

The parts parser for JAXN is still missing, it's high on my TODO-list and I hope to tackle it soon...

whiterabbit963 commented 5 years ago

Yeah, it is a clever use of the template system to create parsing rules. The part that is still a little fuzzy for me is how to know which functions to implement for the rules: apply(), apply0(), or (I think I saw) match(). I used apply() because that was what the existing double parser was using to store parsing information in the number_state struct.

Great to hear about the JAXN parser.

Also, duseltronik? The only thing I could find was a YouTube video (in Norwegian?) of a guy who got into what I think was a cryo-machine and came out years later covered in hair.

ColinH commented 5 years ago

In the PEGTL you have rules, which are part of the actual grammar and determine which input matches, which is why they have a match() function, and actions, which are applied when the grammar rule that they are attached to matches, which is why they have an apply() or apply0() function.

(There are also two kinds of match()-functions, which we can distinguish with meta-programming; the same is not true for the two kinds of apply()-functions, so they need different names. Also, originally the actions could not influence the matching, now they can, but that's not what we usually do.)
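The split between rules and actions can be mimicked in a few lines of plain C++. This is a toy model, not the PEGTL: `toy_input`, `toy_action`, `match_with_action`, and `run_toy` are hypothetical names, and the real library's inputs, states, and control classes are considerably richer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Toy input: the text plus a cursor.
struct toy_input {
    std::string_view data;
    std::size_t pos = 0;
};

// Rules only decide what matches and advance the cursor.
struct digits {
    static bool match(toy_input& in) {
        const std::size_t start = in.pos;
        while (in.pos < in.data.size() &&
               std::isdigit(static_cast<unsigned char>(in.data[in.pos])) != 0) {
            ++in.pos;
        }
        return in.pos > start;
    }
};

struct comma {
    static bool match(toy_input& in) {
        if (in.pos < in.data.size() && in.data[in.pos] == ',') {
            ++in.pos;
            return true;
        }
        return false;
    }
};

// State filled in by the actions.
struct counts {
    std::string last_number;
    int commas = 0;
};

// Actions are attached to rules by specialisation; the default does nothing.
template <typename Rule>
struct toy_action {};

template <>
struct toy_action<digits> {
    // apply() receives the matched text.
    static void apply(std::string_view matched, counts& st) {
        st.last_number = std::string(matched);
    }
};

template <>
struct toy_action<comma> {
    // apply0() only learns that the rule matched.
    static void apply0(counts& st) { ++st.commas; }
};

template <typename A, typename = void>
struct has_apply : std::false_type {};
template <typename A>
struct has_apply<A, std::void_t<decltype(A::apply(std::declval<std::string_view>(),
                                                  std::declval<counts&>()))>>
    : std::true_type {};

template <typename A, typename = void>
struct has_apply0 : std::false_type {};
template <typename A>
struct has_apply0<A, std::void_t<decltype(A::apply0(std::declval<counts&>()))>>
    : std::true_type {};

// Match the rule; on success call whichever action function is present.
template <typename Rule>
bool match_with_action(toy_input& in, counts& st) {
    const std::size_t start = in.pos;
    if (!Rule::match(in)) {
        in.pos = start;
        return false;
    }
    using A = toy_action<Rule>;
    if constexpr (has_apply<A>::value) {
        A::apply(in.data.substr(start, in.pos - start), st);
    } else if constexpr (has_apply0<A>::value) {
        A::apply0(st);
    }
    return true;
}

// Drive the two rules over a comma-separated list of digit runs.
inline counts run_toy(std::string_view text) {
    toy_input in{text};
    counts st;
    while (match_with_action<digits>(in, st) || match_with_action<comma>(in, st)) {
    }
    return st;
}
```

In this toy the choice between the two is simply whether the action needs the matched text (apply) or only the fact that the rule matched (apply0).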

Oh, and the term "Duseltronik" is inspired by the German TV series based on Stanislaw Lem's short stories about Ijon Tichy, Space Pilot and Hero of Cosmos...

I'll close this issue now since it seems to be resolved, but of course feel free to ask any other question, or continue to chat :-)

whiterabbit963 commented 5 years ago

Ok, so the match() functions are attached to rules and the apply() functions are attached to actions.

What was the inspiration for PEGTL: JSON-type parsing, better regular-expression-like parsing, programming language syntax parsing, or something else?

Ah, German of course, I should have noticed that. I'll have to see if I can find a translation (or start up my Duolingo lessons again), and check it out. Judging from the tone it seems very tongue-in-cheek and goofy. It feels a little like Mel Brooks and Spaceballs.

Thanks for all of your help and pointing me in the right direction. It would have taken me much longer to figure this out, especially since I was using the last beta version and I didn't have the new parts_parser. Last time I tried to update, the Windows build was broken and it was a week before d-frey made the Windows changes. Glad to see it has stabilized!

d-frey commented 5 years ago

For the Windows build: It might still not be complete, MSVC is giving us quite a few headaches. As we both don't use Windows we appreciate any help. I'll try to have another look soon, but it's hard to fix stuff when I have to guess and wait for the CI cycle to see the results.

For your German lessons: It might not be the best idea to use "Ijon Tichy: Raumpilot" to learn German, as they are using a very twisted and incorrect German to emulate a Russian or Polish accent and to sound funnier. It works if you are German, but please don't speak like that. Like learning English from Yoda, it is. ;)

ColinH commented 5 years ago

The "inspiration" for the PEGTL came in 2007 when I saw Christopher Diggins' "YARD", a C++98 PEG library that used the template-programming-as-domain-specific-language approach for the parsing rules, and variadic template support coming into g++.

I re-implemented this basic idea from YARD as PEGTL, using the new language features, and with improvements like user-defined actions, flexible input handling, and better debugging facilities, and the project has been going ever since.

whiterabbit963 commented 5 years ago

@d-frey I try to avoid Windows as much as possible, and currently for my purposes the latest build is working. But when I get a chance, I will tinker with building your test code.

whiterabbit963 commented 5 years ago

So I got your tests fired up, but I haven't got a clue how to modify the templates to work for MSVC. It is complaining about the file tao/json/binding/element.hpp:

template< typename C, typename T, T C::*P >
struct element< P, std::enable_if_t< std::is_member_object_pointer_v< T C::* > > >

Error (active)  E0842   template parameter "T" is not used in or cannot be deduced from the template argument list of class template "tao::json::binding::element<P, std::enable_if_t<std::is_member_object_pointer_v<T C::*>, void>>"  tao-json-test-binding_versions  C:\projects\json_rabbit\include\tao\json\binding\element.hpp    23  

Error (active)  E0842   template parameter "C" is not used in or cannot be deduced from the template argument list of class template "tao::json::binding::element<P, std::enable_if_t<std::is_member_object_pointer_v<T C::*>, void>>"  tao-json-test-binding_versions  C:\projects\json_rabbit\include\tao\json\binding\element.hpp    23

ColinH commented 5 years ago

Seeing that Clang and GCC both compile this code without issues, it seems that MSVC is, again, a bit behind with their standards support. Fortunately they seem to be trying to catch up, so hopefully they will fix this soon. No idea whether there's a workaround...

whiterabbit963 commented 5 years ago

For exactly the reasons you stated, that is what I am hoping as well. I did some digging on Microsoft's website to see what they actually support. About all I could determine were hints that their template support might have some bugs. Fortunately, I am not using any of the parts that fail to compile.
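For the element.hpp error quoted above, one direction that might be worth trying is to avoid asking the compiler to deduce C and T from the member pointer value, and to recover them from decltype( P ) with a helper trait instead. The following is a stand-alone sketch, not the library's code: the primary template `element< auto P, typename = void >` is an assumption inferred from the error message, `member_traits`, `read`, and `sample_point` are hypothetical, and I have not verified that MSVC accepts it.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical helper: split a pointer-to-member type into its parts.
template <typename M>
struct member_traits;

template <typename C, typename T>
struct member_traits<T C::*> {
    using class_type = C;
    using value_type = T;
};

// Stand-in for the primary template (assumed shape, see lead-in).
template <auto P, typename = void>
struct element;

// Only P appears in the specialisation's parameter list; C and T are
// computed from decltype(P) rather than deduced.
template <auto P>
struct element<P, std::enable_if_t<std::is_member_object_pointer_v<decltype(P)>>> {
    using class_t = typename member_traits<decltype(P)>::class_type;
    using value_t = typename member_traits<decltype(P)>::value_type;

    static const value_t& read(const class_t& c) { return c.*P; }
};

// Small type used to exercise the sketch.
struct sample_point {
    int x;
    double y;
};

static_assert(std::is_same_v<element<&sample_point::x>::value_t, int>);
static_assert(std::is_same_v<element<&sample_point::y>::class_t, sample_point>);

}  // namespace sketch
```

GCC and Clang accept both forms, so this would only matter as a compatibility workaround.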