Accounting for arbitrary precision numerical literals

TheButlah commented 5 years ago

Describe what you want to achieve. I have a config struct that gets initialized via a json file. I want to make sure that any floats in the json file will round trip (convert from string-float-string) and remain identical. To enforce this, I need a way of seeing if the provided JSON-float (which is represented on the computer as a string) exceeds a certain number of digits of precision (in the case of string-float-string roundtrips, this precision is 15 decimal digits on my machine). Once I have a way to read the JSON-float as a string before it gets converted to a C++-float, I can throw a runtime error if the user tries to provide a JSON-float of too high a precision.
Describe what you tried. I can control JSON serialization via std::setprecision(), but I cannot control JSON deserialization. I know that there is a SAX interface that looks like I might be able to get an event hook on when the JSON-float gets parsed into a string before conversion to a C++-float, but I don't know how to use it as the SAX documentation said that it doesn't handle the actual serialization and deserialization (I also don't know what SAX is).
Describe which system (OS, compiler) you are using. macOS, GCC 8
Describe which version of the library you are using (release version, develop branch). master, v3.7.3
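
For reference, the string-float-string check described above could be sketched with the standard library alone. This is a minimal sketch, not library code: `round_trips` is a hypothetical helper name, and it assumes the default "C" locale and a strictly textual comparison at `std::numeric_limits<double>::digits10` significant digits (15 for IEEE-754 doubles).

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: returns true when `literal` survives
// text -> double -> text unchanged, printing with digits10 significant digits.
bool round_trips(const std::string& literal) {
    if (literal.empty()) return false;

    char* end = nullptr;
    const double value = std::strtod(literal.c_str(), &end);
    // Reject input that strtod did not consume completely.
    if (end != literal.c_str() + literal.size()) return false;

    char buffer[64];
    const int written = std::snprintf(buffer, sizeof buffer, "%.*g",
                                      std::numeric_limits<double>::digits10, value);
    if (written < 0 || static_cast<std::size_t>(written) >= sizeof buffer) return false;

    // Strict textual comparison: "1.50" is rejected even though it equals 1.5.
    return literal == buffer;
}
```

Because the comparison is textual, alternative spellings of the same value (trailing zeros, a different exponent style) are rejected as well; whether that is desirable depends on how strict the config format should be.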

P.S: Very new to C++, your library is making me hate the language a little less :)

nlohmann commented 5 years ago

The SAX parser could help. A SAX parser does not create an in-memory representation of the parsed input, but only calls certain functions each time a parse event is encountered. The interface is documented here: https://nlohmann.github.io/json/structnlohmann_1_1json__sax.html

For you, the function number_float() could be interesting. It is called every time the parser reads a floating-point number, and it receives both the numeric value (usually a double) and the original string from the input. So your use case should be realizable here.

A simple implementation of a SAX parser can be found here: https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/input/json_sax.hpp#L631. It is the code used for json::accept. It returns true for all values, and false in case of an error. Returning false means parsing will be stopped immediately. The same file also contains code of the actual parser called in json::parse.
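
To make the callback shape concrete without depending on a particular library version, here is a stand-alone sketch. Everything in it is a stand-in: `precision_guard` and `feed_numbers` are hypothetical names, the handler models only the number_float event described above (value plus original text, returning false to stop), and the toy driver takes the place of the real parser. The 15-digit limit is an assumption taken from the first comment.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX event handler; only the floating-point
// callback is modelled. Returning false asks the driver to stop.
struct precision_guard {
    std::size_t max_digits = 15;  // assumed limit
    std::string rejected;         // first offending literal, if any

    bool number_float(double /*value*/, const std::string& text) {
        std::size_t digits = 0;
        for (char c : text) {
            if (c == 'e' || c == 'E') break;  // exponent digits add no precision
            if (std::isdigit(static_cast<unsigned char>(c))) ++digits;
        }
        if (digits > max_digits) {
            rejected = text;
            return false;
        }
        return true;
    }
};

// Toy driver standing in for the real parser: it feeds literals to the
// handler and stops at the first one the handler refuses.
template <typename Handler>
bool feed_numbers(Handler& handler, const std::vector<std::string>& literals) {
    for (const auto& text : literals) {
        const double value = std::strtod(text.c_str(), nullptr);
        if (!handler.number_float(value, text)) return false;
    }
    return true;
}
```

The digit count is deliberately crude (it counts every mantissa digit, including leading zeros), so it errs on the side of rejecting input.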

Let me know if you need further assistance.

TheButlah commented 5 years ago

Hi, thanks for your response!

Looking through the SAX API and the example in json_sax, it's clear to me how to implement the functions defined by the SAX API to determine whether a float round-trips. However, it's not clear to me how to actually use the json_sax class to construct a json object. There is a lot of logic that goes into that, and I'm not sure how to do it easily. It seems like I would have to rewrite most of basic_json, but surely I'm wrong about that?

nlohmann commented 5 years ago

You can copy/paste the json_sax_dom_parser class (https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/input/json_sax.hpp#L145). It "translates" SAX events to nested constructor calls. All you would need to do is add the desired logic to number_float. A complete example of how to use a user-defined SAX event processor is shown in https://nlohmann.github.io/json/classnlohmann_1_1basic__json_a8a3dd150c2d1f0df3502d937de0871db.html#a8a3dd150c2d1f0df3502d937de0871db.

TheButlah commented 5 years ago

aha! this looks promising, thanks so much :))

As a newbie, I would never be able to figure this out on my own. Is there a way that documentation for this could be added? I'd offer to do it but I do not believe I am qualified or should be trusted lol.

Specifically for being able to keep the default json parsing mechanism, but being able to "override" the default functionality

nlohmann commented 5 years ago

Are https://nlohmann.github.io/json/structnlohmann_1_1json__sax.html and https://nlohmann.github.io/json/classnlohmann_1_1basic__json_a8a3dd150c2d1f0df3502d937de0871db.html#a8a3dd150c2d1f0df3502d937de0871db insufficient?

TheButlah commented 5 years ago

The documentation for json_sax is sufficient for understanding how to implement custom event listeners, and the documentation for sax_parse is sufficient for understanding how to call a json_sax, but it's not clear how to do both while still creating a json value. The proposed solution of duplicating (or can I extend it? unsure) json_sax_dom_callback_parser to get the same functionality that the json type has under the hood wasn't clear to me.

Maybe it's because I'm new to C++, or because I'm unfamiliar with this library, but an example that uses json_sax_dom_callback_parser to keep json's default behavior while changing one small thing would be good. Maybe in the README section on the SAX API?

The rest of the documentation was really intuitive and easy to understand, but without documentation this seems like the sort of thing one has to dig through the code or ask the author to figure out.

nlohmann commented 5 years ago

This is a rather specific use case, and I would be happy for any proposal (PRs welcome) on how to extend https://github.com/nlohmann/json#sax-interface.

TheButlah commented 5 years ago

I would be happy to think about an alternative API or a canonical example I could come up with. I feel more comfortable contributing to your library because I believe you have testing infrastructure in place to prevent bugs introduced by C++ beginners like me (only somewhat joking).

I'll think on the matter more once I implement a solution to my current use case.

TheButlah commented 5 years ago

Actually, I have a better idea than trying to revise the SAX api or case-specific documentation. My use case can more generally be stated as follows:

When trying to get (or set!) the value of a numerical JSON field, instead of getting the value as a particular C++ type, such as with auto value = j.at("key").get<double>(), I want to get the original raw string representation of the value, before it is parsed into a concrete C++ type.

Why would someone ~want~ need this? Well, the numbers that JSON can represent do not actually correspond to the primitive datatypes in C++. In JSON, it's perfectly valid for a value to be 123.0000000000123456789 or -12345678912345789123456789, neither of which can be represented losslessly in C++ primitives. Effectively, the numerical format in JSON has unbounded (rational) precision, as all numbers are encoded as strings.
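
A short sketch of that mismatch, using only the standard library. The helper names `via_double` and `overflows_long_long` are made up for illustration; the sketch formats the double nearest to a literal back to text and checks whether an integer literal fits in long long.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: formats the double nearest to `literal` with
// `decimals` digits after the decimal point.
std::string via_double(const std::string& literal, int decimals) {
    const double value = std::strtod(literal.c_str(), nullptr);
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, "%.*f", decimals, value);
    return buffer;
}

// Hypothetical helper: true when the integer literal is too large in
// magnitude for long long (strtoll reports this through errno).
bool overflows_long_long(const std::string& literal) {
    errno = 0;
    (void)std::strtoll(literal.c_str(), nullptr, 10);
    return errno == ERANGE;
}
```

Feeding the two literals above through `via_double` with the same number of decimals as the input yields text that differs from the original, and the integer one also overflows long long.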

This causes issues if there isn't a way to enable users to address this discrepancy when they need to. In my case, it manifests itself as me wanting to read the raw string literal of the number to ensure that the user can't input a number of such high precision that it won't round-trip to a double losslessly. In #1421, the user cared about maintaining the representation of the original floating-point number without tacking on any extra zeros. Looking through the issue history of the repo, there were several other issues on round-tripping floats, although I don't know if the proposed fixes applied to all the use cases.

Not accounting for this discrepancy between C++ primitives and JSON primitives makes this library unable to offer lossless serialization and deserialization of the subset of valid JSON files that contain numerical literals of a higher precision than that of C++ primitives. I think this is not a niche use case but rather functionality that users would appreciate. Think about the many people who use JSON for scientific computing or financial data, or who (as in my case) just want a way to sanitize user-inputted floats so that they will serialize back to the same decimal representation.

The good news is that there is probably an easy API fix for all of this (and it's not SAX :P ). My proposal has the following goals in mind:

Any API changes can't be breaking unless we are willing to bump the major version number.
Any new API features should be idiomatic with respect to the current library design.
The new API features should work in the way that users would intuitively expect. In particular, serialization should obey the same ostringstream rules that floats do like std::setprecision and std::fixed.

I think the following usage fulfills these requirements, inspired by https://github.com/nlohmann/json/issues/1421#issue-397413141:

json j = R"({
  "too_precise_for_double": 123456789.123456789
})"_json;

// Accessing JSON fields
json::numerical d_literal = j.at("too_precise_for_double").get<json::numerical>();
double d_double = j.at("too_precise_for_double").get<double>();

// prints `123456789.123456789` (note no string quotes, because it's not a string)
std::cout << std::fixed << d_literal << std::endl;
// prints `123456789.123457` (note truncation due to limited precision double)
std::cout << std::fixed << d_double << std::endl;
try {
  // Throws an exception, because the value isn't a string
  std::string d_string = j.at("too_precise_for_double").get<std::string>();
} catch (...) {}

// Setting JSON fields
json j2;
j2["new_numerical_literal"] = json::numerical("987654321.987654321");
// prints `{"new_numerical_literal":987654321.987654321}`
std::cout << std::fixed << j2 << std::endl;

json::numerical would essentially be a std::string internally, but it carries a different meaning: trying to get a numerical as a string currently throws an error and should continue to do so, and numerical literals should not be printed with quotes as a string would be.
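
As a rough illustration of what such a type might look like on its own, here is a stand-alone sketch. It is not part of the library: `numerical` here is a free-standing hypothetical class, it validates its text against the JSON number grammar with std::regex, and it prints the literal verbatim, so unlike the goal stated above it ignores stream flags such as std::fixed and std::setprecision.

```cpp
#include <ostream>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical `numerical`: keeps the literal's text and rejects anything
// that is not a valid JSON number.
class numerical {
public:
    explicit numerical(std::string text) : text_(std::move(text)) {
        static const std::regex grammar(
            R"(-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?)");
        if (!std::regex_match(text_, grammar)) {
            throw std::invalid_argument("not a JSON number: " + text_);
        }
    }

    const std::string& text() const { return text_; }

    // Lossy conversion on request; std::stod throws std::out_of_range
    // for literals such as 1e400.
    double as_double() const { return std::stod(text_); }

    // Printed verbatim and without quotes, unlike a JSON string.
    friend std::ostream& operator<<(std::ostream& os, const numerical& n) {
        return os << n.text_;
    }

private:
    std::string text_;
};
```

Making the constructor explicit keeps an ordinary std::string from silently turning into a number, which matches the requirement that strings and numerical literals stay distinct.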

Would this be fairly feasible for me to implement as a C++ novice? Keep in mind that even looking through the codebase is very overwhelming for me, let alone trying to modify it.

TheButlah commented 5 years ago

Just checking in to see what you think about this proposal. Is this a good solution? I want to check in with you before I go and try to get a pull request working.

nlohmann commented 5 years ago

So you would store an additional string with each number?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

fbrausse commented 4 years ago

Hi, I'm also encountering this problem of parsing the numeric value when it is not representable as double or in any fixed-length representation that can be plugged into the NumberFloatType template parameter for basic_json. Precisely, the problem is that the usage of NumberFloatType is inside a union, which requires it to be a trivially destructible type (hence, the fixed-length representation).

If the original string cannot be stored alongside the numeric interpretation, is there any way to use e.g. GMP's mpz_class for integers or mpq_class for NumberFloatType directly? (The fact that constructing mpq_class objects directly from JSON's number format strings like "1.23" does not work can be worked around e.g. by recording the position p of the decimal . counted from the end of the mantissa, giving the string without it to mpq_class and dividing the result by 10^p.)
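
The workaround in parentheses can be sketched without GMP by producing an exact "numerator/denominator" string. This is a sketch under assumptions: `to_fraction` is a hypothetical helper, the input is taken to be an already validated JSON number, and passing the result to mpq_class (followed by canonicalize()) is assumed rather than shown.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a JSON number literal such as "1.23" or
// "-2.5e-3" into an exact, unreduced "numerator/denominator" string.
std::string to_fraction(const std::string& literal) {
    std::size_t pos = 0;
    const bool negative = !literal.empty() && literal[0] == '-';
    if (negative) ++pos;

    std::string digits;      // mantissa digits with the decimal point removed
    long frac_digits = 0;    // number of digits that followed the point
    bool seen_point = false;
    for (; pos < literal.size() && literal[pos] != 'e' && literal[pos] != 'E'; ++pos) {
        if (literal[pos] == '.') { seen_point = true; continue; }
        digits += literal[pos];
        if (seen_point) ++frac_digits;
    }

    long exponent = 0;
    if (pos < literal.size()) {
        exponent = std::strtol(literal.c_str() + pos + 1, nullptr, 10);
    }

    // value = digits * 10^(exponent - frac_digits)
    const long shift = exponent - frac_digits;
    std::string denominator = "1";
    if (shift >= 0) {
        digits.append(static_cast<std::size_t>(shift), '0');
    } else {
        denominator.append(static_cast<std::size_t>(-shift), '0');
    }

    // Drop leading zeros so the numerator cannot be mistaken for octal.
    const std::size_t first = digits.find_first_not_of('0');
    digits = (first == std::string::npos) ? "0" : digits.substr(first);

    return (negative && digits != "0" ? "-" : "") + digits + "/" + denominator;
}
```

Very large exponents make the strings correspondingly long, so a real implementation would probably want to cap the exponent or keep it as a separate power of ten.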

t-b commented 4 years ago

@fbrausse Can you work around the trivially destructible issue by wrapping your mpz*/mpq* class?

But in general I think there should be a way of retrieving the string representation at parse time when fetching a value.

fbrausse commented 4 years ago

Hi, if I manage the lifetime somehow myself, probably. At the moment, it is not clear to me what the lifetime of Number*Type objects is, which makes them quite hard to use safely. The "trivially destructible" requirement comes from the use of a union. A std::variant would work around that problem nicely; it is C++17, though, and might have performance implications. A way around that might be to use std::aligned_storage and explicitly call the Number*Type's destructor (or that of any other type you store in there).
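
To illustrate the manual-lifetime idea, here is a small sketch that uses a C++11 unrestricted union rather than std::aligned_storage, with placement new and an explicit destructor call. `number_slot` is a hypothetical class unrelated to the library's internals, and std::string stands in for whatever non-trivially-destructible number type would be stored.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical slot holding either a double or the literal's text.
class number_slot {
public:
    explicit number_slot(double value) : is_text_(false) { storage_.value = value; }

    explicit number_slot(std::string text) : is_text_(true) {
        new (&storage_.text) std::string(std::move(text));  // begin lifetime by hand
    }

    number_slot(const number_slot&) = delete;  // copying omitted for brevity
    number_slot& operator=(const number_slot&) = delete;

    ~number_slot() {
        if (is_text_) storage_.text.~basic_string();  // end lifetime by hand
    }

    bool is_text() const { return is_text_; }
    double value() const { return storage_.value; }            // valid only if !is_text()
    const std::string& text() const { return storage_.text; }  // valid only if is_text()

private:
    union storage {
        double value;
        std::string text;
        storage() {}   // the active member is set up by number_slot
        ~storage() {}  // and torn down by number_slot
    } storage_;
    bool is_text_;
};
```

The tag has to be consulted on every destruction (and on every copy or move, if those are added), which is the bookkeeping std::variant would otherwise do.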

I understand the defaults of long and double from a usability perspective, however, if I understand JSON correctly, numbers are neither required to have finite length nor to be representable by binary floats - they are arbitrarily long decimals with an optional exponent.

Indeed, somehow accessing the string representation in the source would be very helpful; it might also open up the possibility to use different interpretations for the user - e.g. if accuracy is required I could imagine plugging in some decimal float type depending on the use case.

fbrausse commented 4 years ago

A slightly different approach might also work:

Internally, store "JSON numbers" the same way "JSON strings" are stored, but have json::get<T>() look up a user-specializable trait, e.g. nlohmann::is_number_float<T> (defaulting to std::false_type for anything other than float, double or long double), to decide whether a floating-point number was actually requested, and only then construct/parse it from the string inside get<T>.
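
A stand-alone sketch of that idea follows. The trait name is taken from the suggestion above and does not exist in the library, so it is declared here at global scope; `lazy_number` is likewise hypothetical. The number is kept as text and only converted when get<T>() is called, with the trait selecting the conversion path by tag dispatch.

```cpp
#include <cstdlib>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait: true for float, double and long double by default;
// users could specialise it for their own floating-point-like types.
template <typename T>
struct is_number_float : std::is_floating_point<T> {};

// Hypothetical number node that keeps the literal and parses on demand.
class lazy_number {
public:
    explicit lazy_number(std::string text) : text_(std::move(text)) {}

    const std::string& text() const { return text_; }

    // Conversion happens here, not at parse time.
    template <typename T>
    T get() const { return convert<T>(is_number_float<T>{}); }

private:
    // Chosen when the trait says a floating-point value was requested.
    template <typename T>
    T convert(std::true_type) const {
        return static_cast<T>(std::strtold(text_.c_str(), nullptr));
    }

    // Everything else is treated as an integer in this sketch.
    template <typename T>
    T convert(std::false_type) const {
        return static_cast<T>(std::strtoll(text_.c_str(), nullptr, 10));
    }

    std::string text_;
};
```

A user-defined decimal or GMP type would additionally need its own conversion from the text, which this sketch leaves out.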

Do you know whether that would imply an API change or whether it would be an acceptable modification?

Accounting for arbitrary precision numerical literals #1849