TheButlah closed this issue 4 years ago.
The SAX parser could help. A SAX parser does not create an in-memory representation of the parsed input, but only calls certain functions each time a parse event is encountered. The interface is documented here: https://nlohmann.github.io/json/structnlohmann_1_1json__sax.html
For you, the function number_float() could be interesting. It is called every time the parser reads a floating-point number, and it receives the numeric value (usually a double) together with the original string from the input. So your use case should be realizable here.
A simple implementation of a SAX parser can be found here: https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/input/json_sax.hpp#L631. It is the code used for json::accept. It returns true for all values and false in case of an error; returning false stops parsing immediately. The same file also contains the code of the actual parser called in json::parse.
Let me know if you need further assistance.
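To make the event-driven idea concrete, here is a standard-library-only mock of it. FloatEvents, toy_parse and RejectLongLiterals are hypothetical names, not nlohmann/json types: the real json_sax interface has many more callbacks, and the toy driver only reads a comma-separated list of numbers. What it does show is that the handler receives both the converted value and the raw literal, builds no tree, and can abort parsing by returning false.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a SAX event interface.
// Only the shape of number_float(value, raw) is mimicked here.
struct FloatEvents {
    virtual ~FloatEvents() = default;
    // Receives the converted value and the literal exactly as written.
    // Returning false asks the driver to stop immediately.
    virtual bool number_float(double value, const std::string& raw) = 0;
};

// Toy driver: treats the input as a comma-separated list of numbers and
// emits one event per token. Returns false if the handler aborted.
bool toy_parse(const std::string& input, FloatEvents& handler) {
    std::istringstream in(input);
    std::string token;
    while (std::getline(in, token, ',')) {
        const double value = std::strtod(token.c_str(), nullptr);
        if (!handler.number_float(value, token)) {
            return false;  // stop at the first rejection; nothing else is read
        }
    }
    return true;
}

// Example handler: rejects literals longer than max_chars characters
// (a crude placeholder for a real precision check) and keeps the rest.
struct RejectLongLiterals : FloatEvents {
    explicit RejectLongLiterals(std::size_t max) : max_chars(max) {}
    bool number_float(double value, const std::string& raw) override {
        if (raw.size() > max_chars) {
            return false;
        }
        accepted.push_back(value);
        return true;
    }
    std::size_t max_chars;
    std::vector<double> accepted;
};
```

With RejectLongLiterals h(8), toy_parse("1.5,2.25,3.14159265358979,4.0", h) returns false after accepting two values; "4.0" is never seen.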
Hi, thanks for your response!
Looking through the SAX API and the example in json_sax, it's clear to me how to implement the functions defined by the SAX API to determine whether a float round-trips. However, it's not clear to me how to actually use the json_sax class to construct a json object: a lot of logic goes into that, and I'm not sure how to do it easily. It seems like I would have to rewrite most of basic_json, but surely I'm wrong about that?
You can copy/paste the json_sax_dom_parser class (https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/input/json_sax.hpp#L145). It "translates" SAX events into nested constructor calls. All you would need to do is add the desired logic to number_float. A complete example of how to use a user-defined SAX event processor is shown in https://nlohmann.github.io/json/classnlohmann_1_1basic__json_a8a3dd150c2d1f0df3502d937de0871db.html#a8a3dd150c2d1f0df3502d937de0871db.
Aha! This looks promising, thanks so much :))
As a newbie, I would never have been able to figure this out on my own. Is there a way documentation for this could be added? I'd offer to do it, but I don't believe I'm qualified or should be trusted lol.
Specifically, documentation on keeping the default JSON parsing mechanism while "overriding" one part of its default behavior.
The documentation for json_sax is sufficient for understanding how to implement custom event listeners, and the documentation for sax_parse is sufficient for understanding how to call a json_sax, but it's not clear how to do that while still creating a json value. The proposed solution of duplicating (or can I extend it? unsure) json_sax_dom_callback_parser in order to get the same functionality the json type has under the hood wasn't clear to me.
Maybe it's because I'm new to C++, or because I'm unfamiliar with this library, but an example of using json_sax_dom_callback_parser to keep the default behavior of json while changing one small thing would be good. Maybe in the README section on the SAX API?
The rest of the documentation was really intuitive and easy to understand, but this seems like the sort of thing one has to dig through the code or ask the author to figure out.
This is a rather specific use case, and I would be happy about any proposal (PRs welcome) on how to extend https://github.com/nlohmann/json#sax-interface.
I would be happy to think about an alternative API or come up with a canonical example. I feel more comfortable contributing to your library because I believe you have testing infrastructure in place to catch bugs introduced by C++ beginners like me (only somewhat joking).
I'll think on the matter more once I implement a solution to my current use case.
Actually, I have a better idea than revising the SAX API or writing case-specific documentation. My use case can be stated more generally as follows:
When getting (or setting!) the value of a numerical JSON field, instead of getting the value as a particular C++ type, as with auto value = j.at("key").get<double>(), I want to get the original raw string representation of the value, before it is parsed into a concrete C++ type.
Why would someone want (or rather, need) this? Well, the numbers that JSON can represent do not actually correspond to the primitive data types in C++. In JSON, it's perfectly valid for a value to be 123.0000000000123456789 or -12345678912345789123456789, neither of which can be represented losslessly by C++ primitives. Effectively, JSON numbers have arbitrary decimal precision, because they are encoded as text.
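Both examples can be checked with the standard library alone, assuming IEEE-754 double and a 64-bit long long (true on common platforms). The helper names below are illustrative: two literals that differ only in the 19th decimal place are rounded to the very same double, and the long integer overflows std::strtoll, which reports this by setting errno to ERANGE.

```cpp
#include <cerrno>
#include <cstdlib>

// True if two different decimal literals are rounded to the same double,
// i.e. the information that distinguishes them is lost on conversion.
bool collapse_to_same_double(const char* a, const char* b) {
    return std::strtod(a, nullptr) == std::strtod(b, nullptr);
}

// True if an integer literal fits in long long; strtoll signals overflow
// by setting errno to ERANGE (and clamping the result).
bool fits_in_long_long(const char* literal) {
    errno = 0;
    static_cast<void>(std::strtoll(literal, nullptr, 10));
    return errno != ERANGE;
}
```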
This causes issues if there is no way to enable users to address this discrepancy when they need to. In my case, it manifests as wanting to read the raw string literal of the number to ensure that the user can't input a number so precise that it won't round-trip losslessly through a double. In #1421, the user cared about preserving the representation of the original floating-point number without tacking on extra zeros. Looking through the repo's issue history, there were several other issues about round-tripping floats, although I don't know whether the proposed fixes covered all of those use cases.
Not accounting for this discrepancy between C++ primitives and JSON numbers leaves this library unable to handle lossless serialization and deserialization of the subset of valid JSON files whose numeric literals carry more precision than C++ primitives offer. I think this is not a niche use case but functionality that users would appreciate. Think about the many people who use JSON for scientific computing or financial data, or who (like me) just want a way to sanitize user-input floats so that they serialize back to the same decimal representation.
The good news is that there is probably an easy API fix for all of this (and it's not SAX :P). My proposal has the following goals in mind:
std::setprecision and std::fixed.
I think the following usage fulfills these requirements, inspired by https://github.com/nlohmann/json/issues/1421#issue-397413141:
// Proposed usage: json::numerical is the type suggested here, not an existing nlohmann/json type.
// Assumes <iostream>, <string>, <nlohmann/json.hpp>, using json = nlohmann::json, and the _json literal in scope.
json j = R"({
"too_precise_for_double": 123456789.123456789
})"_json;
// Accessing JSON fields
json::numerical d_literal = j.at("too_precise_for_double").get<json::numerical>();
double d_double = j.at("too_precise_for_double").get<double>();
// prints `123456789.123456789` (note: no string quotes, because it's not a string)
std::cout << std::fixed << d_literal << std::endl;
// prints `123456789.123457` (std::fixed shows 6 decimals by default, and even with
// more decimals a double cannot hold all 18 significant digits)
std::cout << std::fixed << d_double << std::endl;
try {
    // Throws, because the value isn't a string
    std::string d_string = j.at("too_precise_for_double").get<std::string>();
} catch (const json::type_error&) {}
// Setting JSON fields
json j2;
j2["new_numerical_literal"] = json::numerical("987654321.987654321");
// prints `{"new_numerical_literal":987654321.987654321}`
std::cout << std::fixed << j2 << std::endl;
Internally, json::numerical would essentially be a std::string, but with a different meaning: trying to get a numerical as a string currently throws an error and should continue to do so, and numerical literals should not be printed with quotes the way a string is.
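A minimal stand-in for the proposed type, assuming it is just the source text with number semantics. NumberLiteral and dump_member are hypothetical names, not part of nlohmann/json: the literal is stored verbatim, converted to double only on request, and serialized without quotes.

```cpp
#include <cstdlib>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the proposed json::numerical: the number's
// source text is kept verbatim; conversion happens only when requested.
class NumberLiteral {
public:
    explicit NumberLiteral(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }
    // Lossy conversion, performed only when the caller asks for a double.
    double to_double() const { return std::strtod(text_.c_str(), nullptr); }

private:
    std::string text_;
};

// Streams the literal bare, without the quotes a JSON string would get.
inline std::ostream& operator<<(std::ostream& os, const NumberLiteral& n) {
    return os << n.text();
}

// Serializes a one-member object such as {"key":1.5}.
// Assumes `key` needs no JSON escaping.
inline std::string dump_member(const std::string& key, const NumberLiteral& n) {
    std::ostringstream os;
    os << "{\"" << key << "\":" << n << '}';
    return os.str();
}
```

Keeping the text and the conversion separate is what lets the literal survive untouched while callers that only need a double still get one.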
Would this be fairly feasible for me to implement as a C++ novice? Keep in mind that even looking through the codebase is very overwhelming for me, let alone trying to modify it.
Just checking in to see what you think about this proposal. Is this a good solution? I wanted to check with you before I go and try to get a pull request working.
So you would store an additional string with each number?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, I'm also encountering this problem of parsing numeric values that are not representable as a double or in any fixed-length representation that can be plugged into the NumberFloatType template parameter of basic_json. Precisely, the problem is that NumberFloatType is stored inside a union, which effectively requires it to be a trivially destructible type (hence the fixed-length representation).
If the original string cannot be stored alongside the numeric interpretation, is there any way to use e.g. GMP's mpz_class for integers or mpq_class for NumberFloatType directly? (The fact that constructing mpq_class objects directly from JSON number strings like "1.23" does not work can be worked around, e.g. by recording the number p of digits after the decimal point, giving the string without the point to mpq_class, and dividing the result by 10^p.)
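The workaround amounts to rewriting a literal as an integer digit string times a power of ten. The sketch below does only that splitting, with the standard library, so that it stays self-contained; GMP itself is not used. ScaledDecimal and split_decimal are hypothetical names, and, unlike the comment above, the sketch also folds in the optional exponent part that JSON numbers may carry.

```cpp
#include <cstdlib>
#include <string>

// A decimal literal rewritten as  value = digits * 10^exponent,  where
// `digits` is an integer string with an optional leading '-'.
struct ScaledDecimal {
    std::string digits;
    long exponent;
};

// Assumes `lit` is a valid JSON number; no validation is done here.
ScaledDecimal split_decimal(const std::string& lit) {
    ScaledDecimal out{"", 0};
    long fraction_digits = 0;  // p: digits after the decimal point
    bool in_fraction = false;
    for (std::size_t i = 0; i < lit.size(); ++i) {
        const char c = lit[i];
        if (c == '.') {
            in_fraction = true;
        } else if (c == 'e' || c == 'E') {
            // strtol accepts the optional '+' or '-' of the exponent
            out.exponent = std::strtol(lit.c_str() + i + 1, nullptr, 10);
            break;
        } else {
            out.digits += c;  // a digit or the leading '-'
            if (in_fraction) ++fraction_digits;
        }
    }
    out.exponent -= fraction_digits;  // the division by 10^p
    return out;
}
```

With GMP, one could then build something like mpq_class(s.digits, 10) and scale it by the power of ten (not shown or tested here). Passing base 10 explicitly matters because split_decimal keeps leading zeros, which base 0 would read as an octal prefix.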
@fbrausse Can you work around the trivially-destructible issue by wrapping your mpz*/mpq* class?
But in general, I think there should be a way of retrieving the string representation from parse time when fetching a value.
Hi, if I manage the lifetime myself somehow, probably. At the moment, it is not clear to me what the lifetime of Number*Type objects is, which makes them quite hard to use safely. The "trivially destructible" requirement comes from the use of a union. A std::variant would work around that problem nicely; it is C++17, though, and might have performance implications. A way around that might be to use std::aligned_storage and explicitly call the Number*Type's destructor (or that of any other type stored in there).
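The manual-lifetime idea can be sketched in a few lines. This uses a C++11 unrestricted union rather than std::aligned_storage, but the principle is the same: the enclosing class constructs the non-trivial member with placement new and destroys it with an explicit destructor call. NumberValue is a hypothetical name, and std::string stands in for a big-number class such as mpq_class.

```cpp
#include <new>
#include <string>
#include <utility>

// Tagged union holding either a double or a non-trivially-destructible type.
// The union's own special members do nothing, so the enclosing class must
// start and end the lifetime of the std::string member by hand.
class NumberValue {
public:
    explicit NumberValue(double d) : kind_(Kind::Double) { storage_.d = d; }
    explicit NumberValue(std::string s) : kind_(Kind::Text) {
        new (&storage_.s) std::string(std::move(s));  // placement new
    }
    NumberValue(const NumberValue&) = delete;  // copying omitted for brevity
    NumberValue& operator=(const NumberValue&) = delete;
    ~NumberValue() {
        if (kind_ == Kind::Text) {
            storage_.s.~basic_string();  // explicit destructor call
        }
    }
    bool is_text() const { return kind_ == Kind::Text; }
    double as_double() const { return storage_.d; }
    const std::string& as_text() const { return storage_.s; }

private:
    enum class Kind { Double, Text };
    union Storage {
        Storage() {}   // constructs no member
        ~Storage() {}  // destroys no member
        double d;
        std::string s;
    };
    Kind kind_;
    Storage storage_;
};
```

A std::variant<double, std::string> would generate exactly this bookkeeping automatically, which is the trade-off mentioned above.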
I understand the defaults of long and double from a usability perspective; however, if I understand JSON correctly, numbers are neither limited in length nor required to be representable as binary floats: they are arbitrarily long decimals with an optional exponent.
Indeed, somehow accessing the string representation from the source would be very helpful; it might also let users choose different interpretations, e.g. if accuracy is required, I could imagine plugging in some decimal float type depending on the use case.
A slightly different approach might also work:
Internally store "JSON numbers" the way you store "JSON strings", and have json::get<T>() look up a user-specializable trait, e.g. nlohmann::is_number_float<T> (defaulting to std::false_type for anything other than float, double, or long double), to decide whether get<T> actually requested a floating-point number, and only then construct/parse it from the string inside get<T>.
Do you know whether that would imply an API change, or whether it would be an acceptable modification?
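A sketch of that proposal, under the assumption that numbers are stored as text and parsed lazily. Everything lives in a hypothetical sketch namespace: is_number_float mirrors the trait named above (defaulting to std::is_floating_point), from_literal is an assumed second customization point for the actual conversion, and ExactDecimal is an example user type.

```cpp
#include <cstdlib>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical names, not part of nlohmann/json

// User-specializable trait; true only for float, double and long double.
template <typename T>
struct is_number_float : std::is_floating_point<T> {};

// How to build a T from the raw literal. The default covers the built-in
// floating-point types; users specialize it for their own types.
template <typename T>
struct from_literal {
    static T convert(const std::string& text) {
        return static_cast<T>(std::strtold(text.c_str(), nullptr));
    }
};

// A JSON number stored the way a JSON string is: as its source text.
class LazyNumber {
public:
    explicit LazyNumber(std::string text) : text_(std::move(text)) {}
    template <typename T>
    T get() const {
        static_assert(is_number_float<T>::value,
                      "T is not registered as a floating-point number type");
        return from_literal<T>::convert(text_);  // parse only on request
    }

private:
    std::string text_;
};

}  // namespace sketch

// Example user type: keeps the literal verbatim, e.g. for a decimal library.
struct ExactDecimal {
    std::string text;
};

namespace sketch {
template <>
struct is_number_float<ExactDecimal> : std::true_type {};
template <>
struct from_literal<ExactDecimal> {
    static ExactDecimal convert(const std::string& text) { return {text}; }
};
}  // namespace sketch
```

Requesting an unregistered type such as get<int>() fails at compile time through the static_assert, which keeps the opt-in explicit.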
Describe what you want to achieve. I have a config struct that gets initialized from a JSON file. I want to make sure that any floats in the JSON file round-trip (string to float to string) and remain identical. To enforce this, I need a way to check whether the provided JSON float (which is stored in the file as text) exceeds a certain number of digits of precision (for string-to-float-to-string round trips, this is 15 decimal digits on my machine). Once I can read the JSON float as a string before it is converted to a C++ float, I can throw a runtime error if the user provides a JSON float with too much precision.
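The check itself, once the raw literal is available (for example in a SAX number_float callback), could look like the sketch below. It assumes IEEE-754 double, where std::numeric_limits<double>::digits10 is 15, the same limit quoted above; the function names are hypothetical. Leading and trailing zeros are not counted, since they cost no precision, and overflow of huge exponents is not checked.

```cpp
#include <limits>
#include <string>

// Counts the significant decimal digits of a JSON number literal, ignoring
// the sign, the decimal point, the exponent, and leading/trailing zeros.
// Assumes `lit` is a valid JSON number.
int significant_digits(const std::string& lit) {
    std::string digits;
    for (char c : lit) {
        if (c == 'e' || c == 'E') break;        // the exponent adds no digits
        if (c >= '0' && c <= '9') digits += c;  // skips '-' and '.'
    }
    const auto first = digits.find_first_not_of('0');
    if (first == std::string::npos) return 0;   // the value is zero
    const auto last = digits.find_last_not_of('0');
    return static_cast<int>(last - first + 1);
}

// Literals with at most digits10 significant digits survive a
// decimal -> double -> decimal round trip at that precision.
bool round_trips_through_double(const std::string& lit) {
    return significant_digits(lit) <= std::numeric_limits<double>::digits10;
}
```

A config loader could then throw a std::runtime_error whenever round_trips_through_double returns false for a user-supplied literal.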
Describe what you tried. I can control JSON serialization via std::setprecision(), but I cannot control JSON deserialization. I know there is a SAX interface that looks like it might give me an event hook when the JSON float is parsed, before it is converted to a C++ float, but I don't know how to use it, as the SAX documentation said that it doesn't handle the actual serialization and deserialization (I also don't know what SAX is).
Describe which system (OS, compiler) you are using. macOS, GCC 8
Describe which version of the library you are using (release version, develop branch). master, v3.7.3
P.S.: Very new to C++; your library is making me hate the language a little less :)