serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0
4.91k stars 562 forks source link

Parser cannot read arbitrary precision numbers #18

Closed apoelstra closed 6 years ago

apoelstra commented 9 years ago

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf specifies in the second paragraph of the introduction that JSON is agnostic about numbers, and simply represents them as a series of digits.

However, serde-json parses anything with a decimal point as a Rust f64, which causes numbers to be read incorrectly. There is no way to avoid this because this behaviour is chosen as soon as a decimal point is encountered. This makes it impossible to use serde-json to interoperate with financial software using JSON.

JohnHeitmann commented 8 years ago

It'd be nice for both ergonomics and correctness if serde supported this.

But... how common are protocols that rely on this behavior? The two money protocols I'm familiar with, Stripe and Stockfighter, both use integer cents. I believe this is to avoid issues with implementations like Javascript's that are more hopelessly entangled with lost precision.

I'm trying to get a feel for priority, not arguing against the request.

apoelstra commented 8 years ago

@JohnHeitmann Bitcoin Core, and things with API-compatibility with it, uses these numbers for monetary values. This is an historical accident but one that can't be changed now without breaking a lot of applications.

This is the only application I'm aware of, but it's a pretty big one.

nubis commented 8 years ago

Hi, I'm going to try and send you a pull request dealing with this. You'll find the lost precision to be an issue with almost every bitcoin exchange API and Forex api's as well. Some even go the way of serializing decimals as JSON strings so it's more evident for the client/consumer.

And for most other cases, even when I don't know about the precision of the 0.00022 value being sent from JS, I don't want my rust to make it worse giving me a 0.00029999...

I'm familiar with a breadth of Ruby JSON libraries and the community seems to have agreed upon deserialising JSON floats as decimals by default, with an option to reverting to Floats.

Rust has a decimal crate since december 2015, looks very complete, and seems to be using num under the hood so they probably don't reinvent much. I'll try using that. https://crates.io/crates/decimal

Anyways just to let you know this is being looked at :)

dtolnay commented 8 years ago

@nubis Okay but once you parse it, what do you do with the result? The Visitor trait will require you to convert it to f64 in order to do anything.

You are going to need to fix https://github.com/serde-rs/serde/issues/185 before you fix this.

nubis commented 8 years ago

hehe, I was coming to that same conclusion by the time you wrote that.

nubis commented 8 years ago

Hi people, just to share some progress on this, implementing decimal. I've forked both serde https://github.com/nubis/serde and serde_json https://github.com/nubis/json

It took me a few nights to make sense of all the codebase and what goes where, but I finally got decimal deserialization to work.

I'm going to focus now on cleaning everything up, so I'm still a long way to go.

While working on it, and just to make it easier on myself I made it so that all numbers are initially parsed as decimals. As the decimal module has it's own internal parser it lifts some weight from serde which essentially just parses any '0'...'9' or '.'

I do think it's a good idea to start parsing as if it was an integer then scale up to a decimal once we're sure we need the precision, so I'm going to revert to something similar to what you had before, but using decimals instead of f64.

Then, I also think conversions between d128 and f64 are more lossy than they should so I'm going to check that too.

I'd really like some feedback if any of you can give it, I think d128 could be more integrated like any other primitive and its implementations could be using the existing macros.

laktak commented 8 years ago

just fyi - str.parse::<f64>() does a much better job than the current implementation and could be used as a stopgap measure. I separated it from the format check in Hjson.

dtolnay commented 8 years ago

@laktak how did you determine that str.parse::<f64>() does a better job? Are there some inputs on which it is more accurate? What are they?

I recently rewrote float parsing to be 40% faster and much more accurate in #110. Does that resolve the difference?

laktak commented 8 years ago

Sure, here's a simple test with the current versions:

    let input = r#"{
        "x1": -9876.543210,
        "x2": 0.14,
        "x3": 0.1401,
        "x4": 0.14004,
        "x5": 0.14005
    }"#;

    let json: serde_json::Value = serde_json::from_str(&input).unwrap();
    println!("{}", serde_json::to_string_pretty(&json).unwrap());

    let hjson: serde_hjson::Value = serde_hjson::from_str(&input).unwrap();
    println!("{:?}", hjson);
{
  "x1": -9876.543210000002,
  "x2": 0.14,
  "x3": 0.1401,
  "x4": 0.14004000000000003,
  "x5": 0.14005
}
{
  x1: -9876.54321
  x2: 0.14
  x3: 0.1401
  x4: 0.14004
  x5: 0.14005
}

For hjson I adopted your code to do the testing but then fed the string to str.parse::<f64>().

dtolnay commented 8 years ago

If by "current versions" you mean serde_json 0.7.4, that does not have my fixes. They will be in 0.8.0. On the current master branch I get the correct output.

laktak commented 8 years ago

Sorry, forgot to switch the branch. There are still issues with different numbers though. E.g.

    let input = r#"{
        "x1": 4094.1111111111113,
        "x2": 0.16666666666666666
    }"#;

prints

{
  "x1": 4094.1111111111115,
  "x2": 0.16666666666666667
}
{
  x1: 4094.1111111111113
  x2: 0.16666666666666666
}
dtolnay commented 8 years ago

Neither of those is a parsing issue - the numbers are parsed correctly but printed incorrectly. I can take a look later.

dtolnay commented 8 years ago

I filed https://github.com/miloyip/rapidjson/issues/687 to follow up on the printing issue upstream.

dtolnay commented 7 years ago

I have an arbitrary precision implementation in #252. If you have been waiting for this feature, please take a look and let us know whether it addresses your use case.

The PR adds a cfg:

serde_json = { version = "0.9", features = ["arbitrary_precision"] }

that turns serde_json::Number and serde_json::Value into arbitrary precision types. If you parse some JSON to either of those types, serializing it back to JSON or calling to_string() will preserve exactly the original representation.

The implementation relies on specialization so will only work on nightly for now.

dtolnay commented 7 years ago

The PR is ready to merge but I would like to hear from at least two people that this implementation would be useful to them. I don't want to merge and maintain this if nobody will use it.

alexreg commented 7 years ago

I would be happy even to help you maintain this, if you like. The code looks pretty clear now that I've gone over it.

alexreg commented 7 years ago

Also, I think usage will increase hugely once specialisation gets stabilised. (I'm not sure when that's coming; next release perhaps?)

oli-obk commented 7 years ago

next release perhaps?

by the look from the tracking issue and the 2017 roadmap, I'd say next year at the earliest.

alexreg commented 7 years ago

Ouch, that's slow-moving progress!

Sent from my iPhone

On 13 Feb 2017, at 16:34, Oliver Schneider notifications@github.com wrote:

next release perhaps?

by the look from the tracking issue and the 2017 roadmap, I'd say next year at the earliest.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

dtolnay commented 7 years ago

348 has a promising implementation with the same API as #252 but without relying on unstable specialization. It also interacts better with deserializer "adapters" like erased-serde, serde-ignored, and serde-encrypted-value which would not have worked with specialization.

dtolnay commented 6 years ago

The arbitrary_precision feature released in serde_json 1.0.13 adds support for dealing with arbitrary precision numbers through the serde_json::Number type.