serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0
4.9k stars 558 forks

Perfect accuracy float parsing #536

Closed dtolnay closed 4 years ago

dtolnay commented 5 years ago

Float parsing is currently implemented by calculating the significand as u64, casting to f64, and then multiplying or dividing by a nonnegative power of 10. For example the input 10.9876543210987654 would be parsed into the value 109876543210987654_u64 as f64 / 10e15.

This algorithm is often correct, and otherwise usually close to correct in practice. It matches how JSON parsers are implemented in other languages.

However, it can happen that the result from this algorithm is not the mathematically nearest 64-bit floating point value to the exact value of the input. A "correct" algorithm would always produce the mathematically nearest answer. This requires high precision big-integer arithmetic in the general case so there would be a large performance cost; if implemented, we would likely want this behind a cfg that is off by default, with the current approximate behavior as default. This way programs can opt in to the more expensive algorithm as required.

fn main() {
    let input = "10.9876543210987654";
    let n: f64 = serde_json::from_str(input).unwrap();

    // produces 10.9876543210987644982878919108770787715911865234375
    // which is low by 9.017e-16
    let current_algorithm = 109876543210987654_u64 as f64 / 10e15;
    println!("{}", precise::to_string(current_algorithm));
    assert_eq!(n, current_algorithm);

    // produces 10.98765432109876627464473131112754344940185546875
    // which is high by 8.746e-16 (closer)
    let correct_answer = 10.9876543210987654_f64;
    println!("{}", precise::to_string(correct_answer));
    assert_ne!(n, correct_answer);
}
Diggsey commented 5 years ago

The python json library does get this right (or at least gets it more right than serde_json currently does), as I recently discovered when trying to replace a Python service with one written in Rust 😢

sjackman commented 4 years ago

@dtolnay Any update on this PR? I just ran into this issue today.

bluenote10 commented 4 years ago

A stupid question: Why not use the standard library parse? It seems to have an accurate implementation.
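To make the comparison concrete, here is a small sketch (values taken from the issue description at the top of the thread) showing that the standard library's parser returns the nearest f64 while the truncate-and-divide approach does not:

```rust
fn main() {
    let input = "10.9876543210987654";

    // The standard library's dec2flt is correctly rounding.
    let std_parsed: f64 = input.parse().unwrap();

    // The approximate algorithm described in the issue: significand as u64,
    // cast to f64, then divide by a power of ten.
    let approx = 109876543210987654_u64 as f64 / 10e15;

    // std agrees with the compile-time literal (also correctly rounded)...
    assert_eq!(std_parsed, 10.9876543210987654_f64);
    // ...and differs from the approximate result.
    assert_ne!(std_parsed, approx);
}
```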

if implemented, we would likely want this behind a cfg that is off by default, with the current approximate behavior as default.

Is this what is already available under the arbitrary_precision feature flag, or is it referring to something else?

Just in case it is open for debate, I would vote for making the exact version the default and making the approximate/optimized logic opt-in. Defaulting to lossy serialization/deserialization round-trips is a source of real headaches.

As an example: rust-geo-booleanop is modeled after an algorithm written in JS, and currently we are trying to fix some issues in both implementations. Test cases are specified as JSON. In theory the implementations should behave exactly the same, but I was debugging weird differences for days. Since Rust strives for correctness in general, I suspected the differences were caused on the JS side. I really didn't expect that the test case input data would get modified during parsing on the Rust side.

Diggsey commented 4 years ago

making the approximate/optimized logic opt-in.

If the mechanism for changing this option is feature-flags, then it makes more sense to opt-in to the higher precision option, even though you might think that should be the default. Otherwise, any dependency could trigger lossy precision and unintentionally break your application.

rw commented 4 years ago

Would it be possible to have the best of both worlds, by falling back to 100% correct parsing when an inaccuracy is detected? Perhaps by reserializing with ryu? This feature is important to my use case.
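One way to sketch that detection idea (a hypothetical helper, not serde_json code; it uses the stdlib's correctly rounding parser as the reference rather than reserializing with ryu):

```rust
// Returns true when the fast approximate result disagrees with a
// correctly rounded reference parse, signalling a fallback is needed.
fn needs_slow_path(approx: f64, input: &str) -> bool {
    input.parse::<f64>().map(|exact| exact != approx).unwrap_or(true)
}

fn main() {
    // The approximate algorithm from the issue is off by 1 ULP here.
    let approx = 109876543210987654_u64 as f64 / 10e15;
    assert!(needs_slow_path(approx, "10.9876543210987654"));

    // Simple values are already exact, so no fallback is triggered.
    assert!(!needs_slow_path(1.5, "1.5"));
}
```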

rw commented 4 years ago

Adding to @bluenote10's comment. It seems that the standard library's parse function is straightforward and probably good for this use case.

In fact, @dtolnay has committed to the stdlib file containing parse_decimal recently :-)

Edit: And the JSON parser in Rust's libserialize uses parse_decimal.

rw commented 4 years ago

Adding some more detail about how important this is for me: what I'm working on will be the source of truth for our users' data. As a result, we need to know that the parsing is lossless. I'm happy to help out with a PR if there's consensus on how to approach this.

sjackman commented 4 years ago

Did you see this WIP PR https://github.com/serde-rs/json/pull/541 ?

rw commented 4 years ago

@sjackman IMHO, because that PR takes the feature flag approach, I find it problematic. I think correct parsing should be the default behavior.

sjackman commented 4 years ago

I think correct parsing should be the default behavior.

I agree.

Alexhuszagh commented 4 years ago

As the author of rust-lexical, I would highly recommend not implementing your own correct float parser without exhaustive testing, since it's very easy to get it wrong.

I see one of three general approaches:

1. Pre-process the input to allow it to use Rust's dec2flt (str.parse::<F: Float>()).
2. Use a dedicated float parser that supports JSON floats (rust-lexical does this as of v0.6).
3. Copy a subset of rust-lexical's code (I would be willing to do this) to enable a slightly lighter solution, while ensuring you build on prior experience.

Any correct float parser requires:

1. Arbitrary-precision arithmetic (big-ints).
2. Fairly efficient slow-path routines.
3. Very fast routines in fast cases.
4. Complex error estimation for other situations.

It's easy to get this wrong, so I will gladly help others do this, if I can be of service.

To use the lexical approach, you would use the following code:

let format = lexical_core::NumberFormat::JSON;
let result = lexical_core::parse_format::<f64>(b"3.0e7", format);
sjackman commented 4 years ago

@Alexhuszagh Would you consider submitting a PR to use rust-lexical to parse floats? It could both help start a conversation, and give a SHA-1 for people to use that have an immediate need for correct parsing of floats in the interim. (I'm not a maintainer of serde-rs/json.)

Alexhuszagh commented 4 years ago

@ijl has already put a fair amount of work in both helping me implement JSON support and also submitted a PR for serde-json to use lexical in #541. I can check, but I believe this PR should suffice.

rw commented 4 years ago

@Alexhuszagh Are there differences between the byte-at-a-time approach and what #541 is doing? IIUC, using lexical as-is means keeping intermediate buffers around.

Alexhuszagh commented 4 years ago

@rw I haven't checked exactly, but the byte-at-a-time approach would still require an intermediate buffer, it just wouldn't be visible and would allow us to implement it in terms of io::Read. That may not have been your assumption originally, so let me explain. Also, there is no good way to get around this.

Iterator Types

Using C++-like nomenclature, I'm going to briefly introduce iterator types: an input iterator is single-pass (each element can only be read once), while a forward iterator is multi-pass (the sequence can be traversed again from the start).

Examples of input iterators in Rust include io::Bytes, while examples of forward iterators include linked lists, hashmaps, deques, vectors, etc. (most of those support much more than just forward iteration, but they support at least forward iteration).
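The distinction can be illustrated in Rust terms (a minimal sketch using only the standard library):

```rust
use std::io::Read;

fn main() {
    let data = b"12345";

    // A slice iterator is effectively multi-pass: it can be cloned cheaply
    // and the clone replayed from the same position.
    let iter = data.iter();
    let first: Vec<u8> = iter.clone().copied().collect();
    let second: Vec<u8> = iter.copied().collect();
    assert_eq!(first, second);

    // io::Bytes is single-pass: it consumes bytes from the reader as it
    // goes, so a second traversal requires buffering the input yourself.
    let reader: &[u8] = &data[..];
    let once: Vec<u8> = reader.bytes().map(|b| b.unwrap()).collect();
    assert_eq!(once, first);
}
```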

Algorithm Approach

All Algorithms

1. Read as many significant digits into a 64-bit integer before overflowing.
2. Read the exponent, if present, into a 16-bit (or larger) integer, returning max or min on overflow.
3. Store the index where the integer and fraction started in the string, in case we need to use a slow algorithm.

Fast Algorithm

1. Check if the significant digits use 53 bits or less (for a double-precision float).
2. Check if the exponent can be represented exactly as a double-precision float ([1e-22, 1e22]).
3. If both conditions are satisfied, convert the significant digits to a float and the exponent value to a float, and return the product (IEEE 754 floats guarantee the use of guard digits, so we won't lose any precision from multiplying two exact values a single time).
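The fast path above can be sketched as follows (a simplified, hypothetical illustration of this style of fast path, not lexical's actual code; `mantissa` and `exp10` are assumed to have been extracted from the digits already):

```rust
/// Powers of ten that are exactly representable as f64 (10^0 through 10^22).
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12,
    1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    // Condition 1: the significand must fit in 53 bits so `as f64` is exact.
    // Condition 2: 10^|exp10| must itself be an exact f64.
    if mantissa >> 53 != 0 || exp10 < -22 || exp10 > 22 {
        return None; // caller falls back to the moderate path
    }
    let m = mantissa as f64;
    // A single multiply or divide of two exact values rounds correctly
    // under IEEE 754, so the result is the nearest f64.
    Some(if exp10 >= 0 {
        m * POW10[exp10 as usize]
    } else {
        m / POW10[(-exp10) as usize]
    })
}

fn main() {
    assert_eq!(fast_path(3, 7), Some(3.0e7));
    assert_eq!(fast_path(5, -1), Some(0.5));
    // 109876543210987654 needs more than 53 bits: fast path declines.
    assert!(fast_path(109876543210987654, -16).is_none());
}
```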

Moderate Algorithm

If the fast algorithm cannot be used, fall back to the moderate algorithm.

1. Represent the float as an extended-precision float (80 bits: 64 bits for the significant digits, 16 bits for the exponent).
2. Normalize the significant digits (make sure the significant digits have a 1 in the most-significant bit).
3. Get the first 64 significant digits and the binary exponent from the actual decimal exponent using pre-calculated tables.
4. Multiply the significant digits, calculating both the significant digits and the binary exponent.
5. Calculate the accuracy of our estimate, by estimating the error that could occur due to this approximation.
6. If the error would not affect the significant digits, return the value as a float.

Slow Algorithms

If the moderate algorithm isn't sufficient, we must use a slow, correct algorithm. This is the step that requires an intermediate buffer, since we must have at least one more pass over the input digits. At this point we have the value correct to within 1 ULP (unit in the last place).

1. Using our extended-precision float, calculate b, which is always less than or equal to the actual value.
2. Calculate the value of b+h, which is exactly halfway between b and b+1.
3. Calculate the exact value of the actual digits using a big-integer representation.
4. The steps differ here, due to various optimizations, but effectively, compare the actual digits present to the theoretical digits of b+h.
5. If the present digits are above b+h, return b+1. Otherwise, return b.

Summary

As you can see, there is no way to remove iterating over the input twice in the worst-case, so we have to have an intermediate buffer at some level. This also coincides with how readers typically work: they almost always use buffered-reads, just the buffer isn't user-visible (imagine making syscalls for every byte read, or a network request per-byte).

That means at some level, a buffer needs to be present to implement a correct and efficient parser. In order to avoid iterating twice over the input, we would have to either:

1. Use only the slow algorithm, which would be 10-100x slower than the moderate algorithm, and slower still than the fast algorithm.
2. Accept an incorrect parser, which only uses the fast or moderate algorithms, and is accurate to within 1 ULP.

We do, however, have 1 advantage: we know that only 767 decimal digits can contribute to a binary number with 53-bits of significant digits. For the theory behind this claim, please see Section 2.7 on radix conversions. This means we have an upper limit on the number of bytes required for both the input buffer, and the big-integer, allowing us to stack-allocate both.

Alexhuszagh commented 4 years ago

In short, any approach accepting an input iterator or similar object like io::Read would require a temporary buffer behind the scenes, just with a lot of constraints on that (which makes our life a lot easier).

Diggsey commented 4 years ago

@Alexhuszagh thanks for the link and the great explanation! However, I think your statement about 767 decimal digits and buffer sizes is misleading. The handbook says that 767 digits may be required to perfectly represent a 64-bit float in base 10, but parsing algorithms are doing the opposite conversion, and way fewer digits than that are required to unambiguously identify a 64-bit float.

A rough estimate (53 * log(2)) shows that you shouldn't require much more than about 16 (decimal) significant figures to fully determine the floating-point mantissa, and you could just ignore any further digits in the input value. That's small enough that buffering it in memory shouldn't really be a concern (if you are smart about what you keep in the buffer - eg. ignore leading zeros).
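The arithmetic behind that estimate checks out:

```rust
fn main() {
    // A 53-bit significand spans about 53 * log10(2) decimal digits.
    let digits = 53.0 * 2f64.log10();
    assert!(digits > 15.9 && digits < 16.0); // roughly 15.95
}
```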

Alexhuszagh commented 4 years ago

@Diggsey You're doing it the wrong way, although you're getting fairly close (you have a very good intuition for this). Float to string only requires 17 digits to uniquely identify a float, but going string to float requires up to 767 digits because you can get close-to-halfway cases. However, this allows us to use a stack-allocated buffer regardless, since there is no dynamic memory allocation.

See here for examples (C126, C127, and C128 in particular): https://github.com/Alexhuszagh/rust-lexical/blob/master/data/test-parse-unittests/rust_parse_tests.toml

Alexhuszagh commented 4 years ago

Here are practical Python examples for those test cases, @Diggsey. As you can see, we need a max of 767 significant digits to parse a float, only 17 to write a float.

>>> x = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984374999e-324"
>>> float(x)
5e-324

>>> y = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984375e-324"
>>> float(y)
1e-323

>>> z = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984375001e-324"
>>> float(z)
1e-323
Diggsey commented 4 years ago

Ah I was forgetting about such edge cases - I would be glad if serde_json would just losslessly round-trip all 64-bit floats, which seems like a much easier problem.
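That easier property can be stated as a one-liner (a sketch using std's shortest `Display` formatting in place of ryu so it is dependency-free, and std's correctly rounding parser):

```rust
// Property: shortest-format then reparse must recover the identical bits.
fn round_trips(x: f64) -> bool {
    format!("{}", x).parse::<f64>().map(f64::to_bits) == Ok(x.to_bits())
}

fn main() {
    assert!(round_trips(0.1));
    assert!(round_trips(10.9876543210987654));
    assert!(round_trips(2.638344616030823e-256));
    assert!(round_trips(5e-324)); // smallest subnormal
}
```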

dtolnay commented 4 years ago

I agree with @Diggsey, I am more interested in round tripping correctly by default and a lot less interested in perfect parsing of arbitrarily long decimal representations by default, as I don't see that as a common practical requirement that would be worth paying for in compile time by default. We can have an option but it would be off by default if it doesn't come for free from perfect round tripping.

In particular I don't think I am on the same page as @Alexhuszagh right now, somewhere along the work in the perfect_float PR it appears to have gotten further away from correctly round tripping things (https://github.com/serde-rs/json/pull/541#discussion_r376170435).

Alexhuszagh commented 4 years ago

@dtolnay See the comments addressed to that, I'm not entirely sure what went on in that example, but it doesn't reproduce for me anymore.

As for round-tripping 17-digit floats, it's definitely possible using only the fast and moderate paths described above in most cases (there are some edge-cases with exponents that will cause this to fail, actually). Unfortunately, arbitrary-precision arithmetic is pretty much required to fully round-trip every value, which also means it supports arbitrarily-long inputs.

Alexhuszagh commented 4 years ago

For example, the following three tests are good test cases for why you necessarily need support for arbitrarily long inputs:

>>> float("2.4703282292062326e-324")
0.0
>>> float("2.4703282292062327e-324")
0.0
>>> float("2.4703282292062328e-324")
5e-324

This is because although only 17 digits are present, which is enough to uniquely identify a float, the exponent being base-10 means we get approximation errors for close-to-halfway values. A side-effect is we need internally to use arbitrary-precision arithmetic, which also allows us to support arbitrarily long inputs. The two, sadly, are linked.

rw commented 4 years ago

@dtolnay said

...I don't see that as a common practical requirement that would be worth paying for in compile time.

Personally, I'd be happy to have a longer compile time if this was available. I guess it would be like a restricted version of the arbitrary-precision arithmetic feature?

Alexhuszagh commented 4 years ago

I do have code for a 128-bit mantissa using the moderate path that should be able to handle round-trip conversions; however, it provided a fairly meagre performance benefit when used, and was a lot slower when the slow path was required. It should handle any case of 17-19 (likely, anything 30 or fewer) digits and fully round-trip.

dtolnay commented 4 years ago

I think we are still not on the same page. :( I don't see how the examples in https://github.com/serde-rs/json/issues/536#issuecomment-583198124 are relevant to round tripping f64 -> json -> f64. 2.4703282292062328e-324 is not a json number that you would get by serializing an f64; the nearest f64 serializes to 5e-324 which is far easier to parse. Parsing arbitrary 17-19 digit values is not a requirement right now, I would recommend focusing on f64 -> json -> f64 round trip and getting the best compile time performance there.

Alexhuszagh commented 4 years ago

If that's your goal, I can gift you code that will do this. The moderate path will be more than sufficient for this. I won't implement it personally, however.

Alexhuszagh commented 4 years ago

I can submit a PR that would do this. It will take me some time, but it's stable code, and it will round-trip clear representations of values. I don't personally like the approach (it will get certain inputs wrong, which may be present from decimals or other representations) so I'm not willing to modify lexical to do this.

Let me know if that works?

dtolnay commented 4 years ago

Thanks! That would be great. The things I would like to know to move this issue forward, from you or someone else, are:

Alexhuszagh commented 4 years ago

@dtolnay Compile-time performance should be fairly minimal. Run-time performance should be fairly minimal as well. Almost all of the overhead comes from the arbitrary-precision work. Assuming you have the code already to parse the significant digits and the exponent in a float (which I'm assuming you do), then the remaining requirements are basically this:

1. You need an extended-precision float class.
2. You need an algorithm to round-nearest, tie-even (like 20 lines of simple code).
3. You need an algorithm to normalize the extended-precision float, and then multiply normalized extended-precision floats.
4. You need a fast-path algorithm (~20 lines of simple code).
5. You need some helpers and trait methods.
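The round-nearest, ties-to-even step might look like the following (a generic sketch, not minimal-lexical's actual code; it assumes the 64-bit significand is already normalized with bit 63 set, and ignores the exponent adjustment a real implementation also performs):

```rust
fn round_nearest_tie_even(sig64: u64) -> u64 {
    const SHIFT: u32 = 64 - 53; // 11 bits to discard for an f64 significand
    let truncated = sig64 >> SHIFT;
    let discarded = sig64 & ((1u64 << SHIFT) - 1);
    let halfway = 1u64 << (SHIFT - 1);
    // Round up when strictly above halfway, or exactly halfway with an
    // odd truncated significand (ties to even).
    if discarded > halfway || (discarded == halfway && truncated & 1 == 1) {
        truncated + 1
    } else {
        truncated
    }
}

fn main() {
    // Exactly halfway, even significand: stays put.
    assert_eq!(round_nearest_tie_even((1u64 << 63) | (1 << 10)), 1u64 << 52);
    // Exactly halfway, odd significand: rounds up to even.
    assert_eq!(
        round_nearest_tie_even((1u64 << 63) | (1 << 11) | (1 << 10)),
        (1u64 << 52) + 2
    );
    // Just above halfway: always rounds up.
    assert_eq!(
        round_nearest_tie_even((1u64 << 63) | (1 << 10) | 1),
        (1u64 << 52) + 1
    );
}
```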

For the other requirement, a lot, lot more. Especially for parsing complex inputs. Lexical is not optimized for compile-times, and for good reason.

Let me get you a simple proof of concept, and I can show you performance is good in pretty much all cases. I can give you numbers right now, but it would likely be better-done in-context.

dtolnay commented 4 years ago

For the other requirement, a lot, lot more. Especially for parsing complex inputs. Lexical is not optimized for compile-times, and for good reason.

Thanks, this was my hunch, and why I insisted on distinguishing between the requirements even if lexical isn't interested in supporting the simpler one. I know that people who care about accuracy at all costs are overrepresented among those who choose to follow this thread, but in the broader serde_json user base compile time matters a whole lot.

Alexhuszagh commented 4 years ago

@dtolnay Fair. I'll have a working example shortly; it's not optimized for speed yet (I've stripped out every single inline, and almost every trait and assumption), and it will be published here. I'll license this code however you wish, as long as you give me appropriate credit (just a blurb in the source file).

dtolnay commented 4 years ago

Sounds good, I'd be happy to give credit in the source file. Thanks for your help!

Alexhuszagh commented 4 years ago

@dtolnay I've pushed the initial commit. Everything is done through the parse_float method, but it should give you a pretty good idea of how minimal the requirements are: https://github.com/Alexhuszagh/minimal-lexical

It's < 800 lines of code, no macros (except for trait impl). Let me add documentation so it makes some sense, and then a README, and a few working examples and I'll submit a PR to integrate it in.

Alexhuszagh commented 4 years ago

Ok, it's live. Let me know if everything looks good, and if so, I'll submit a PR with this as the default float parser. I'll add more comprehensive tests in the actual composite, but every piece is working identically to its behavior in rust-lexical, which makes it unlikely any errors are present.

Please note this doesn't actually handle parsing the significant digits or the exponent from the value, but those require fairly trivial string-to-integer conversions (using checked_add and checked_mul to exit early on overflow while preserving the value), so it makes minimal assumptions (i.e., it's very easy to integrate into a parser requiring io::Read). This just ensures an accurate float is generated from that value (given the conditions mentioned above), with fairly minimal overhead.
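The checked-arithmetic accumulation described here could be sketched like this (a hypothetical helper, not the crate's API): digits are accumulated until the next one would overflow the u64, and the count of consumed digits tells the caller whether truncation occurred.

```rust
fn accumulate_digits(digits: &[u8]) -> (u64, usize) {
    let mut mantissa = 0u64;
    let mut consumed = 0;
    for &d in digits {
        if !d.is_ascii_digit() {
            break;
        }
        let digit = (d - b'0') as u64;
        match mantissa.checked_mul(10).and_then(|m| m.checked_add(digit)) {
            Some(m) => {
                mantissa = m;
                consumed += 1;
            }
            // Overflow: the remaining digits are truncated, which the
            // caller records (the `truncated` flag seen in later snippets).
            None => break,
        }
    }
    (mantissa, consumed)
}

fn main() {
    assert_eq!(accumulate_digits(b"123"), (123, 3));
    assert_eq!(
        accumulate_digits(b"109876543210987654"),
        (109876543210987654, 18)
    );
    // Twenty nines overflow u64 after nineteen digits.
    assert_eq!(
        accumulate_digits(b"99999999999999999999"),
        (9999999999999999999, 19)
    );
}
```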

dtolnay commented 4 years ago

@Alexhuszagh Nice!

I ran the following loop as a smoke test. Is it fair to expect that this loop should run forever? I believe termination would indicate a bug in either ryu or minimal-lexical.

// Assuming the rand_core, rand_xorshift, and ryu crates.
use rand_core::{RngCore, SeedableRng};
use rand_xorshift::XorShiftRng;

let mut rng = XorShiftRng::from_seed([0; 16]);
let mut buffer = ryu::Buffer::new();
loop {
    let input = f64::from_bits(rng.next_u64());
    if input.is_finite() {
        let printed = buffer.format_finite(input);
        let (output, rest) = parse_float::<f64>(printed.as_bytes());
        assert_eq!(output, input);
        assert_eq!(rest, b"");
    }
}

However it fails on the following value, and the failure reproduces using std::fmt instead of ryu, which suggests this is more likely a bug in minimal-lexical. Any idea what could be going on?

let expected = -0.04628372940652459_f64;
println!("expected: {}", expected);
let mantissa = 4628372940652459_u64;
let exponent = -17_i32;
let truncated = false;
let actual = -minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);
println!("actual:   {}", actual);
assert_eq!(actual, expected);
Alexhuszagh commented 4 years ago

Ok, this is patched, thank you. I forgot (hard-earned lessons) that powi is not stable for floats, even small powers that can be exactly represented. Rust-lexical uses hard-coded constants for these small powers (10^0 to 10^10 for f32, and 10^0 to 10^22 for f64). This has since been fixed. I'll run the smoke test, and get back to you once everything passes.

Alexhuszagh commented 4 years ago

@dtolnay I am finding one real-world example where this round-trip fails, and this won't be solvable without the slow-path algorithm.

The failing test case is:

let expected = 2.638344616030823e-256_f64;
let mantissa = 2638344616030823_u64;
let exponent = -271_i32;
let truncated = false;
let actual = minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);
assert_eq!(actual, expected);

This value is actually a halfway point, which is why we can't differentiate it. Here are the examples printed with numerous extra digits, just to show the problem:

b  = 2.638344616030822852118090731545e-256
b1 = 2.638344616030823147881137273239e-256
# Estimated, since this cannot be a real float
bh = 2.638344616030822999999614002392e-256

A quick check shows the moderate path in rust-lexical fails here, because it's very close to a halfway value. To fix this, we would need a slow-path algorithm. Ultimately, this is your call. It works with the longer formatting printed out to 17-19 digits:

let mantissa = 2638344616030823147_u64;
let exponent = -274_i32;
let truncated = false;
let actual = minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);

However, any modern float formatting algorithm (like Ryu, Grisu3, etc.) will likely format it almost exactly on the halfway point, which means this is a real problem in practice.

dtolnay commented 4 years ago

What is the minimum compile time required to handle cases like 2.638344616030823e-256? Is it effectively equivalent to compiling lexical-core with "correct" feature enabled? On my machine that corresponds to a 45% increase in CPU-seconds to compile serde_json which is not going to be something that we would do by default.

Alexhuszagh commented 4 years ago

I can provide a much smaller subset, so no, it won't be that large. A majority of the compile-time bloat from lexical is the presence of features to parse non-decimal strings, non-float-parsing conversions, and also support for arbitrary formats including decimal separators, etc. I can add in the slow-path algorithms and the compile-time overhead will be much lower than lexical itself, however, the actual extraction of the significant float digits and validation of the float format will have to be done by the client.

Also, I don't have to use all my slow path algorithms either, which will cut down compile-time even further. I currently have 3 slow-path algorithms, 2 of which are used for inputs <= 1000 characters, and one which is used for >= 1000 characters (non-decimal floats only). Since we only need some of these, it cuts back even further.

A sample signature would be:

fn parse_float<'a, F, Iter1, Iter2>(integer: Iter1, fraction: Iter2, exponent: i32) -> F
    where F: Float,
          Iter1: Iterator<Item=&'a u8>,
          Iter2: Iterator<Item=&'a u8>;

This way, all the parsing logic and validation of the float format can be handled with minimal overhead by the library using the minimal version.

If this is the case, I would likely spin out minimal-lexical into a published crate, rather than a stub of one, because the amount of code will be a fair bit larger, and I would want to ensure any regressions get fixed immediately.

Alexhuszagh commented 4 years ago

Ok, @dtolnay, if you want to test minimal-lexical and the parse times, the updated version is now up. https://github.com/Alexhuszagh/minimal-lexical

The API has somewhat changed, but see the examples provided for how to use it. I've also integrated your smoke test in, which seems to be passing now. It's <3,000 lines of code (the full version of lexical is ~40,000, because supporting different float formats is a complete and utter nightmare, so compile-times are slow, even if the binary is fairly small and fast), compiles quickly, and all my simple stress tests seem to pass.

I'm going to add in Rust's tests for comprehensive float parsing and well-known edge-cases in float parsers: https://github.com/Alexhuszagh/rust-lexical/tree/master/data

If all that passes, and the smoke test runs for more than an hour, I'll submit a PR to integrate this in.

rw commented 4 years ago

@Alexhuszagh Would it be useful to exhaustively test all f32 values?

Alexhuszagh commented 4 years ago

@rw If you want, sure. The unittests are quite comprehensive, and I'm mostly looking for regressions introduced in extracting the code from the greater lexical project, but more coverage is always good.

Alexhuszagh commented 4 years ago

Ok, everything works:

1. The smoke test has run for > 30 minutes, without any failures.
2. We've exhaustively tested conversion to/from f32 using ryu and minimal-lexical.
3. The comprehensive-float-test from Rust's libcore works.
4. We've tested the code works on numerous different architectures and OSes.

I'll wait for @dtolnay to play with it a bit, see if he can break it, see if the compile times are fast enough, and then I'll submit a PR.

Finally, the build times are quite nice now:

Unit Total Codegen Features
1. minimal-lexical v0.1.0 1.1s 0.2s (20%) default, std
2. arrayvec v0.4.12 1.0s 0.0s (2%) array-sizes-33-128, default, std
3. arrayvec v0.4.12 custom-build 0.7s 0.0s (0%) array-sizes-33-128, default, std
4. minimal-lexical v0.1.0 custom-build 0.4s 0.0s (0%) default, std
5. nodrop v0.1.14 0.2s 0.0s (15%)
6. cfg-if v0.1.10 0.1s 0.0s (24%)
7. arrayvec v0.4.12 custom-build (run) 0.1s 0.0s (0%) array-sizes-33-128, default, std
8. minimal-lexical v0.1.0 custom-build (run) 0.0s 0.0s (1%) default, std
dtolnay commented 4 years ago

Could you see if it's possible to strip this down further to bring down the compile time? I know you said the crate this is mostly coming from was never optimized for compile time, but now would be the time to do that. In particular I am not excited about bringing in new transitive dependencies on nodrop, arrayvec, and cfg-if.

Alexhuszagh commented 4 years ago

Would you be fine with dynamic allocation for the big-integer backing store? Otherwise, arrayvec will be a requirement (a Vec or an arrayvec is a necessity). I did have a previous crate which was essentially a drop-in replacement, but I removed it because arrayvec was better maintained.

cfg_if can be trivially replaced, but it's lightweight, so I doubt that will significantly change compile times. However, from a dependency management standpoint, I get it, and can remove it. As for nodrop, it's a requirement for arrayvec until v0.5, which unfortunately only supports Rustc 1.37+.

Alexhuszagh commented 4 years ago

Ok, I've removed both dependencies by default. Arrayvec can be re-enabled via the no_alloc feature, which then means the entire crate will not allocate. By default, it now uses Vec as the backing store.

rw commented 4 years ago

@Alexhuszagh @dtolnay Is there anything I can do to help?

japaric commented 4 years ago

I have opened a new PR (#671) that uses the latest version of minimal-lexical to fix this issue. The PR description has quite a few measurements but the summary is that I'm (only) seeing an increase in compile times of ~2-3% (# instructions as reported by perf) or ~0.3s in wall time (out of an initial ~15s) for builds from scratch of the serde-json crate.

dtolnay commented 4 years ago

The new float parser landed behind a feature in 1.0.54 -- https://github.com/serde-rs/json/blob/v1.0.54/Cargo.toml#L58-L65. It's possible it will be default in a future version.

serde_json = { version = "1.0.54", features = ["float_roundtrip"] }