konsoletyper / teavm

Compiles Java bytecode to JavaScript, WebAssembly and C
https://teavm.org
Apache License 2.0
2.55k stars 261 forks

Initial re-implementation for Double.parseDouble/Float.parseFloat #829

Closed Ihromant closed 8 months ago

Ihromant commented 8 months ago

Hi @konsoletyper. This is an initial (and therefore unfinished) implementation of the Double.parseDouble improvement. The code was borrowed from the Android source repository (Apache 2.0): https://android.googlesource.com/platform/libcore/+/d5e2817/luni/src/main/java/java/lang/Double.java. The test was borrowed from there as well. The test shows that we have issues in Double.toHexString (which I suppose will be fixed separately) and in parsing. I reused the implementation but replaced the native part with the most naive approach instead of the real one (it should be ported from the C++ code at https://cs.android.com/android/platform/superproject/+/master:libcore/luni/src/main/native/java_lang_StringToReal.cpp;l=92;drc=master). Is it worth continuing to work on this PR, or should I stop here? It appears that even the naive implementation covers most of the tests that are currently present.
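For illustration, the "most naive approach" mentioned above looks roughly like the sketch below (hypothetical code, not the PR's actual implementation): accumulate the decimal digits into a long mantissa, then scale by a power of ten computed with floating-point math. This is where the last ULPs get lost on hard inputs, because both Math.pow and the final multiply each introduce rounding.

```java
// Sketch of a naive decimal-to-double conversion (illustrative only).
// Accurate for simple inputs; the floating-point pow and multiply lose
// the final ULPs on hard cases, and a long mantissa overflows past 18-19 digits.
public class NaiveParse {
    static double parse(String s) {
        int i = 0;
        boolean negative = false;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            negative = s.charAt(i) == '-';
            i++;
        }
        long mantissa = 0;
        int exp = 0;
        boolean seenDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.') {
                seenDot = true;
            } else if (c == 'e' || c == 'E') {
                exp += Integer.parseInt(s.substring(i + 1));
                break;
            } else {
                mantissa = mantissa * 10 + (c - '0');
                if (seenDot) exp--;   // digits after the dot shift the exponent down
            }
        }
        // Two rounded operations in a row: this is the source of inaccuracy.
        double result = mantissa * Math.pow(10, exp);
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        System.out.println(parse("1.5"));
        System.out.println(parse("2.2250738585072011e-308")); // near-subnormal: may be off by ULPs
    }
}
```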

konsoletyper commented 8 months ago
  1. There's already an existing implementation that works quite well except for a few corner cases. Isn't it better to fix it rather than replace everything with another implementation?
  2. The current implementation of Double.parseDouble uses a common engine to turn a decimal exponent + decimal mantissa into an IEEE 754 binary exponent + binary mantissa (see DoubleSynthesizer). The same engine is used by TDecimalFormat, so whenever you change something, you should also take care of TDecimalFormat.
  3. This algorithm uses the naive approach that was implemented in TeaVM quite some time ago. It turned out that working in terms of IEEE 754 doubles does not provide enough accuracy, so I replaced everything with integer math. Using a long to store the mantissa gives 64 bits instead of 52.
  4. This approach is not only inaccurate but also slow. Computing 10^x every time may be slow compared to using tables.
  5. Recently, I improved my implementation to use compile-time-calculated, fully precise tables to convert between decimal and binary mantissa (I used BigInteger, then cut the first 64 bits). This gave a much better result, though still not 100% accurate.
  6. You should test any changes there with respect to consistency between parseDouble and Double.toString (i.e. feed the result of Double.toString back to parseDouble for a large number of random values). After my recent changes, approximately 0.2% of numbers show a 1 ULP inaccuracy (i.e. the binary mantissa differs by 1). Did you test your changes this way?
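The table construction described in point 5 can be sketched roughly as follows (a hypothetical generator, not TeaVM's actual code): for each decimal exponent k, compute 10^k exactly with BigInteger, then normalize it to a 64-bit mantissa plus a binary exponent.

```java
import java.math.BigInteger;

// Sketch of building an exact 64-bit mantissa table for powers of ten,
// in the spirit of the compile-time tables described above (names are
// illustrative). The value represented is mantissa * 2^exponent, where
// the mantissa is treated as an unsigned 64-bit integer.
public class PowerOfTenTable {
    // Returns {mantissa (top 64 bits, as an unsigned long), binary exponent}.
    static long[] entryFor(int k) {
        BigInteger pow = BigInteger.TEN.pow(k);
        int bits = pow.bitLength();
        BigInteger top;
        if (bits > 64) {
            top = pow.shiftRight(bits - 64);  // cut the first 64 bits
        } else {
            top = pow.shiftLeft(64 - bits);   // normalize small powers up
        }
        return new long[] {top.longValue(), bits - 64};
    }

    public static void main(String[] args) {
        long[] e1 = entryFor(1); // 10 = 0xA000000000000000 * 2^-60
        System.out.println(Long.toUnsignedString(e1[0], 16) + " * 2^" + e1[1]);
    }
}
```

Negative decimal exponents (10^-k) have no exact binary representation, so their table entries are necessarily truncated reciprocals; that truncation is presumably where the remaining 1-ULP cases come from.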
Ihromant commented 8 months ago
  1. Here is your comment: https://github.com/konsoletyper/teavm/issues/735#issuecomment-1718063424. According to it, you managed to produce 99.5% and 96.5% accurate parsing for doubles and floats respectively. This just means that it's complicated (I'm not saying impossible) and not worth the effort to keep improving a custom self-written algorithm. It's better to reuse common knowledge shared under an open-source-friendly license. Together with the code, it comes with applicable tests. I understand that writing it yourself is pretty challenging, but it's not worth your/my time.
  2. Thanks for the info, I didn't consider it at the beginning. Still, formatting is about toString, not parsing. Then I suppose that changes for toString should also be included, to be consistent. In the Android test suite only 4 tests failed on the JDK, so I'm assuming this implementation is pretty similar and precise.
  3. This naive approach was used just to make the code work. Check the TODO link. There is C++ code that uses the same table-based approach under the hood. It just needs to be rewritten in Java.
  4. See 3.
  5. See 3.
  6. Sure, it should be tested well, but the basic question remains unanswered: should I continue working on this PR or just drop it? I didn't test my changes for consistency because I don't want to prepare a full PR (obviously not a one-day job) only to have it refused.
konsoletyper commented 8 months ago

This just means that it's complicated (I'm not saying impossible) and not worth the effort to keep improving a custom self-written algorithm.

No, I can't agree. At least, after all my efforts, I can't let you throw away my almost perfectly working code.

It's better to reuse common knowledge shared under Open-Source-friendly license.

Yes, and get code that neither you nor I understand. Moreover, you'd need to somehow port it from C++.

Sure, it should be tested well, but the basic question remains unanswered: should I continue working on this PR or just drop it?

In this form, drop it. You can take some tests from the Harmony/Android implementation to find issues in the current implementation, and fix those errors in the current code. You can read some papers about decimal-to-binary conversion to improve accuracy and fix the current algorithm. But in its current form this PR is almost impossible to review.
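As a starting point for that kind of testing, the round-trip consistency check from point 6 above can be sketched as a small harness (hypothetical code; on the JDK the round-trip is exact by the Double.toString contract, so mismatches only appear once a custom parse/format pair is substituted for the standard one):

```java
import java.util.Random;

// Hypothetical harness for the consistency check: feed Double.toString
// output back into the parser and count how many random doubles come back
// with a bit pattern off by exactly 1 ULP, or by more.
public class RoundTripCheck {
    // Returns {count of 1-ULP mismatches, count of larger mismatches}.
    public static int[] check(int samples, long seed) {
        Random random = new Random(seed);
        int oneUlp = 0, worse = 0;
        for (int i = 0; i < samples; i++) {
            // Random bit patterns cover the full double range, including
            // subnormals; NaN and infinities are skipped.
            double original = Double.longBitsToDouble(random.nextLong());
            if (!Double.isFinite(original)) continue;
            // Substitute TeaVM's parseDouble/toString pair here when testing.
            double parsed = Double.parseDouble(Double.toString(original));
            long diff = Math.abs(Double.doubleToLongBits(parsed)
                    - Double.doubleToLongBits(original));
            if (diff == 1) oneUlp++;
            else if (diff > 1) worse++;
        }
        return new int[] {oneUlp, worse};
    }

    public static void main(String[] args) {
        int[] result = check(100_000, 42L);
        System.out.println("1-ULP mismatches: " + result[0] + ", worse: " + result[1]);
    }
}
```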