Closed safinaskar closed 11 months ago
In that Rust program I convert numbers from TileRowInfo.cell_storage_buffer
representation directly to my own homegrown fixed-point arithmetic library representation. And then directly convert this number to string. I. e. IEEE floating-point numbers are never involved
You're using decimal128 types I assume? That's what the underlying representation is. I unpack like this: https://github.com/masaccio/numbers-parser/blob/main/src/numbers_parser/cell_storage.py#L345C14-L353 but as you guessed, go through floats after that.
You're using decimal128 types I assume?
Possibly. In my .numbers file these numbers are stored as a * 10**b
I can workaround this now but I am not able to reproduce using this test file.
@safinaskar in the absence of tests, I've published an update. The change will break the current API so I've not made this visible by default. To use it with cat-numbers
, you will have to say cat-numbers --experimental
. In this mode, decimals are decoded using decimal128. I've not benchmarked yet but unless you're doing lots of arithmetic with the results from the API, there should be no difference.
@SheetJSDev - FYI on a future change that will become the default and may change how some of your tests work.
Thanks a lot for your work! I don't have currently time. When I have time, I will test your new version on my data. I hope this will happen in 1-2 months. Please, don't make this experimental behavior to default until I will do my tests
I got time and so tested numbers-parser. I still don't like how it works. Also, I got access to actual Apple Numbers and was able to produce test cases using it.
So, here are steps to reproduce remaining bugs in numbers-parser.
Create file a.csv
with this contents:
0.00016450000000000000
0.0000000000000001
0.0001645
Then open it with Apple Numbers and save as a.numbers
. I used Apple Numbers 13.1 (7037.0.101) running on macOS Ventura. Resulting file is attached. Feel free to store it in your tests. As you can see, Numbers preserves trailing zeros, so 0.00016450000000000000 and 0.0001645 are two distinct numbers from point of view of Numbers. (But I think this is not important for numbers-parser to preserve them.)
Now let's run numbers-parser
4.2.0 on this file with various flags.
<[sid]>root@0ae577710eac:~# ~/.local/bin/cat-numbers --version
4.2.0
<[sid]>root@0ae577710eac:~# ~/.local/bin/cat-numbers /tmp/finalt.numbers
/tmp/finalt.numbers: Sheet 1: finalt: 0.00016450000000000001
/tmp/finalt.numbers: Sheet 1: finalt: 1e-16
/tmp/finalt.numbers: Sheet 1: finalt: 0.0001645
<[sid]>root@0ae577710eac:~# ~/.local/bin/cat-numbers --formatting /tmp/finalt.numbers
/tmp/finalt.numbers: Sheet 1: finalt: 0.00016450000000000001
/tmp/finalt.numbers: Sheet 1: finalt: 0.0000000000000001
/tmp/finalt.numbers: Sheet 1: finalt: 0.0001645
<[sid]>root@0ae577710eac:~# ~/.local/bin/cat-numbers --experimental /tmp/finalt.numbers
/tmp/finalt.numbers: Sheet 1: finalt: 0.000164500000000000
/tmp/finalt.numbers: Sheet 1: finalt: 1E-16
/tmp/finalt.numbers: Sheet 1: finalt: 0.0001645
<[sid]>root@0ae577710eac:~# ~/.local/bin/cat-numbers --experimental --formatting /tmp/finalt.numbers
Traceback (most recent call last):
File "/root/.local/bin/cat-numbers", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/_cat_numbers.py", line 128, in main
print_table(args, filename)
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/_cat_numbers.py", line 97, in print_table
cells = [cell_as_string(args, cell) for cell in row]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/_cat_numbers.py", line 97, in <listcomp>
cells = [cell_as_string(args, cell) for cell in row]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/_cat_numbers.py", line 80, in cell_as_string
elif args.formatting and cell.formatted_value is not None:
^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/cell.py", line 476, in formatted_value
return self._storage.formatted
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/cell_storage.py", line 284, in formatted
return self.custom_format()
^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/cell_storage.py", line 349, in custom_format
return format_decimal(self.d128, format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/pipx/venvs/numbers-parser/lib/python3.11/site-packages/numbers_parser/cell_storage.py", line 715, in format_decimal
if value.is_integer() and format.decimal_places >= DECIMAL_PLACES_AUTO:
^^^^^^^^^^^^^^^^
AttributeError: 'decimal.Decimal' object has no attribute 'is_integer'
Everything is wrong here. ~/.local/bin/cat-numbers /tmp/finalt.numbers
and ~/.local/bin/cat-numbers --formatting /tmp/finalt.numbers
are wrong, because they print first number as 0.00016450000000000001
. This is wrong answer.
Also ~/.local/bin/cat-numbers /tmp/finalt.numbers
prints 1e-16
. I think cat-numbers
should never print exponential form. ~/.local/bin/cat-numbers --experimental /tmp/finalt.numbers
prints exponential form, too.
And ~/.local/bin/cat-numbers --experimental --formatting /tmp/finalt.numbers
fails with backtrace.
Correct output should either be equal to original csv, i. e.
0.00016450000000000000
0.0000000000000001
0.0001645
either be this:
0.0001645
0.0000000000000001
0.0001645
Thanks for the examples. Very helpful.
Replicating Numbers exactly could be difficult but I will take a look. It uses a non-standard number of bits of precision despite storing these as decimal128 in saved files. A recent example I got shared is how the value 2251799813685247 is rounded 2251799813685250 when you try and enter it.
The original example of 0.00016450000000000001 would be truncated by Numbers as it has too many digits after the first 1 so that may well be how it manages the rounding on imprecise floats.
When I open that original big file in Numbers, I see 0.00016450000000000000 there
When I open that original big file in Numbers, I see 0.00016450000000000000 there
Try adding an extra digit to that decimal; it will be rounded. 0.000164500000000000001 rounds to 0.0001645 as soon as you enter It into a cell.
Numbers can add the zeroes at the end to give a number of decimal places which is what you see, but it cannot store more than 15 significant digits. The solution will be to truncate the number of digits to match the imprecision that Numbers has internally. Using decimal.Decimal
is not the answer after all as that has more precision than Numbers itself.
I have formatting tests I can generate and can apply each of these to a number that fails to round precisely and see how Numbers presents it with its various formatting options. I've already replicated the output for the simple case by using significant figures rounding.
I believe you are looking for the output to match Numbers rather than representing values that Numbers cannot represent. I think matching the behaviour of Numbers is the right approach as well as ensuring that values round trip through load and save.
I believe you are looking for the output to match Numbers
Let me tell what I think numbers-parser should do. Let's assume we somehow got some input .numbers file. Let's open it in Numbers. We see that Numbers rendered some number as A. Then let's process the same .numbers file using cat-numbers. Let's assume cat-numbers printed the same number as B. I want these two properties to hold:
0.1
and 0.10
are equal mathematically. 0.00016450000000000000
and 0.0001645
are equal, too. 0.000164500000000000001
and 0.0001645
are notI don't need any additional properties. I. e. I don't need preserving trailing zeroes.
Also, let me remind that I already solved my original problem, so actually I don't need anything from numbers-parser anymore. I just share my thoughts to help you
The solution will be to truncate the number of digits to match the imprecision that Numbers has internally
I don't understand why you need this. This may be helpful when producing .numbers files from numbers-parser, but I thought we are talking about consuming them
In 4.3.0:
% cat-numbers -b --formatting finalt.numbers
0.00016450000000000000
0.0000000000000001
0.0001645
Regarding your truncation comment: Numbers does not store 0.0001645
as a string but instead converts its internal value to a decimal128 which results in imprecise values during floating point conversion. This is what you were seeing previously.
@masaccio, thanks! But this output still looks wrong to me:
# ~/.local/bin/cat-numbers -b /tmp/a.numbers
0.00016450000000000001
1e-16
0.0001645
, because first numbers is wrong and second number is exponential. But if you don't think this is bug, I don't, too
Thanks again. The first number is a float conversion artefact (fixed). The CSV output has always been default python output so I will leave as-is since it's still correct in a machine-readable context. --formatting
is expected to replicate the formatting of numbers including stuff like zero/space padding.
I have one .numbers file. Its
TileRowInfo.cell_storage_buffer = 6
( https://github.com/psobot/keynote-parser/blob/7114e3b6594a68d6c6885f469c7b4b3bdc27eb86/protos/TSTArchives.proto#L128 ) for one particular cell contains this data:I reverse-engineered this format and wrote Rust program for conversion from .numbers format. In my Rust program I decode this number as
0.0001645
. I believe this is correct rendering of this number.But the following two commands:
both produce number 0.00016450000000000001. I believe this is incorrect answer