mwouts / itables

Pandas DataFrames as Interactive DataTables
https://mwouts.github.io/itables/
MIT License
723 stars 54 forks source link

Not displaying large numbers (>16 digits) correctly #152

Closed titus-ong closed 1 year ago

titus-ong commented 1 year ago

The issue: Large numbers only display first 16 digits, 17th digit is rounded (sometimes incorrectly) and the rest are 0s.

How to replicate:

from itables import show
import pandas as pd

a = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [1234567890123456789, 2345678901234567890, 3456789012345678901],
        "Sex": ["male", "male", "female"],
    }
)

show(a)

Result:

Name Age Sex
Braund, Mr. Owen Harris 1234567890123456800 male
Allen, Mr. William Henry 2345678901234567700 male
Bonnell, Miss. Elizabeth 3456789012345679000 female

Note that Allen's 17th digit (the second 7 in 77) should be a 9.

Version: 1.4.4

Notebook: Jupyter Notebook

mwouts commented 1 year ago

Oh interesting! Thanks for taking the time to report this. I will have a look and see how to fix this soon.

titus-ong commented 1 year ago

Cool, the issue of the 0's always appears for large numbers but I've not yet looked much into why the rounding sometimes doesn't work properly. For those facing this issue, you could either convert the column into string or have an additional column displaying the last few digits (e.g. using modulus).

EDIT: I had a brief look at the code and saw that the rendering involves JS, so it could possible to do with the way JS stores large numbers? See https://www.irt.org/script/1031.htm. That would explain why the rounding is incorrect sometimes.

mwouts commented 1 year ago

This is what I found:

I think we have two options: either convert to strings the numbers that are larger than Number.MAX_SAFE_INTEGER, or try using the BigInt javascript type. I will give a try to the second option following the comments here.

mwouts commented 1 year ago

I've done some research/experiments on BigInt, and unfortunately I have not been able to modify our custom JSON encoder to automatically map the large Python integers to Javascript BigInt. In principle, we just need to add a n at the end of these large numbers, e.g. [1234567890123456789n, 2345678901234567890n, 3456789012345678901n] for the array described above. However in practice we cannot override the encoding of the known types, cf. this SO question.

I have not tested how well the BigInt work with datatables, so I am not sure whether this will work when we pass the step of the JSON encoding. What makes me doubt about this are the last comments about sorting big numbers. Maybe @AllanJard I can ask you whether you expect BigInt to work in datatables? If that helps I can ensure that these column only contains BigInt (no other integers types, but maybe a few strings like "NA", "Infinity").

AllanJard commented 1 year ago

The answer is no, the current release doesn't support BigInt. It throws an error in the number handling code. However, I've just committed a couple of changes which addresses that issue, and it will now correctly render with a BigInt column.

AllanJard commented 1 year ago

Just to say - the other option is to make your long numbers strings in the JSON feed.

mwouts commented 1 year ago

The answer is no, the current release doesn't support BigInt. It throws an error in the number handling code. However, I've just committed a couple of changes which addresses that issue, and it will now correctly render with a BigInt column.

Oh that's interesting! In the long term I think this is the way to go

Just to say - the other option is to make your long numbers strings in the JSON feed.

Yes I think that is the best short term approach. On the Python side I will convert the numbers that are larger than MAX_SAFE_INTEGER to strings. The only drawback I see is that sorting might not work well, but that I might display a (discardable) warning to make sure the user is aware of that.

mwouts commented 1 year ago

I see that datatables==1.13.2 added support for big ints (thank you @AllanJard !).

I still have some work to do on the Python side as well but I will give it a try.

mwouts commented 1 year ago

Well for the moment it seems more reasonable to convert the columns that contains big integers to string.

@titus-ong would you like to give a try to the proposed fix? You can install it with

pip install git+https://github.com/mwouts/itables.git@bigint_to_str
titus-ong commented 1 year ago

image Thanks for the fix, it works! Suppressing the warnings also work.

mwouts commented 1 year ago

Awesome! Thanks for confirming. By the way, I am wondering whether the default value for warn_on_int_to_str_conversion should be True or False. What do you think? As we don't lose much by converting to string, maybe it is not necessary to issue a warning for this? (only the sorting will not work properly after the conversion)

titus-ong commented 1 year ago

Ah I didn't think about sorting, but since it affects sorting I think the warning could be True by default. I don't require sorting for my use case but I can imagine that's one of your package's features that others may use a fair amount, so it might be better to warn pre-emptively. There probably wouldn't be a lot of users who deal with such large numbers so I don't think the warning is excessive either.

mwouts commented 1 year ago

Thanks for your feedback! I will let the warning by default then. This workaround will be available in itables==1.5.2. I have also created a follow-up issue to provide a proper fix for this, but I must say that it is a bit out of reach for now: #172. Thanks!