paddymul / buckaroo

Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes, and run pandas commands via a GUI. Works inside the jupyter notebook.
https://buckaroo-data.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
174 stars, 8 forks

integer value 99999999999999999999999 is taken as 10,000,000,000,000,000,000,000 #72

Open nasrin1748 opened 10 months ago

nasrin1748 commented 10 months ago
paddymul commented 10 months ago

These are great bug reports! I really appreciate it, I'm digging in.

A couple of requests that will help me fix these more quickly.

  1. Can you put the actual code in text to reproduce?
  2. Can you try running the following commands? (A picture is fine for these.)

    bw = BuckarooWidget(offending_df, autoType=False)
    bw
    # end of cell

    Then, in the next cell:

    bw.origDf

    and

    bw = BuckarooWidget(offending_df)
    bw
    # end of cell

    Then, in the next cell:

    bw.origDf

There are three possible sources of the problem in each of these cases: autoTyping (Python widget code), the formatter that the widget code hints at in table_hints, and the behavior of that formatter in the frontend.

origDf is the serialized JSON that is sent to the frontend code. I look at that and can tell if the autoTyping is sending the wrong value to the frontend, or if the frontend is formatting it improperly.
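
A plain JSON round trip is one way to separate those cases outside the widget. The sketch below uses only Python's standard json module, not buckaroo itself. Passing parse_int=float is an assumption that stands in for a frontend parser that only has 64-bit floats.

```python
import json

payload = {"data": [{"index": 2, "Values": 9999999999999999999}]}

# Python serializes arbitrary-precision ints digit for digit.
serialized = json.dumps(payload)

# Reading the text back with Python ints keeps the exact value.
exact = json.loads(serialized)["data"][0]["Values"]

# Forcing every integer through float mimics a parser that only has
# 64-bit floats: the value is rounded to the nearest representable double.
rounded = json.loads(serialized, parse_int=float)["data"][0]["Values"]

print(serialized)
print(exact == 9999999999999999999)
print(rounded == exact)
```

If the serialized text already shows the wrong digits, the Python side is at fault; if it is exact and only the float reading differs, the problem is on the parsing or formatting side.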

paddymul commented 10 months ago

Check out this issue for the end state of how I want to handle this: https://github.com/paddymul/buckaroo/issues/74

nasrin1748 commented 10 months ago

For this dataframe I created a BuckarooWidget as

    bw = BuckarooWidget(offending_df, autoType=False)
    bw

but I am getting the error KeyError: "['mean'] not in index" for numbers longer than 19 digits.

bw.origDf with autoType=False:

{'schema': {'fields': [{'name': 'index'}, {'name': 'Values'}]}, 'data': [{'index': 0, 'Values': 1}, {'index': 1, 'Values': 2}, {'index': 2, 'Values': 9999999999999999999}], 'table_hints': {'Values': {'is_numeric': True, 'is_integer': True, 'min_digits': 1, 'max_digits': 20, 'histogram': [{'name': 1, 'cat_pop': 33.0}, {'name': 2, 'cat_pop': 33.0}, {'name': 9999999999999999999, 'cat_pop': 33.0}, {'name': 'longtail', 'unique': 100.0}]}}}

bw.origDf with the default autoType:

{'schema': {'fields': [{'name': 'index'}, {'name': 'Values'}]}, 'data': [{'index': 0, 'Values': 1}, {'index': 1, 'Values': 2}, {'index': 2, 'Values': 9999999999999999999}], 'table_hints': {'Values': {'is_numeric': True, 'is_integer': True, 'min_digits': 1, 'max_digits': 20, 'histogram': [{'name': 1, 'cat_pop': 33.0}, {'name': 2, 'cat_pop': 33.0}, {'name': -8446744073709551617, 'cat_pop': 33.0}, {'name': 'longtail', 'unique': 100.0}]}}}

The histogram name should be the same in both outputs, but it differs: 9999999999999999999 in the first and -8446744073709551617 in the second.
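
The negative name is consistent with the value being reinterpreted as a signed 64-bit integer somewhere along the way: 9999999999999999999 minus 2^64 is exactly -8446744073709551617. The sketch below only checks that arithmetic; it does not show where in buckaroo or pandas the conversion happens, and the function name is made up for illustration.

```python
INT64_MIN = -2**63
UINT64_SPAN = 2**64

def reinterpret_as_int64(value: int) -> int:
    """Return the signed 64-bit reading of the low 64 bits of value."""
    return (value - INT64_MIN) % UINT64_SPAN + INT64_MIN

print(reinterpret_as_int64(9999999999999999999))
```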

JavaScript rounds to the nearest representable number when a number is too big, so 9999999999999999999999999999 is taken as 1000000000000000000000000000000000000000000. Python keeps the exact value, but JavaScript does not.
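
The rounding can be reproduced in Python, because a Python float is the same 64-bit double that a JavaScript Number uses. This is a minimal sketch; survives_float64 is a name chosen here for illustration.

```python
# Same value as JavaScript's Number.MAX_SAFE_INTEGER.
MAX_SAFE_INTEGER = 2**53 - 1

def survives_float64(value: int) -> bool:
    """True if value converts to a 64-bit float and back without change."""
    return int(float(value)) == value

for n in (MAX_SAFE_INTEGER, 2**53 + 1, 9999999999999999999999999999):
    print(n, survives_float64(n), float(n))
```

Integers up to 2^53 - 1 always survive the conversion; above that, only some values do.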

nasrin1748 commented 10 months ago

"Can you put the actual code in text to reproduce?" Can you elaborate?

paddymul commented 10 months ago

The core issue is that JavaScript's Number type is a 64-bit float, so really large ints (above 2^53 - 1) get rounded. Look into using BigInt https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt and hinting from table_hints that BigInt is required.
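
A rough sketch of what the Python side of that could look like, assuming large integers are sent as strings and flagged with a hint. The helper encode_column and the needs_bigint key are both invented here; neither is an existing buckaroo function or table_hints field.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1

def encode_column(values):
    """Hypothetical helper: return (JSON-safe values, needs_bigint flag)."""
    needs_bigint = any(
        isinstance(v, int) and not isinstance(v, bool) and abs(v) > MAX_SAFE_INTEGER
        for v in values
    )
    if needs_bigint:
        # Strings keep every digit; a frontend could pass them to BigInt(...).
        values = [str(v) for v in values]
    return values, needs_bigint

values, needs_bigint = encode_column([1, 2, 9999999999999999999])
message = json.dumps({
    "data": values,
    # "needs_bigint" is an invented key, not an existing table_hints field.
    "table_hints": {"Values": {"needs_bigint": needs_bigint}},
})
print(message)
```

Encoding the whole column as strings, rather than only the oversized values, keeps the column homogeneous for the frontend.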

nasrin1748 commented 10 months ago

Converting to string is one option: console.log(BigInt("9999999999999999999999999999999999999999").toString())

nasrin1748 commented 10 months ago

It is not just 9999999999999999999999999; other numbers are displayed differently too. For example, 33333333333333333333333333333 is taken as 333333000000000000000000000000.

{'schema': {'fields': [{'name': 'index'}, {'name': 'inf'}]}, 'data': [{'index': 0, 'inf': 33333333333333333333333333333}], 'table_hints': {'inf': {'is_numeric': False, 'is_integer': False, 'min_digits': None, 'max_digits': None, 'histogram': [{'name': 33333333333333333333333333333, 'cat_pop': 100.0}, {'name': 'longtail', 'unique': 100.0}]}}}

{'schema': {'fields': [{'name': 'index'}, {'name': 'inf'}]}, 'data': [{'index': 0, 'inf': 3.333333333e+28}], 'table_hints': {'inf': {'is_numeric': True, 'is_integer': False, 'min_digits': 29, 'max_digits': 29, 'histogram': [{'name': 3.333333333e+28, 'cat_pop': 100.0}, {'name': 'longtail', 'unique': 100.0}]}}}

For numbers with fewer digits is_numeric is True, and for numbers with more digits is_numeric is False. This may be the cause of the error.
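
One possible explanation is the width of 64-bit integer types. The largest int64 value, 9223372036854775807, has 19 digits, which matches the point where the errors start. The sketch below classifies a value by the narrowest 64-bit type that can hold it. It assumes that pandas falls back from int64 to uint64 and then to the object dtype, and that is_numeric follows the dtype; neither has been checked against buckaroo's code.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

def narrowest_64bit_kind(value: int) -> str:
    """Name the narrowest 64-bit integer kind that can hold value."""
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"
    if 0 <= value <= UINT64_MAX:
        return "uint64"
    # Too wide for any 64-bit integer type.
    return "object"

for n in (1, 9999999999999999999, 33333333333333333333333333333):
    print(n, narrowest_64bit_kind(n))
```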