pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

pandas/tests/io/json/test_pandas.py::TestPandasContainer::test_read_json_large_numbers failing for 32-bit system #35279

Open TomAugspurger opened 4 years ago

TomAugspurger commented 4 years ago
    @pytest.mark.parametrize("bigNum", [sys.maxsize + 1, -(sys.maxsize + 2)])
    # @pytest.mark.xfail(sys.maxsize == 2**32, reason="")
    def test_read_json_large_numbers(self, bigNum):
        # GH20599

        series = Series(bigNum, dtype=object, index=["articleId"])
        json = '{"articleId":' + str(bigNum) + "}"
        with pytest.raises(ValueError):
            json = StringIO(json)
            result = read_json(json)
            tm.assert_series_equal(series, result)

        df = DataFrame(bigNum, dtype=object, index=["articleId"], columns=[0])
        json = '{"0":{"articleId":' + str(bigNum) + "}}"
        with pytest.raises(ValueError):
            json = StringIO(json)
            result = read_json(json)
>           tm.assert_frame_equal(df, result)
E           AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="0") are different
E
E           Attribute "dtype" are different
E           [left]:  object
E           [right]: int64

Inspecting the values in the debugger at the failing assertion, we have

-> tm.assert_frame_equal(df, result)
(Pdb) result
           0
articleId  1
(Pdb) df
                              0
articleId  18446744073709551617

TomAugspurger commented 4 years ago

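I think there are two issues:

  1. For numbers between the 32-bit and 64-bit maximums, the result is correct but the test is wrong. On 32-bit systems sys.maxsize + 1 still fits in int64, so the result has int64 dtype rather than the object dtype the test expects.
  2. For numbers larger than the 64-bit maximum, the result wraps around, as in the pdb output above, where 18446744073709551617 (2**64 + 1) is read back as 1.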
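
The two cases can be checked with plain integer arithmetic. This is only a sketch: the helper names are made up for illustration, and the wraparound is modelled as reduction modulo 2**64, which is an assumption that matches the pdb output but is not confirmed against the parser code.

```python
# Illustrative constants; on a real interpreter these come from sys.maxsize.
MAXSIZE_32BIT = 2**31 - 1   # sys.maxsize on a 32-bit build
MAXSIZE_64BIT = 2**63 - 1   # sys.maxsize on a 64-bit build

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(n: int) -> bool:
    """Return True if n is representable as a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


def wrap_uint64(n: int) -> int:
    """Hypothetical model of the wraparound: keep only the low 64 bits."""
    return n % 2**64


# The test is parametrized with sys.maxsize + 1 and -(sys.maxsize + 2).
for label, maxsize in (("32-bit", MAXSIZE_32BIT), ("64-bit", MAXSIZE_64BIT)):
    for big_num in (maxsize + 1, -(maxsize + 2)):
        print(label, big_num, "fits in int64:", fits_int64(big_num))

# The value seen in the debugger is 2**64 + 1; its low 64 bits are 1.
print(wrap_uint64(18446744073709551617))
```

On a 32-bit build both parametrized values fit in int64, which is why the test's expectation of object dtype (and of a ValueError) does not hold there. On a 64-bit build neither value fits.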