hgrecco / pint-pandas

Pandas support for pint

Add support for UFloat in PintArray (#139) #140

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

Signed-off-by: MichaelTiemann 72577720+MichaelTiemannOSC@users.noreply.github.com

MichaelTiemannOSC commented 2 years ago

As noted, this change request may precipitate changes needed in how Pint deals with uncertainties (whose representation is not supported by vanilla Python parsers).

andrewgsavage commented 2 years ago

This lets you store ufloat objects, but I'm not sure how well pandas will work for operations like reshaping, indexing, etc. I suspect some will work but some won't.

I think you can add ufloats to fixtures in test_pandas_extensiontests and it'll run with standard and ufloat quantities. This should show what works and doesn't.

I think you should make it so that a PintArray only stores floats or only ufloats. There are a few points where arrays are expected to contain the same type of object, and I think pandas expects each extension type to only store one type of object.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

MichaelTiemannOSC commented 2 years ago

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

@pytest.fixture
def data():
    return PintArray.from_1darray_quantity(np.arange(start=1.0, stop=101.0) * ureg.nm)

@pytest.fixture
def data_missing():
    return PintArray.from_1darray_quantity([np.nan, 1] * ureg.meter)

@pytest.fixture
def data_for_twos():
    x = [2.0] * 100
    return PintArray.from_1darray_quantity(x * ureg.meter)

to ufloat-based values? I'm going to give that a try, but if there's a better way to do it (or you meant something different) please guide me. Thanks!

MichaelTiemannOSC commented 2 years ago

Progress... I've now got 203 failures, the vast majority of which are due to the fact that ufloats computed to be the same value, but not derived from the same variable, do not compare as equal, so all the assert_series_equal, assert_extension_array_equal, and assert_frame_equal calls fail through no fault of their own. It will be quite some work to plug in an appropriate `assert_*_nominally_equal` everywhere.

I do not know how to make progress on this:

__________________________________________________________________________________________________________________ ERROR at setup of TestMethods.test_insert_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py, line 558
      def test_insert_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py:558
_________________________________________________________________________________________________________________ ERROR at setup of TestSetitem.test_setitem_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py, line 437
      def test_setitem_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, full_indexer, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py:437

Also: 191 passed, 1 skipped, 69 xfailed, 5 xpassed

andrewgsavage commented 2 years ago

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

Yes, like that. There are some fixtures that return a value from a list, like all_arithmetic_operators. I'd try changing these so they test both ufloat and float.

andrewgsavage commented 2 years ago

Use this fork of pandas for the assert...equal tests to work: https://github.com/hgrecco/pint-pandas/blob/236cf0f856333704f035297df039dfb5302cbc8a/.github/workflows/ci.yml#L11

edit: I'm not 100% sure this will work; it makes the tests work for Quantity, but you might have a different issue there

andrewgsavage commented 2 years ago

fixture 'invalid_scalar' not found

that's coming from a newer pandas with extra tests compared to pandas 1.4.x. You can rebase from my https://github.com/hgrecco/pint-pandas/pull/133 if you're using a higher version... but for now I wouldn't worry about those errors

MichaelTiemannOSC commented 2 years ago

More progress... down to essentially one kind of failure: the un-hashability of UFloat. I don't have the Python sophistication to know what the next step would be, but I'll push the changes I have thus far.

=== short test summary info ================================================================================
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_agg_extension - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="B") are different
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_no_sort - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_transform - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[scalar] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[list] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[object] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_in_numeric_groupby - AssertionError: Did not see expected warning of class 'FutureWarning'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestInterface::test_contains - AssertionError
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts_with_normalize - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod_series_array - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array_duplicates - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_scalar_key_sequence_raise - Failed: DID NOT RAISE <class 'ValueError'>
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_frame_2d_values - AssertionError: Caused unexpected warning(s): [('UnitStrippedWarning', UnitStrippedWarning('The unit of the quantity is stripped when downcasting to ndarray.'), ...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_delitem_series - TypeError: unhashable type: 'AffineScalarFunc'
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_insert_invalid
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_invalid
==== 35 failed, 331 passed, 48 xfailed, 26 xpassed, 268 warnings, 2 errors in 15.68s =======================
MichaelTiemannOSC commented 2 years ago

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

MichaelTiemannOSC commented 2 years ago

I think you should make it so that a PintArray only stores floats or only ufloats. There are a few points where arrays are expected to contain the same type of object, and I think pandas expects each extension type to only store one type of object.

I'll give that a try. While I'm not sure it's totally necessary, I think it's a good exercise.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

It will certainly keep the discussion with Pandas on point to present uniform ExtensionTypes and to keep both Pint and Pandas on the straight-and-narrow there.

andrewgsavage commented 2 years ago

Does it make sense to rebase PintArrays on uarrays

you can try, it's easy to check. I don't know what effects that'd have. NumPy is used across pandas, so it would still be used.

MichaelTiemannOSC commented 1 year ago

I thought it would be helpful to me to merge in changes that have been merged upstream, not realizing how that would affect the size of the PR. If I should revert to the previous version so that the PR has only my changes, I'll do that. I'm still learning my way around git, and everything new I do is truly a "learning experience".

MichaelTiemannOSC commented 1 year ago

I'm running into problems with the newly expanded test suite. The major problem is that UFloat resolves to a type that is not hashable. I've been able to work through that in a few cases (converting dozens of failures into successes), but I'm stumped as to how to deal with groupby. I've chased the problem through the factorize maze, but just when I thought I had it figured out, the get_iterator method of BaseGrouper (pandas/core/groupby/ops.py) expects what will be a name to be hashable.

@andrewgsavage any words of advice?

MichaelTiemannOSC commented 1 year ago

Because these changes also require changes to pint, pandas, and uncertainties, some coordination will be needed before the CI/CD system works for these changes. Please let me know if I can do anything to help. It would be great to land these changes in time for Pandas' anticipated 2.1 release (Aug 03 last time I checked).

MichaelTiemannOSC commented 1 year ago

Converting to Draft status as this now depends on Pandas 2.1 functionality (and on proper DataFrame reduction support as mentioned in #174). When Pandas 2.1 lands, I'll re-test and re-open when ready.

MichaelTiemannOSC commented 1 year ago

Good news--the uncertainties parser has been added to Pint. If that goes out with a 0.23 release, we'll have a firm target and can update the test matrix to use it. I would then probably also make the change to import pint.HAS_UNCERTAINTIES rather than re-deriving Pint-Pandas' own version of that variable. Any other changes you'd really like to see before merging this?

MichaelTiemannOSC commented 1 year ago

Thanks @hgrecco for creating Pint-0.23rc0. I've attempted to write some rules to optionally test uncertainties, but I don't know the right way to leave the normal Pint-Pandas matrix to do its thing, and then create an uncertainties test that uses Pandas-2.1.0 and Pint-0.23rc0. The approach I took simply overrode Pandas and Pint every time. Maybe I need to add a null string "" and then test for the null string? My yaml/shell scripting is fairly primitive.

In any case, if I can get the CI/CD working as it should, then these changes should be ready to go for a pint_pandas-0.6rc0.

MichaelTiemannOSC commented 1 year ago

We now have CI/CD testing without uncertainties across a matrix, and additionally testing uncertainties in one specific matrix case (Python 3.9, Pandas 2.1.0, Pint 0.23rc0). That should make us good to go for an 0.6rc0, no?

MichaelTiemannOSC commented 1 year ago

Nudge @andrewgsavage

andrewgsavage commented 11 months ago

Sorry, whenever I look at this I see that it uses a numpy object-type array, which I didn't expect to work... and get hung up thinking about how it works.

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasn't written with that in mind, and other than your issue's test there's no testing for a mix of different objects in a numpy array.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a separate ExtensionArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties that you've written in this PR can be moved to the UA, and the PintArray can deal with the units and delegate the rest to the UA magnitude (much as pint's Quantity does).

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

MichaelTiemannOSC commented 11 months ago

Sorry, whenever I look at this I see that it uses a numpy object-type array, which I didn't expect to work... and get hung up thinking about how it works.

Me too 😆

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasn't written with that in mind, and other than your issue's test there's no testing for a mix of different objects in a numpy array.

Having spent a fair bit of time inside the Pandas code, especially transitioning from 1.5.x to 2.x, they really have nicely generalized support for object types in NumPy arrays. I started off needing a few changes to Pandas to support this implementation initially, but over time every single change I originally needed was rendered unnecessary. I did manage to get some changes merged upstream (and have another one in the works, see https://github.com/pandas-dev/pandas/pull/55825). That latter one is not because numpy doesn't tolerate object types, but because Pint Quantities have their own opinions about being put into arrays.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a separate ExtensionArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties that you've written in this PR can be moved to the UA, and the PintArray can deal with the units and delegate the rest to the UA magnitude (much as pint's Quantity does).

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

One of the very last things I needed to do was to ensure that the block managers would not mess up allocation of arrays that initially contained only floats but would later need to contain an uncertainty. I realized this was the same problem as starting with an array of float64 and then later changing one entry to a complex128. I went big and added comprehensive complex testing to Pandas (https://github.com/pandas-dev/pandas/pull/54761/), which did indeed expose the same problem. So I proposed using the uncertainties changes to solve the complex128 problems. The Pandas folks fixed that by always allocating a fresh array if something didn't fit, and those changes were no longer necessary.
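The allocation issue can be reproduced with plain NumPy, using the float64/complex128 analogue of the float/ufloat situation: a fixed-dtype array cannot absorb an out-of-dtype value in place, so a fresh, wider array has to be allocated:

```python
import numpy as np

a = np.zeros(2, dtype=np.float64)
try:
    a[0] = 1 + 2j  # a complex value cannot be stored in a float64 slot
except TypeError as exc:
    print("in-place assignment failed:", exc)

# The resolution pandas adopted: allocate a fresh array with a wider
# dtype when a value doesn't fit, rather than mutating in place.
b = a.astype(np.complex128)
b[0] = 1 + 2j
print(b.dtype)  # complex128
```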

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

Happy to continue the discussion, and happy to prototype. Just wanted to share my experiences so far.

andrewgsavage commented 11 months ago

there is this https://github.com/varchasgopalaswamy/AutoUncertainties/blob/main/auto_uncertainties/pandas_compat.py

does it do what you need on the uncertainty propagation side?

MichaelTiemannOSC commented 11 months ago

I reviewed that approach in this comment: https://github.com/hgrecco/pint/pull/1797#issuecomment-1703912878

Which is why I re-upped my suggestion to merge the changes I developed instead. If I missed something, I would be happy to look again.

andrewgsavage commented 11 months ago

I made a start at this, enough to show what I'm thinking with code:

arr = np.array([ufloat(1, 0.01), ufloat(2, 0.1)])
arr
array([1.0+/-0.01, 2.0+/-0.1], dtype=object)

ua = UncertaintyArray(arr,np.array([False,False]))
ua
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

pa = PintArray(ua,dtype="m")
pa
<PintArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: pint[meter]

pa.data
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

Main benefits to this approach are:

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

Do you have an example of this? I'm not familiar with uncertainties.


I started off by inheriting from BaseMaskedArray, which uses a mask to know where the NaNs are. This may help with the NaNs issue, or it may not be needed. It's also not public... If avoiding the NaNs issue isn't needed, it may be better to resurrect https://github.com/pandas-dev/pandas/pull/45544

MichaelTiemannOSC commented 11 months ago

Who will be maintaining UncertaintyArray? Is this something we need to get the Pandas folks to adopt? Numpy? Pint depends on the uncertainties package (for Measurement). Getting Pandas to support uncertainties as a dtype would be quite a lift (see https://github.com/pandas-dev/pandas/pull/49181#pullrequestreview-1147447045). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does its magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps; see https://github.com/lebigot/uncertainties/issues/163#issuecomment-1284106046).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

andrewgsavage commented 11 months ago

Who will be maintaining UncertaintyArray?

It should either be in uncertainties or in another module, uncertainties-pandas.

Is this something we need to get the Pandas folks to adopt? Numpy?

Shouldn't, but may need changes like you found for pandas.

Pint depends on the uncertainties package (for Measurement).

Can't remember exactly what the plans were, but I remember reading something, possibly about moving that to another module... but no one's gotten around to it in forever.

Getting Pandas to support uncertainties as a dtype would be quite a lift (see pandas-dev/pandas#49181 (review)). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

Yeah, I don't see that happening for a long time either. There are a few pandas-related issues under uncertainties, so I think that's where to make a PR. It may be better in another package to prevent a dependency on pandas.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does its magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps; see lebigot/uncertainties#163 (comment)).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

NumPy functions are a long way off! But I suspect you can do something similar to what pint does with __array_function__ to get the desired results.

MichaelTiemannOSC commented 11 months ago

xref: https://github.com/pandas-dev/pandas/issues/14162, https://github.com/pandas-dev/pandas/issues/20993, and https://github.com/pandas-dev/pandas/issues/22290. The first two especially give strong encouragement to bring uncertainties into Pandas. I think that many of the 2.x changes make this much easier, though no clue yet on how to deal with the uncertainties package in CI/CD. I think a strong proof of concept in Pint-Pandas could break through into Pandas (which would be a win for all).

andrewgsavage commented 11 months ago

None of those issues implies that the UncertaintyArray should live in pandas! Just that one should be made. Indeed, in the first linked issue:

better yet would be to write and ExtensionArray to have first class support in pandas.

I'd be very surprised if they'd merge it into pandas, hence my uncertainties or uncertainties-pandas suggestion.