hgrecco / pint-pandas

Pandas support for pint

Add support for UFloat in PintArray (#139) #140

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

Signed-off-by: MichaelTiemann 72577720+MichaelTiemannOSC@users.noreply.github.com

MichaelTiemannOSC commented 2 years ago

As noted, this change request may precipitate changes needed in how Pint deals with uncertainties (whose representation is not supported by vanilla Python parsers).

andrewgsavage commented 2 years ago

This lets you store ufloat objects, but I'm not sure how well pandas will work for operations like reshaping, indexing, etc. I suspect some will work but some won't.

I think you can add ufloats to fixtures in test_pandas_extensiontests and it'll run with standard and ufloat quantities. This should show what works and doesn't.

I think you should make it so that a PintArray only stores floats or only ufloats. There are a few points where arrays are expected to contain the same type of object, and I think pandas expects each extension type to only store one type of object.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

MichaelTiemannOSC commented 2 years ago

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

@pytest.fixture
def data():
    return PintArray.from_1darray_quantity(np.arange(start=1.0, stop=101.0) * ureg.nm)

@pytest.fixture
def data_missing():
    return PintArray.from_1darray_quantity([np.nan, 1] * ureg.meter)

@pytest.fixture
def data_for_twos():
    x = [2.0] * 100
    return PintArray.from_1darray_quantity(x * ureg.meter)

to ufloat-based values? I'm going to give that a try, but if there's a better way to do it (or you meant something different) please guide me. Thanks!

MichaelTiemannOSC commented 2 years ago

Progress... I've now got 203 failures, the vast majority of which are due to the fact that ufloats computed to be the same value, but not derived from the same variable, do not compare as equal, so all the assert_series_equal, assert_extension_array_equal, and assert_frame_equal calls fail through no fault of their own. It will be quite some work to plug in an appropriate `assert_*_nominally_equal` everywhere.

I do not know how to make progress on this:

__________________________________________________________________________________________________________________ ERROR at setup of TestMethods.test_insert_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py, line 558
      def test_insert_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py:558
_________________________________________________________________________________________________________________ ERROR at setup of TestSetitem.test_setitem_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py, line 437
      def test_setitem_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, full_indexer, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py:437

Also: 191 passed, 1 skipped, 69 xfailed, 5 xpassed

andrewgsavage commented 2 years ago

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

Yes, like that. There are some fixtures that return a value from a list, like all_arithmetic_operators. I'd try changing these so they test both ufloat and float.

andrewgsavage commented 2 years ago

Use this fork of pandas for the assert...equal tests to work: https://github.com/hgrecco/pint-pandas/blob/236cf0f856333704f035297df039dfb5302cbc8a/.github/workflows/ci.yml#L11

edit: I'm not 100% sure this will work; it makes the tests work for Quantity, but you might have a different issue there

andrewgsavage commented 2 years ago

fixture 'invalid_scalar' not found

that's coming from a newer pandas with extra tests compared to pandas 1.4.x. You can rebase from my https://github.com/hgrecco/pint-pandas/pull/133 if you're using a higher version... but for now I wouldn't worry about those errors

MichaelTiemannOSC commented 2 years ago

More progress... down to essentially one kind of failure: the un-hashability of UFloat. I don't have the Python sophistication to know what the next step would be, but I'll push the changes I have thus far.

=== short test summary info ================================================================================
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_agg_extension - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="B") are different
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_no_sort - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_transform - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[scalar] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[list] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[object] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_in_numeric_groupby - AssertionError: Did not see expected warning of class 'FutureWarning'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestInterface::test_contains - AssertionError
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts_with_normalize - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod_series_array - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array_duplicates - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_scalar_key_sequence_raise - Failed: DID NOT RAISE <class 'ValueError'>
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_frame_2d_values - AssertionError: Caused unexpected warning(s): [('UnitStrippedWarning', UnitStrippedWarning('The unit of the quantity is stripped when downcasting to ndarray.'), ...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_delitem_series - TypeError: unhashable type: 'AffineScalarFunc'
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_insert_invalid
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_invalid
==== 35 failed, 331 passed, 48 xfailed, 26 xpassed, 268 warnings, 2 errors in 15.68s =======================
MichaelTiemannOSC commented 2 years ago

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

MichaelTiemannOSC commented 2 years ago

I think you should make it so that a PintArray only stores floats or only ufloats. There are a few points where arrays are expected to contain the same type of object, and I think pandas expects each extension type to only store one type of object.

I'll give that a try. While I'm not sure it's totally necessary, I think it's a good exercise.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

It will certainly keep the discussion with Pandas on point to present uniform ExtensionTypes and to keep both Pint and Pandas on the straight-and-narrow there.

andrewgsavage commented 2 years ago

Does it make sense to rebase PintArrays on uarrays

you can try, it's easy to check. I don't know what effects that'd have. NumPy is used across pandas, so it would still be used.

MichaelTiemannOSC commented 1 year ago

I thought it would be helpful to me to merge in changes that have been merged upstream, not realizing how that would affect the size of the PR. If I should revert to the previous version so that the PR has only my changes, I'll do that. I'm still learning my way around git, and everything new I do is truly a "learning experience".

MichaelTiemannOSC commented 1 year ago

I'm running into problems with the newly expanded test suite. The major problem is that UFloat resolves to a type that is not hashable. I've been able to work through that in a few cases (converting dozens of failures into successes), but I'm stumped as to how to deal with groupby. I've chased the problem through the factorize maze, but just when I thought I had it figured out, the get_iterator method of BaseGrouper (pandas/core/groupby/ops.py) expects what will be a name to be hashable.

@andrewgsavage any words of advice?

MichaelTiemannOSC commented 1 year ago

Because these changes also require changes to pint, pandas, and uncertainties, some coordination will be needed before the CI/CD system works for these changes. Please let me know if I can do anything to help. It would be great to land these changes in time for Pandas' anticipated 2.1 release (Aug 03 last time I checked).

MichaelTiemannOSC commented 1 year ago

Converting to Draft status as this now depends on Pandas 2.1 functionality (and on proper DataFrame reduction support as mentioned in #174). When Pandas 2.1 lands, I'll re-test and re-open when ready.

MichaelTiemannOSC commented 1 year ago

Good news--the uncertainties parser has been added to Pint. If that goes out with a 0.23 release, we'll have a firm target and can update the test matrix to use it. I would then probably also make the change to import pint.HAS_UNCERTAINTIES rather than re-deriving Pint-Pandas' own version of that variable. Any other changes you'd really like to see before merging this?

MichaelTiemannOSC commented 1 year ago

Thanks @hgrecco for creating Pint-0.23rc0. I've attempted to write some rules to optionally test uncertainties, but I don't know the right way to leave the normal Pint-Pandas matrix to do its thing, and then create an uncertainties test that uses Pandas-2.1.0 and Pint-0.23rc0. The approach I took simply overrode Pandas and Pint every time. Maybe I need to add a null string "" and then test for the null string? My yaml/shell scripting is fairly primitive.

In any case, if I can get the CI/CD working as it should, then these changes should be ready to go for a pint_pandas-0.6rc0.

MichaelTiemannOSC commented 1 year ago

We now have CI/CD testing without uncertainties across a matrix, and additionally testing uncertainties in one specific matrix case (Python 3.9, Pandas 2.1.0, Pint 0.23rc0). That should make us good to go for an 0.6rc0, no?

MichaelTiemannOSC commented 1 year ago

Nudge @andrewgsavage

andrewgsavage commented 11 months ago

Sorry, whenever I look at this I see that it uses a numpy object-type array, which I didn't expect to work... and get hung up thinking about how it works.

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasn't written with that in mind, and other than your issue's test there's no testing for a mix of different objects in a numpy array.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a separate ExtensionArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties that you've written in this PR can be moved to the UA, and the PintArray can deal with the units and delegate the rest to the UA magnitude (much as pint's Quantity does).

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

MichaelTiemannOSC commented 11 months ago

Sorry, whenever I look at this I see that it uses a numpy object-type array, which I didn't expect to work... and get hung up thinking about how it works.

Me too 😆

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasn't written with that in mind, and other than your issue's test there's no testing for a mix of different objects in a numpy array.

Having spent a fair bit of time inside the Pandas code, especially transitioning from 1.5.x to 2.x, they really have nicely generalized support for object types in NumPy arrays. I started off needing a few changes to Pandas to support this implementation initially, but over time every single change I originally needed was rendered unnecessary. I did manage to get some changes merged upstream (and have another one in the works, see https://github.com/pandas-dev/pandas/pull/55825). That latter one is not because numpy doesn't tolerate object types, but because Pint Quantities have their own opinions about being put into arrays.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat)? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a separate ExtensionArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties that you've written in this PR can be moved to the UA, and the PintArray can deal with the units and delegate the rest to the UA magnitude (much as pint's Quantity does).

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

One of the very last things I needed to do was to ensure that the block managers would not mess up allocation of arrays that initially contained only floats but would later need to contain an uncertainty. I realized this was the same problem as starting with an array of float64 and then later changing one entry to a complex128. I went big and added comprehensive complex testing to Pandas (https://github.com/pandas-dev/pandas/pull/54761/), which did indeed expose the same problem. So I proposed using the uncertainties changes to solve the complex128 problems. The Pandas folks fixed that by always allocating a fresh array if something didn't fit, and those changes were no longer necessary.
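The allocation issue can be reproduced with plain NumPy, using the float64/complex128 analogue of the float/ufloat situation: a fixed-dtype array cannot absorb an out-of-dtype value in place, so a fresh, wider array has to be allocated:

```python
import numpy as np

a = np.zeros(2, dtype=np.float64)
try:
    a[0] = 1 + 2j  # a complex value cannot be stored in a float64 slot
except TypeError as exc:
    print("in-place assignment failed:", exc)

# The resolution pandas adopted: allocate a fresh array with a wider
# dtype when a value doesn't fit, rather than mutating in place.
b = a.astype(np.complex128)
b[0] = 1 + 2j
print(b.dtype)  # complex128
```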

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

Happy to continue the discussion, and happy to prototype. Just wanted to share my experiences so far.

andrewgsavage commented 11 months ago

there is this https://github.com/varchasgopalaswamy/AutoUncertainties/blob/main/auto_uncertainties/pandas_compat.py

does it do what you need on the uncertainty propagation side?

MichaelTiemannOSC commented 11 months ago

I reviewed that approach in this comment: https://github.com/hgrecco/pint/pull/1797#issuecomment-1703912878

Which is why I re-upped my suggestion to merge the changes I developed instead. If I missed something, I would be happy to look again.

andrewgsavage commented 11 months ago

I made a start at this, enough to show what I'm thinking with code:

arr = np.array([ufloat(1, 0.01), ufloat(2, 0.1)])
arr
array([1.0+/-0.01, 2.0+/-0.1], dtype=object)

ua = UncertaintyArray(arr,np.array([False,False]))
ua
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

pa = PintArray(ua,dtype="m")
pa
<PintArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: pint[meter]

pa.data
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

Main benefits to this approach are:

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

Do you have an example of this? I'm not familiar with uncertainties.


I started off by inheriting from BaseMaskedArray, which uses a mask to know where the NaNs are. This may help with the NaNs issue, or it may not be needed. It's also not public... If avoiding the NaNs issue isn't needed, it may be better to resurrect https://github.com/pandas-dev/pandas/pull/45544

MichaelTiemannOSC commented 11 months ago

Who will be maintaining UncertaintyArray? Is this something we need to get the Pandas folks to adopt? Numpy? Pint depends on the uncertainties package (for Measurement). Getting Pandas to support uncertainties as a dtype would be quite a lift (see https://github.com/pandas-dev/pandas/pull/49181#pullrequestreview-1147447045). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does its magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps; see https://github.com/lebigot/uncertainties/issues/163#issuecomment-1284106046).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

andrewgsavage commented 11 months ago

Who will be maintaining UncertaintyArray?

It should either be in uncertainties or in another module, uncertainties-pandas.

Is this something we need to get the Pandas folks to adopt? Numpy?

Shouldn't, but may need changes like you found for pandas.

Pint depends on the uncertainties package (for Measurement).

Can't remember exactly what the plans were, but I remember reading something, possibly about moving that to another module... but no one's gotten around to it in forever.

Getting Pandas to support uncertainties as a dtype would be quite a lift (see pandas-dev/pandas#49181 (review)). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

Yeah, I don't see that happening for a long time either. There are a few pandas-related issues under uncertainties, so I think that's where to make a PR. It may be better in another package to prevent a dependency on pandas.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does its magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps; see lebigot/uncertainties#163 (comment)).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

NumPy functions are a long way off! But I suspect you can do something similar to what pint does with __array_function__ to get the desired results.

MichaelTiemannOSC commented 11 months ago

xref: https://github.com/pandas-dev/pandas/issues/14162, https://github.com/pandas-dev/pandas/issues/20993, and https://github.com/pandas-dev/pandas/issues/22290. The first two especially give strong encouragement to bring uncertainties into Pandas. I think that many of the 2.x changes make this much easier, though no clue yet on how to deal with the uncertainties package in CI/CD. I think a strong proof of concept in Pint-Pandas could break through into Pandas (which would be a win for all).

andrewgsavage commented 11 months ago

None of those issues implies that the UncertaintyArray should live in pandas! Just that one should be made. Indeed, in the first linked issue:

better yet would be to write and ExtensionArray to have first class support in pandas.

I'd be very surprised if they'd merge it into pandas, hence my uncertainties or uncertainties-pandas suggestion.