andyr0id closed this issue 2 months ago
I think isnumeric will need to return True for PintTypes. I had a similar issue when plotting: PintTypes were filtered out by df.select_dtypes(is_numeric). https://github.com/pandas-dev/pandas/issues/35340
You should open an issue in pandas for this.
pandas.core.computation.ops.isnumeric is essentially based on numpy:

    def isnumeric(dtype) -> bool:
        return issubclass(np.dtype(dtype).type, np.number)

At this stage the dtype variable is of type <class 'pint_pandas.pint_array.PintType'>. Adding a dtype property with an actual numeric dtype to this class lets it pass the isnumeric test. The final result is somewhat disappointing, however, as the types are lost in the process:
    c:\Users\xxxxxxx\AppData\Local\miniconda3\Lib\site-packages\pandas\core\arrays\numpy_.py:127: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
      result = np.asarray(scalars, dtype=dtype)  # type: ignore[arg-type]
    0    0.25
    1    0.40
    2    0.50
    dtype: float64
I faced a similar issue when getting plotting to work: a similar isnumeric check was returning False. isnumeric should also check the _is_numeric attribute of the dtype if the dtype is an ExtensionDtype.
The np.array... call would also need changing so that it can return an ExtensionArray.
These are things that would need changing in pandas. You'll need to open an issue and a PR there.
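The suggestion above can be sketched as a standalone function. This is only an illustration of the idea, not the change that was made in pandas: the name isnumeric_extension_aware is hypothetical, and _is_numeric is a private pandas attribute whose behavior may vary between versions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def isnumeric_extension_aware(dtype) -> bool:
    """Hypothetical variant of isnumeric that does not choke on extension dtypes."""
    if isinstance(dtype, ExtensionDtype):
        # Extension dtypes cannot be passed to np.dtype(), so ask the dtype itself.
        return bool(dtype._is_numeric)
    # Plain numpy dtypes keep the original numpy-based check.
    return issubclass(np.dtype(dtype).type, np.number)


print(isnumeric_extension_aware(np.dtype("float64")))
print(isnumeric_extension_aware(pd.Int64Dtype()))
print(isnumeric_extension_aware(pd.StringDtype()))
```

With a check like this, a dtype such as PintType would only need to report _is_numeric as True to be treated as numeric.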
I made a test to confirm that the issue is definitely on the pandas side. I used IntegerArrays, which are another type of ExtensionArray, and eval also fails at the isnumeric stage.

    import pandas as pd

    df = pd.DataFrame({'a': pd.array([1, 2, 3]), 'b': pd.array([4, 5, 6])})
    df.eval('a / b')

leads to

    TypeError: Cannot interpret 'Int64Dtype()' as a data type

I'll open an issue there.
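The same failure can be reproduced one level down, without going through eval, by calling a copy of the numpy-based check directly on the dtypes. This is a minimal sketch: isnumeric is re-declared here with the body quoted earlier in the thread, and describe is a hypothetical helper added only so the error is reported instead of raised.

```python
import numpy as np
import pandas as pd


def isnumeric(dtype) -> bool:
    # Same body as the pandas helper quoted earlier in the thread.
    return issubclass(np.dtype(dtype).type, np.number)


def describe(dtype) -> str:
    """Report what isnumeric does with a dtype instead of letting it raise."""
    try:
        return f"{dtype!r}: isnumeric -> {isnumeric(dtype)}"
    except TypeError as exc:
        return f"{dtype!r}: TypeError ({exc})"


for dt in (np.dtype("float64"), np.dtype("object"), pd.Int64Dtype()):
    print(describe(dt))
```

The numpy dtypes go through the check normally, while the extension dtype is rejected by np.dtype() before the numeric test is even reached.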
- This isnumeric part of the issue is being solved.
- There is a second issue afterwards, as the pandas.core.computation.ops.Div class will _cast_inplace input arrays to floats, losing the ExtensionArray specifics in the process. (Incidentally, this also causes an issue when performing eval on complex values.) In my opinion this cast is no longer needed (it was introduced 6 years ago), and we should ask the pandas team to remove it.
- There is a third issue then: pandas.core.computation.ops.Op.has_invalid_return_type is called from pandas.core.computation.expr.BaseExprVisitor._maybe_evaluate_binop, raising a TypeError. Bypassing this test gives the proper result. This is not an issue when working with pandas built-in extension arrays, so there might be something specific to do in pint-pandas. pandas.core.common.result_type_many is also involved in the process and may be implicated.

This is more or less a note to myself, but feel free to drop a comment if any of the above rings a bell.
For the third item, we can resolve the issue by overriding the _get_common_dtype method of ExtensionDtype in PintType. Examples of such a method can be found in pandas.core.dtypes.dtypes.BaseMaskedDtype._get_common_dtype or pandas.core.dtypes.base.ExtensionDtype._get_common_dtype. The second one is pretty trivial but does not fit our needs.
I think we can go as simple as the following example:

    class PintType(ExtensionDtype):
        (...)
        def _get_common_dtype(self, dtypes):
            return self
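To see what the override changes, here is a small sketch that compares the inherited behavior with the proposed one on two toy dtypes. PlainDtype and ToyUnitDtype are hypothetical stand-ins, not pint-pandas classes, and the hook is called directly rather than through pandas' internal common-dtype machinery. It assumes the base implementation only returns a dtype when all inputs are identical, which is how pandas.core.dtypes.base.ExtensionDtype._get_common_dtype is described above ("pretty trivial").

```python
import numpy as np
from pandas.api.extensions import ExtensionDtype


class PlainDtype(ExtensionDtype):
    """Toy dtype that keeps the inherited _get_common_dtype."""
    name = "plain_toy"
    type = float


class ToyUnitDtype(ExtensionDtype):
    """Toy dtype that applies the override proposed in the thread."""
    name = "unit_toy"
    type = float

    def _get_common_dtype(self, dtypes):
        # Always claim the result: mixed inputs resolve to this dtype.
        return self


plain, unit = PlainDtype(), ToyUnitDtype()
mixed_plain = [plain, np.dtype("float64")]
mixed_unit = [unit, np.dtype("float64")]

print(plain._get_common_dtype(mixed_plain))        # no common dtype found
print(unit._get_common_dtype(mixed_unit) is unit)  # the override wins
```

Returning self unconditionally is a deliberately blunt choice: it also claims combinations that may not make sense for a unit-aware dtype, so a real implementation might want to inspect dtypes before answering.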
- ~This isnumeric part of the issue is being solved.~
- ~There is a second issue afterwards, as the pandas.core.computation.ops.Div class will _cast_inplace input arrays to floats, losing the ExtensionArray specifics in the process. (Incidentally, this also causes an issue when performing eval on complex values.) In my opinion this cast is no longer needed (it was introduced 6 years ago), and we should ask the pandas team to remove it.~
- There is a third issue then: pandas.core.computation.ops.Op.has_invalid_return_type is called from pandas.core.computation.expr.BaseExprVisitor._maybe_evaluate_binop, raising a TypeError. Bypassing this test gives the proper result. This is not an issue when working with pandas built-in extension arrays, so there might be something specific to do in pint-pandas. pandas.core.common.result_type_many is also involved in the process and may be implicated.
The first two points have been handled in pandas core. We can now concentrate on the third item.
Hello, I'm running into an issue with using pint_pandas with pandas' eval function:
The problem arises when pandas tries to check if the columns are numeric by calling this function. This is in turn passed to np.dtype(dtype), which results in the below type error.
It's important for my application to use pandas eval, and the above is just a toy example.
Full stack trace: