arw2019 opened 3 years ago
Note that this is not directly related to to_numeric, as it is the pd.array() construction that fails:
In [11]: pd.array([pd.NA, pd.NA], dtype="float")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-73361050ec83> in <module>
----> 1 pd.array([pd.NA, pd.NA], dtype="float")
~/scipy/pandas/pandas/core/construction.py in array(data, dtype, copy)
344 return TimedeltaArray._from_sequence(data, dtype=dtype, copy=copy)
345
--> 346 result = PandasArray._from_sequence(data, dtype=dtype, copy=copy)
347 return result
348
~/scipy/pandas/pandas/core/arrays/numpy_.py in _from_sequence(cls, scalars, dtype, copy)
178 dtype = dtype._dtype
179
--> 180 result = np.asarray(scalars, dtype=dtype)
181 if copy and result is scalars:
182 result = result.copy()
~/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
TypeError: float() argument must be a string or a number, not 'NAType'
This is because with dtype="float", pd.array() tries to create a NumPy-based float array, not a nullable pandas FloatingArray. It therefore has to convert pd.NA to a float, and under the hood the error comes from:
In [12]: float(pd.NA)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-7541b87f4222> in <module>
----> 1 float(pd.NA)
TypeError: float() argument must be a string or a number, not 'NAType'
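Both tracebacks bottom out in the same conversion: np.asarray(..., dtype=float) has to turn every element into a float. A minimal sketch contrasting np.nan and pd.NA at both levels (the helpers accepts_float and numpy_float_array_ok are hypothetical names used only for illustration):

```python
import numpy as np
import pandas as pd


def accepts_float(value) -> bool:
    """Return True if the built-in float() can convert `value`."""
    try:
        float(value)
    except TypeError:
        return False
    return True


def numpy_float_array_ok(values) -> bool:
    """Return True if np.asarray(values, dtype=float) succeeds.

    This is the call PandasArray._from_sequence makes in the first
    traceback (pandas version shown there).
    """
    try:
        np.asarray(values, dtype=float)
    except TypeError:
        return False
    return True


# np.nan is already a Python float; pd.NA defines no float conversion.
print(accepts_float(np.nan), accepts_float(pd.NA))
# The same difference surfaces when NumPy builds a float64 array.
print(numpy_float_array_ok([np.nan, np.nan]), numpy_float_array_ok([pd.NA, pd.NA]))
```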
Updated the title to reflect the actual issue.
Are we OK with this behavior, or is it something we want to "fix"?
I don't think it's something we plan to fix in the short term. At some point in the future, we might want lower-case names like "float" to mean the nullable dtypes instead of the plain NumPy ones within a pandas context. In that case this issue would be resolved automatically (since converting an all-NA list to the nullable float dtype already works).
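The claim that the all-NA list already converts to the nullable float dtype can be checked directly. A minimal sketch, assuming pandas 1.2 or later (where the nullable "Float64" dtype exists). The first part uses only public pandas API; NULLABLE_ALIASES and nullable_array are hypothetical names that merely illustrate what "float" resolving to a nullable dtype could look like, not anything pandas implements:

```python
import pandas as pd

# Public API: the nullable "Float64" extension dtype treats pd.NA as its
# missing-value marker, so an all-NA list converts without error.
all_na = pd.array([pd.NA, pd.NA], dtype="Float64")

# Hypothetical mapping, NOT a pandas API: what lower-case names could
# resolve to if they meant the nullable dtypes in a pandas context.
NULLABLE_ALIASES = {"float": "Float64", "int": "Int64", "bool": "boolean"}


def nullable_array(data, dtype):
    """Hypothetical wrapper: swap a lower-case alias for its nullable
    counterpart, then defer to pd.array()."""
    return pd.array(data, dtype=NULLABLE_ALIASES.get(dtype, dtype))


# With "float" resolved this way, the call from the original report succeeds.
remapped = nullable_array([pd.NA, pd.NA], "float")
print(type(all_na).__name__, remapped.dtype)
```

Routing through the existing pd.array() keeps the sketch honest: it only changes which dtype string is passed, which is the sense in which the issue would be "resolved automatically".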
Not sure this is really an issue, but maybe (or not?) a slight inconsistency.
The following throws:
but with np.nan it runs fine: