noahblakesmith opened this issue 3 weeks ago
Thanks for the report. Could you please update the title to include a description?
That said, based on this comment (https://github.com/pandas-dev/pandas/issues/51074#issuecomment-1409344688), this is expected behavior.
> This behavior appears inconsistent with that of other data types, such as `int`.
Can you give an example that demonstrates the inconsistency?
Sure thing @rhshadrach. Here is an example using `int`, which throws an error. I also tested `float`, `"Int64"`, and `"int64[pyarrow]"`, which produced similar errors.
import pandas as pd
col = pd.Series(["a", "b", "c"])
col = col.astype(dtype=int, errors="raise")
Traceback (most recent call last):
File "./test.py", line 4, in <module>
col = col.astype(dtype=int, errors="raise")
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/generic.py", line 6643, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 430, in astype
return self.apply(
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 363, in apply
applied = getattr(b, f)(**kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 758, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 237, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 182, in astype_array
values = _astype_nansafe(values, dtype, copy=copy)
File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 133, in _astype_nansafe
return arr.astype(dtype, copy=True)
ValueError: invalid literal for int() with base 10: 'a'
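For contrast, here is a minimal sketch of the categorical case. The categories `["a", "b"]` are an illustrative assumption, since the original reproducible example is not shown in this thread. The same Series cast to a categorical dtype that omits `c` completes without raising, even with `errors="raise"`.

```python
import pandas as pd

col = pd.Series(["a", "b", "c"])

# Casting to int raises, because "a" cannot be parsed as an integer.
try:
    col.astype(dtype=int, errors="raise")
    int_raised = False
except (ValueError, TypeError):
    int_raised = True

# Assumed categories for illustration: "c" is deliberately left out.
dtype = pd.CategoricalDtype(categories=["a", "b"])

# No exception here: the value "c" is silently coerced to NaN.
cat = col.astype(dtype=dtype, errors="raise")

print(int_raised)
print(cat.isna().tolist())
```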
Thanks @noahblakesmith. I would not call this inconsistent, since the categorical dtype has its own specialized semantics, as @asishm mentioned. This is well-established and purposeful behavior, so it is also not a bug.
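Those semantics can be seen directly on `pd.Categorical`. In this sketch (categories chosen for illustration), a value outside the declared categories is stored with the code `-1`, which pandas treats as missing rather than as an error.

```python
import pandas as pd

# "c" is not among the declared categories.
cat = pd.Categorical(["a", "b", "c"], categories=["a", "b"])

# Each value is stored as an integer code into the categories;
# -1 marks a value that has no matching category.
print(cat.codes.tolist())

# Code -1 is reported as missing.
print(pd.isna(cat).tolist())
```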
That said, there is agreement that this is undesired behavior. It is closely related to, and may even be fixed by, #40996.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
No error is raised when recasting as a `category`, despite the presence of a value, `c`, that is not one of the defined categories. Rather, `c` is coerced to `NaN`. This behavior appears inconsistent with that of other data types, such as `int`.
Expected Behavior
I believe an error should be raised.
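Until the behavior changes upstream, one way to get an error is to validate the values before casting. The sketch below uses a hypothetical helper, `astype_category_strict`, which is not part of pandas. It assumes that values already missing before the cast should be allowed to remain missing, and that a dtype without explicit categories should simply be passed through to `astype`.

```python
import pandas as pd


def astype_category_strict(s: pd.Series, dtype: pd.CategoricalDtype) -> pd.Series:
    """Hypothetical helper: cast to ``dtype`` but raise on unknown values."""
    if dtype.categories is None:
        # No fixed categories: pandas infers them, so nothing can be dropped.
        return s.astype(dtype)
    # Non-missing values that do not match any declared category.
    unknown = s.notna() & ~s.isin(dtype.categories)
    if unknown.any():
        bad = s[unknown].drop_duplicates().tolist()
        raise ValueError(f"values not in categories: {bad}")
    return s.astype(dtype)


col = pd.Series(["a", "b", "c"])
dtype = pd.CategoricalDtype(categories=["a", "b"])

try:
    astype_category_strict(col, dtype)
except ValueError as exc:
    print(exc)
```

Checking with `isin` before the cast keeps the helper independent of how `astype` handles unknown values internally.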
Installed Versions