Open joooeey opened 2 years ago
take
Hi @joooeey
To expedite resolution, could you please include a reproducible example?
Like
df = pd.DataFrame(...)
...
...
@MarcoGorelli I added a toy example to the description.
Cc @jorisvandenbossche, I think you were looking into design decisions related to this.
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
I wish I could quickly convert a DataFrame with some invalid data to a numeric type, coercing the invalid entries. I thought pd.DataFrame.astype could do that, but it doesn't have an option to coerce invalid data to NaNs (or NaTs).
In my particular case I have a DataFrame of sensor readings with mostly NaNs (indicating no value received), many integers (those I care about), and some strings (indicating specific errors). I quickly tried to get a histogram for an overview of that data, but pd.DataFrame.hist requires numeric data, which takes a few lines of code to get. This is exploratory code I write in my console, so it would be sweet if this could be done with a single method.
Toy Example
Expected result:
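A minimal sketch of the kind of data described above, assuming made-up column names and values (sensor_a, sensor_b, and the error strings are hypothetical, not the original example). It shows that astype raises on the error strings, and builds the coerced result with the existing per-column pd.to_numeric workaround:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: NaN means no value received,
# strings stand for specific error codes.
df = pd.DataFrame(
    {
        "sensor_a": [1, np.nan, "ERR_TIMEOUT", 4],
        "sensor_b": [np.nan, 7, 8, "ERR_RANGE"],
    }
)

# astype has no "coerce" mode, so the error strings make the cast fail.
try:
    df.astype(float)
except ValueError as exc:
    print(f"astype failed: {exc}")

# Existing workaround: apply pd.to_numeric column by column so that
# invalid entries become NaN and every column ends up numeric.
expected = df.apply(pd.to_numeric, errors="coerce")
print(expected)
print(expected.dtypes)
```

With all columns numeric, expected.hist() can then be called directly.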
Feature Description
Two options:
Allow DataFrames as the arg of pd.to_numeric. In case of mixed type columns (e.g. integers and floats), we'd have to decide and document whether that would operate by column or cast the whole data structure to one dtype. I'd expect by column for DataFrames. Another issue that comes up is how to deal with multidimensional lists and tuples.
OR/AND
Add "coerce" to the errors kwarg in pd.DataFrame.astype (the current options are "raise" and "ignore"). We'd have to decide how to deal with incompatibilities between errors="coerce" and dtype, e.g. what to do if someone tries to coerce to string. I would expect an error.
To me it looks like the potential for confusing the user is a lot lower with the second option because it has fewer edge cases.
Alternative Solutions
Additional Context
No response