pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: add errors='coerce' to DataFrame.astype #48781

Open joooeey opened 2 years ago

joooeey commented 2 years ago

Feature Type

Problem Description

I wish I could quickly convert a DataFrame containing some invalid data to a numeric dtype, coercing the invalid entries. I thought pd.DataFrame.astype could do that, but it has no option to coerce invalid data to NaNs (or NaTs).

In my particular case I have a DataFrame of sensor readings with mostly NaNs (indicating no value received), many integers (those I care about), and some strings (indicating specific errors). I wanted a quick histogram to get an overview of that data, but pd.DataFrame.hist requires numeric data, which takes a few lines of code to produce. This is exploratory code I write in my console, so it would be sweet if this could be done with a single method.

Toy Example

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    ["error", 0.2, 1.2, 1.7],
    [0.3, "", 1.3, 1.8],
    [0.4, 1.4, "code255", 1.9],
])

df.astype(float, errors="coerce")
# ValueError: Expected value of kwarg 'errors' to be one of ['raise', 'ignore'].
# Supplied value is 'coerce'

import matplotlib.pyplot as plt
plt.hist(df.values.flatten(), bins=[0, 1, 2])

Expected result:

In [30]: df
Out[30]: 
     0    1    2    3
0  NaN  0.1  1.1  1.6
1  NaN  0.2  1.2  1.7
2  0.3  NaN  1.3  1.8
3  0.4  1.4  NaN  1.9
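
Given a frame like the expected result, the histogram input can be prepared by flattening the values and dropping the NaNs. This is only a minimal sketch: it uses np.histogram rather than plt.hist so it runs without a plotting backend, and dropping NaNs before binning is an assumption about the desired behavior, not something the proposal specifies.

```python
import numpy as np
import pandas as pd

# The "expected result" frame, written out by hand for illustration.
expected = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    [np.nan, 0.2, 1.2, 1.7],
    [0.3, np.nan, 1.3, 1.8],
    [0.4, 1.4, np.nan, 1.9],
])

# Flatten to a 1-D float array and drop the NaNs before binning.
values = expected.to_numpy().ravel()
values = values[~np.isnan(values)]

# Same bin edges as in the toy example: [0, 1) and [1, 2].
counts, edges = np.histogram(values, bins=[0, 1, 2])
print(counts)
```

The same `values` array could be passed to plt.hist with the same `bins` argument.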


Feature Description

Two options:

OR/AND

To me it looks like the potential for confusing the user is a lot lower with the second option because it has fewer edge cases.

Alternative Solutions

for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors="coerce")
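
A more compact form of the same workaround, as a sketch: DataFrame.apply forwards extra keyword arguments to the function it calls, so pd.to_numeric can be applied column by column in one expression. The frame is the toy example from above, and the name `coerced` is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    ["error", 0.2, 1.2, 1.7],
    [0.3, "", 1.3, 1.8],
    [0.4, 1.4, "code255", 1.9],
])

# Apply pd.to_numeric to every column; errors="coerce" is passed through
# to pd.to_numeric and turns unparseable entries into NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")
print(coerced)
```

This returns a new DataFrame instead of modifying `df` in place, and it only covers numeric targets; it does not replace a general `astype(..., errors="coerce")`.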

Additional Context

No response

subbusainath commented 2 years ago

take

MarcoGorelli commented 2 years ago

Hi @joooeey

To expedite resolution, could you please include a reproducible example?

Like

joooeey commented 2 years ago

@MarcoGorelli I added a toy example to the description.

jbrockmendel commented 2 years ago

Cc @jorisvandenbossche, I think you were looking into design decisions related to this.