jekwatt opened 3 years ago
dropna
# drop rows that contain any missing value
df.dropna(how="any")
# drop rows where every value is missing
df.dropna(how="all")
# same, but only consider col_1 and col_2
df.dropna(subset=["col_1", "col_2"], how="any")
df.dropna(subset=["col_1", "col_2"], how="all")
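A minimal sketch of how the `how` and `subset` options above behave, using a small hypothetical frame with the `col_1`/`col_2` names from the snippets:

```python
import numpy as np
import pandas as pd

# hypothetical data: one complete row, one all-NaN row, two partial rows
df = pd.DataFrame({
    "col_1": [1.0, np.nan, 3.0, np.nan],
    "col_2": [np.nan, np.nan, 30.0, 40.0],
})

any_dropped = df.dropna(how="any")   # keeps only fully populated rows
all_dropped = df.dropna(how="all")   # drops only the all-NaN row
subset_any = df.dropna(subset=["col_1", "col_2"], how="any")

print(len(any_dropped), len(all_dropped), len(subset_any))  # 1 3 1
```

Note that `dropna` returns a new DataFrame; the original `df` is left untouched.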
fillna
# assign back rather than calling fillna(inplace=True) on a column slice,
# which can trigger pandas' chained-assignment warning
df["col_1"] = df["col_1"].fillna("MISSING VALUE")
df["col_1"].value_counts(dropna=False)
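A short sketch of the fill-then-count pattern above, on hypothetical data:

```python
import pandas as pd

# hypothetical column with two missing entries
df = pd.DataFrame({"col_1": ["a", None, "b", None]})

# fill by assignment, then make the filled sentinel visible in the counts
df["col_1"] = df["col_1"].fillna("MISSING VALUE")
counts = df["col_1"].value_counts(dropna=False)

print(counts["MISSING VALUE"])  # 2
```

`dropna=False` matters when you have not filled the NaNs yet: without it, `value_counts` silently omits them.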
na_values
# provide list
df = pd.read_csv("file.csv", na_values=["not available", "n.a."])
# provide dictionary
df = pd.read_csv("file.csv", na_values={
    "col_1": ["not available", "n.a.", -1],
    "col_2": ["not available", "n.a."],
})
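A runnable sketch of the per-column `na_values` dictionary, using an in-memory stand-in for `file.csv` (the data here is invented for illustration):

```python
import io

import pandas as pd

# in-memory stand-in for file.csv
csv_text = "col_1,col_2\nnot available,10\n-1,n.a.\n5,20\n"

# -1 is treated as missing only in col_1; "n.a." in both columns
df = pd.read_csv(io.StringIO(csv_text), na_values={
    "col_1": ["not available", "n.a.", -1],
    "col_2": ["not available", "n.a."],
})

print(df["col_1"].isnull().sum(), df["col_2"].isnull().sum())  # 2 1
```

The dictionary form is useful when a sentinel like `-1` is missing data in one column but a legitimate value in another.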
Use this flag when reading a CSV file:
# do not interpret the default NA strings at load time
# eg "", "NA", "N/A", "NaN"
df = pd.read_csv(csv_path, keep_default_na=False)
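A sketch of the difference `keep_default_na=False` makes, again with an in-memory stand-in for the CSV file:

```python
import io

import pandas as pd

# strings pandas normally treats as missing
csv_text = "col_1\nNA\nN/A\nvalue\n"

# default behaviour: the sentinel strings parse as NaN
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: they survive as literal strings
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["col_1"].isnull().sum())  # 2
print(literal["col_1"].tolist())        # ['NA', 'N/A', 'value']
```

This is handy when a column legitimately contains the string "NA" (e.g. a country code) and you do not want pandas to coerce it.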
Answers from Cameron:
The pandas developers began searching for a solution roughly two years ago and introduced the pandas.NA value (instead of relying solely on numpy.nan).
pandas.NA lets a column keep dtypes other than float and object (which np.nan forces).
A trick you can use is to temporarily mask away the NaNs via `df.loc[df['col'].notnull(), 'col']` and apply an operation to that subset.
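The masking trick above can be sketched as follows, with a hypothetical `col` column; only the non-null entries are transformed, and the NaNs are left in place:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

# operate only on the non-null subset, writing the result back in place
mask = df["col"].notnull()
df.loc[mask, "col"] = df.loc[mask, "col"] * 10

print(df["col"].tolist())  # [10.0, nan, 30.0]
```

Using the same `df.loc[mask, "col"]` selector on both sides keeps the assignment aligned by index, so the untouched NaN rows keep their positions.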
https://datatofish.com/check-nan-pandas-dataframe/
4 ways to check for NaN in a Pandas DataFrame:
1. df["col_1"].isnull().values.any()  # check a single column
2. df.isnull().values.any()           # check the whole DataFrame
3. df["col_1"].isnull().sum()         # count NaN in a single column
4. df.isnull().sum().sum()            # count NaN in the whole DataFrame
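The checks described in the linked article can be sketched on a small hypothetical frame; `isnull` flags missing values, `.values.any()` answers "is there at least one?", and `.sum()` counts them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [1.0, np.nan], "col_2": [3.0, 4.0]})

print(df["col_1"].isnull().values.any())  # True  (NaN in this column?)
print(df.isnull().values.any())           # True  (NaN anywhere?)
print(df["col_1"].isnull().sum())         # 1     (count in this column)
print(df.isnull().sum().sum())            # 1     (count in the frame)
```

`isna` is an alias for `isnull`, so either spelling works.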