pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
column-wise fillna with Series/dict NotImplemented #4514

Open hayd opened 11 years ago

hayd commented 11 years ago

As per the discussion on this SO question, filling row by row (axis=1) with a Series/dict raises NotImplementedError.

Solution/workaround is to do transpose? This is used elsewhere in the DataFrame.fillna method. Just raise if inplace?

cc @cpcloud

In [9]: df = pd.DataFrame([[np.nan, np.nan], [np.nan, 4], [5, 6]], columns=list('AB'))

In [10]: df
Out[10]:
    A   B
0 NaN NaN
1 NaN   4
2   5   6

In [11]: df.mean(0)
Out[11]:
A    5
B    5
dtype: float64

In [12]: df.fillna(df.mean())
Out[12]:
   A  B
0  5  5
1  5  4
2  5  6

In [13]: df.mean(1)
Out[13]:
0    NaN
1    4.0
2    5.5
dtype: float64

In [14]: df.fillna(df.mean(1), axis=1)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-14-aecc493431e2> in <module>()
----> 1 df.fillna(df.mean(1), axis=1)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
   3452             if isinstance(value, (dict, Series)):
   3453                 if axis == 1:
-> 3454                     raise NotImplementedError('Currently only can fill '
   3455                                               'with dict/Series column '
   3456                                               'by column')

NotImplementedError: Currently only can fill with dict/Series column by column

jreback commented 11 years ago

This is pretty straightforward after merging in the change that makes Series subclass NDFrame. It has to deal with possible dtype changes when filling column-wise (which I think is not implemented).

jorisvandenbossche commented 9 years ago

Came up again at SO

aileronajay commented 7 years ago

@jreback @jorisvandenbossche is this implemented yet? I tried result.fillna(result.mean(axis=1), axis=1) and got the same exception.

jreback commented 7 years ago

issue is open

aspiringguru commented 6 years ago

my simple/crude workaround below. curious why this issue is still open in 2018.

colnames = list(df)

# Backfill each column in turn; assign back rather than using a chained inplace call.
for colname in colnames:
    df[colname] = df[colname].bfill()

jorisvandenbossche commented 6 years ago

> curious why this issue is still open in 2018.

Because nobody made the effort to implement it. But you are welcome to do so.

malbahrani commented 6 years ago

Faced the same issue. It is worth mentioning @hayd's solution posted on Stack Overflow here.

Thanks for the workaround Andy!

aadarshsingh191198 commented 1 year ago

This isn't implemented even in the latest version of pandas (v2.0.1). Can we reopen this issue, please?

aftersought commented 1 year ago

To confirm, in my usage ffill(axis=1, inplace=True) works as long as all dtypes of the columns are the same; I get NotImplemented error if they have mixed dtypes.
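For reference, a minimal sketch of the same-dtype case described above, where every column is float64. The frame and its column names are made up for illustration, and the sketch does not attempt to reproduce the mixed-dtype failure, which may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative frame: all columns share the float64 dtype.
df = pd.DataFrame(
    {"A": [1.0, np.nan], "B": [np.nan, 2.0], "C": [3.0, np.nan]}
)

# Forward-fill along each row (left to right across the columns).
filled = df.ffill(axis=1)
print(filled)
```

A leading NaN in a row stays NaN, since there is no earlier value in that row to carry forward.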

ShivnarenSrinivasan commented 9 months ago

I see this is still an open issue. Not sure if I will be able to implement this, but assigning to myself to build on top of previous PR attempts.

take

kdebrab commented 2 months ago

> Solution/workaround is to do transpose?

We actually used transpose (i.e., df.T.fillna(df.mean(1)).T), but it proved to be really horrendous with regard to performance. This performance issue happens especially when the index of df is (much) longer than the number of columns. We solved it by using apply instead:

df_mean = df.mean(1)
df.apply(lambda col: col.fillna(df_mean))
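
A self-contained sketch of the two workarounds, run on the example frame from the original report. The variable names (row_mean, filled_apply, filled_transpose) are chosen here for illustration. It only checks that both approaches give the same values; it makes no attempt to measure the performance difference.

```python
import numpy as np
import pandas as pd

# Example frame from the original report.
df = pd.DataFrame([[np.nan, np.nan], [np.nan, 4], [5, 6]], columns=list("AB"))

# Row means, indexed by the row labels of df.
row_mean = df.mean(axis=1)

# apply workaround: each column is a Series that shares df's index,
# so fillna aligns row_mean to it label by label.
filled_apply = df.apply(lambda col: col.fillna(row_mean))

# transpose workaround: after .T the rows become columns, so the
# supported column-by-column fill can be used, then transposed back.
filled_transpose = df.T.fillna(row_mean).T

print(filled_apply)
```

Row 0 stays NaN in both results because its mean is itself NaN; only the missing value in row 1 is replaced (by 4.0).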