Open sanderland opened 4 years ago
Tracked it down to `_reindex_axes` in `frame.py`: the same `method` argument is used for reindexing both the index and the columns. That means the string ordering of the column labels decides the result, so 'a' ffills into 'b' (reversing the column names in the example makes `bfill` fail and `ffill` succeed). Either way, the `nearest` method is very unhappy about all of this.
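A minimal sketch of the label-ordering effect described above, assuming that column reindexing resolves labels the same way `Index.get_indexer` does when given a fill method. The labels `"a"` and `"b"` are hypothetical stand-ins for the example's column names; a position of `-1` means "no match", which reindexing turns into NaN.

```python
import pandas as pd

# Hypothetical single-column labels: the source frame has "a", the target asks for "b".
source_cols = pd.Index(["a"])
target_cols = pd.Index(["b"])

# With a fill method, labels are matched by sort order, not by equality.
ffill_pos = source_cols.get_indexer(target_cols, method="ffill")  # "a" <= "b": position 0
bfill_pos = source_cols.get_indexer(target_cols, method="bfill")  # nothing >= "b": -1

# Reversing the labels flips which method finds a "match".
rev_ffill = target_cols.get_indexer(source_cols, method="ffill")  # nothing <= "a": -1
rev_bfill = target_cols.get_indexer(source_cols, method="bfill")  # "b" >= "a": position 0

print(ffill_pos, bfill_pos, rev_ffill, rev_bfill)
```

So with `ffill`, column "b" silently picks up the data of column "a" purely because "a" sorts before "b".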
```python
def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy):
    frame = self

    # The columns are reindexed first, with the same fill `method` as the index.
    columns = axes["columns"]
    if columns is not None:
        frame = frame._reindex_columns(
            columns, method, copy, level, fill_value, limit, tolerance
        )

    # Then the index is reindexed, again with that same `method`.
    index = axes["index"]
    if index is not None:
        frame = frame._reindex_index(
            index, method, copy, level, fill_value, limit, tolerance
        )

    return frame
```
PR is welcome. The issue here seems to be twofold:

1. The `method` argument really only makes sense when applied to the index, not the columns, and will only be used on columns whose names match.
2. `bfill` and `nearest` are wrong when the column names don't match: we return a result with backfilled values (or raise, in the case of `nearest`), but we don't actually have data to do the backfilling with, so we should just return columns filled with NaN.
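The proposed behaviour can be approximated from user code by applying the fill method along the index only and then aligning the columns by exact label, so unmatched columns come back as all-NaN. This is a minimal sketch, not a pandas API: the helper name `reindex_fill_rows_only` and the sample frames are hypothetical.

```python
import pandas as pd


def reindex_fill_rows_only(df: pd.DataFrame, other: pd.DataFrame, method: str) -> pd.DataFrame:
    """Hypothetical helper: fill along the index only, align columns by exact label."""
    # Step 1: only the index is passed, so `method` never touches the columns.
    rows = df.reindex(index=other.index, method=method)
    # Step 2: align columns without a method; labels missing from `df` become all-NaN.
    return rows.reindex(columns=other.columns)


# Hypothetical data: one column each, with names that don't match.
df_1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
df_2 = pd.DataFrame({"b": [0.0, 0.0, 0.0, 0.0, 0.0]}, index=range(5))

for method in ("ffill", "bfill", "nearest"):
    out = reindex_fill_rows_only(df_1, df_2, method)
    # Column "b" has no counterpart in df_1, so it is NaN whatever the method.
    print(method, out["b"].tolist())
```

Keeping the two steps separate means the result for a mismatched column no longer depends on how the labels happen to sort.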
Hi @Dr-Irv,
I am new to open source and would love to work on this issue.
take
It's now all yours!
Code Sample, a copy-pastable example if possible
Problem description
It appears that different fill methods treat non-matching columns differently. I came across this when trying to reindex two DataFrames with one column each, whose names didn't match (as they were essentially irrelevant).
Workaround
Before reindexing, do `df_2.columns = df_1.columns`.
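A minimal sketch of the workaround, assuming the original call was along the lines of `df_1.reindex_like(df_2, method=...)` (the actual code sample is not shown here) and using hypothetical data. Once the column labels are equal, the fill method finds an exact column match and only fills along the index.

```python
import pandas as pd

# Hypothetical data: one column each, with different (irrelevant) names.
df_1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
df_2 = pd.DataFrame({"b": [0.0, 0.0, 0.0, 0.0, 0.0]}, index=range(5))

# Workaround from the issue: give df_2 the same column labels as df_1.
df_2.columns = df_1.columns

# Now each method only has to fill the missing rows 1 and 3.
filled = {m: df_1.reindex_like(df_2, method=m)["a"].tolist() for m in ("ffill", "bfill")}
print(filled)
```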
Expected Output
The documentation says:
Whether this means column name or position is not super clear, but either way the current behaviour is inconsistent. I would prefer that all of them work like `bfill`.
Output of `pd.show_versions()`