pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

reindex_like(s, method='ffill') is different than reindex_like(s).fillna(method='ffill') #34547

Open actual-panda opened 4 years ago

actual-panda commented 4 years ago

I don't know whether this is a bug or a feature, but the behavior is not clear to me after reading the reindex_like docs.

The docs include the note "Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.", but what the method actually does is fill values as if the index after reindexing were sorted.
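As far as the pandas behavior goes, the monotonicity requirement in that note applies to the index of the object being reindexed (the source), not to the target. In the sample code below, `s2`'s index is unordered and `reindex_like` still accepts it. The following is a minimal sketch of that distinction, assuming a reasonably recent pandas. The toy series `src` and index `target` are made up for illustration. With a non-monotonic source index, pandas raises a `ValueError` once a fill `method` is given, and sorting the source first is enough to avoid it.

```python
import pandas as pd

# Toy data (hypothetical): the source index is neither increasing nor decreasing
src = pd.Series(['a', 'b', 'c'], index=[1, 0, 2], dtype='string')
target = pd.Index([2, 0, 1, 3])  # the target order is irrelevant to the check

try:
    src.reindex(target, method='ffill')
    raised = False
except ValueError:
    # pandas refuses a fill method on a non-monotonic source index
    raised = True

# Sorting the source satisfies the requirement; the target can stay unordered
ok = src.sort_index().reindex(target, method='ffill')
print(raised)
print(ok)
```

Label 3 is absent from `src`, so it takes the value stored at the largest source label below it (2), regardless of where 3 sits in `target`.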

Sample code

>>> import pandas as pd
>>> s1 = pd.Series(['[0, 1)', '[1, 3)', '[3, 4)', '[4, 6)', '[6, inf)'], index=[0, 1, 3, 4, 6], dtype='string')
>>> s2 = pd.Series(['']*8, index=[6, 2, 5, 0, 4, 7, 1, 3], dtype='string')
>>>
>>> s1
0      [0, 1)
1      [1, 3)
3      [3, 4)
4      [4, 6)
6    [6, inf)
dtype: string
>>> s2
6    
2    
5    
0    
4    
7    
1    
3    
dtype: string
>>> s1.reindex_like(s2).fillna(method='ffill')
6    [6, inf)
2    [6, inf)
5    [6, inf)
0      [0, 1)
4      [4, 6)
7      [4, 6)
1      [1, 3)
3      [3, 4)
dtype: string
>>> s1.reindex_like(s2, method='ffill')
6    [6, inf)
2      [1, 3)
5      [4, 6)
0      [0, 1)
4      [4, 6)
7    [6, inf)
1      [1, 3)
3      [3, 4)
dtype: string
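The second output above is consistent with a label-based rule. For each target label, pandas takes the value stored at the largest source label that is less than or equal to it (7 gets the value at 6, 5 gets the value at 4, and so on). `fillna(method='ffill')` instead copies the previous row in the reindexed result's order. The sketch below reimplements the label-based rule with `Index.searchsorted` to make that concrete. `ffill_by_label` is a hypothetical helper, not a pandas API. It assumes a non-empty source whose index is monotonically increasing.

```python
import numpy as np
import pandas as pd

def ffill_by_label(source: pd.Series, target: pd.Index) -> pd.Series:
    """Hypothetical helper: for each target label, take the value stored at
    the largest source label <= that label (label-based forward fill).
    Assumes a non-empty source with a monotonically increasing index."""
    if not source.index.is_monotonic_increasing:
        raise ValueError("source index must be monotonically increasing")
    # Position of the last source label that is <= each target label
    pos = source.index.searchsorted(target, side="right") - 1
    # Gather by position (clip -1 to 0 so indexing is valid), then relabel
    result = source.iloc[np.clip(pos, 0, None)]
    result.index = target
    # Target labels smaller than every source label have nothing to fill from
    return result.mask(pos < 0)

s1 = pd.Series(['[0, 1)', '[1, 3)', '[3, 4)', '[4, 6)', '[6, inf)'],
               index=[0, 1, 3, 4, 6], dtype='string')
s2 = pd.Series([''] * 8, index=[6, 2, 5, 0, 4, 7, 1, 3], dtype='string')

by_label = ffill_by_label(s1, s2.index)
print(by_label.equals(s1.reindex_like(s2, method='ffill')))
```

The row order of `s2` never enters the computation. That is why the result looks "as if" the index had been sorted.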

I expected the same result with both methods. Should they behave differently?
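If the label-based result is wanted but the fill has to be done positionally, one workaround is to reindex onto all source and target labels in sorted order, forward-fill, and only then select the target labels in their original order. This is a sketch under two assumptions: `s1` itself has no missing values (a positional fill would also overwrite those), and the labels are sortable. It uses `.ffill()`, the spelling that replaces `fillna(method='ffill')`, which is deprecated since pandas 2.1.

```python
import pandas as pd

s1 = pd.Series(['[0, 1)', '[1, 3)', '[3, 4)', '[4, 6)', '[6, inf)'],
               index=[0, 1, 3, 4, 6], dtype='string')
s2 = pd.Series([''] * 8, index=[6, 2, 5, 0, 4, 7, 1, 3], dtype='string')

expected = s1.reindex_like(s2, method='ffill')

# Positional fill: walks the rows in s2's order, so a missing label is
# filled from whichever row happens to precede it
positional = s1.reindex_like(s2).ffill()

# Label-based result via a positional fill: union() sorts comparable labels,
# so every missing label is preceded by the nearest smaller source label
combined = s1.index.union(s2.index)
label_based = s1.reindex(combined).ffill().reindex(s2.index)

print(positional.equals(expected))
print(label_based.equals(expected))
```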

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.1.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : de_DE.cp1252

pandas           : 1.0.3
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0
Cython           : None
pytest           : 5.4.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.4.2
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

tomytp commented 5 months ago

I believe this can be closed following the merge of pull request: https://github.com/pandas-dev/pandas/pull/58724