pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

pd.tseries.frequencies.is_subperiod(x, x) is inconsistent #18553

Open dtourillon opened 6 years ago

dtourillon commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd

In [6]: pd._libs.tslibs.frequencies.is_subperiod('D', 'D')
Out[6]: True

In [7]: pd._libs.tslibs.frequencies.is_subperiod('M', 'M')
Out[7]: False

Problem description

When source == target, shouldn't pd._libs.tslibs.frequencies.is_subperiod(source, target) always return True?

Expected Output

import pandas as pd

In [6]: pd._libs.tslibs.frequencies.is_subperiod('D', 'D')
Out[6]: True

In [7]: pd._libs.tslibs.frequencies.is_subperiod('M', 'M')
Out[7]: True
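
The expectation above amounts to asking that the relation be reflexive. As a rough, pandas-free illustration of how a per-target lookup table can end up inconsistent on this point, here is a toy sketch. The table contents and the names `TOY_SUBPERIODS`, `toy_is_subperiod`, and `non_reflexive` are assumptions made up for this example; they are not pandas' implementation.

```python
# Toy model only: the table and function names are hypothetical and are
# NOT pandas' actual implementation.
TOY_SUBPERIODS = {
    "M": {"D", "H"},  # 'M' is missing from its own entry
    "D": {"D", "H"},  # 'D' is present in its own entry
}


def toy_is_subperiod(source, target):
    """Return True if `source` is listed in the table entry for `target`."""
    return source in TOY_SUBPERIODS.get(target, set())


def non_reflexive(freqs, is_subperiod):
    """Return the frequencies f for which is_subperiod(f, f) is not True."""
    return [f for f in freqs if not is_subperiod(f, f)]


# Lists every alias that fails the reflexivity check in the toy table.
print(non_reflexive(["D", "M"], toy_is_subperiod))
```

The `non_reflexive` helper takes the predicate as an argument, so it could presumably be pointed at the real `is_subperiod` to list which aliases are affected; where that function lives depends on the pandas version.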

Output of pd.show_versions()

In [12]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-101-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.1.3
pip: 9.0.1
setuptools: 37.0.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: None

gfyoung commented 6 years ago

@dtourillon : Thanks for reporting this! Indeed, that does seem intuitive to me. "Downsampling" at the same frequency should vacuously be true I would think.

@jreback @jorisvandenbossche : Thoughts?

jorisvandenbossche commented 6 years ago

cc @jbrockmendel

jbrockmendel commented 6 years ago

@jorisvandenbossche yah I took a look at this, not entirely sure. I think we need to track down a more specific explanation of the docstring "if upsampling[/downsampling] is possible between source and target frequencies"

jorisvandenbossche commented 6 years ago

It is used in downsampling (resample) and in timeseries plotting, so I would try to see what the logic could be there / see if you can find an example where it looks like a bug.

Eg in resampling, the case where both frequencies are equal is handled afterwards explicitly:

https://github.com/pandas-dev/pandas/blob/2c903d594299b2441d4742e777a10e8c76557386/pandas/core/resample.py#L867-L901

For this snippet, I would say that the method should not return True for equal freqs (for that internal usage).
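
To make the concern concrete, here is a minimal sketch of a caller that checks the sub/super-period predicates first and only afterwards handles equal frequencies. It is not the code at the link above; `toy_dispatch`, the returned labels, and the stub predicates are all hypothetical and only illustrate the ordering issue.

```python
def toy_dispatch(source, target, is_subperiod, is_superperiod):
    """Pick a (hypothetical) resampling path for a pair of frequencies.

    The equal-frequency case is only considered after both predicates,
    so it is reached only if they return False for source == target.
    """
    if is_subperiod(source, target):
        return "downsample"
    elif is_superperiod(source, target):
        return "upsample"
    elif source == target:
        return "asfreq"
    raise ValueError("incompatible frequencies: %r -> %r" % (source, target))


# Stub predicates (assumptions for illustration, not pandas functions).
def reflexive_predicate(source, target):
    """Behaves like an is_subperiod that returns True for equal freqs."""
    return source == target


def always_false(source, target):
    return False


# With a reflexive is_subperiod, the explicit equality branch is never hit.
print(toy_dispatch("D", "D", reflexive_predicate, always_false))
# With a non-reflexive one, equal frequencies fall through to that branch.
print(toy_dispatch("M", "M", always_false, always_false))
```

Under these assumptions, making the predicate reflexive would silently change which branch such a caller takes for equal frequencies, so any fix would need to check the internal call sites as well.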