pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: `pandas.DataFrame.plot` results in inconsistent/incompatible `xticks` depending on the date span of the data #43972

Status: Open. trenton3983 opened this issue 3 years ago.

trenton3983 commented 3 years ago

Reproducible Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# sample data
dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']

dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26']

np.random.seed(365)
y1 = np.random.randn(len(dates1)).cumsum()
y2 = np.random.randn(len(dates2)).cumsum()

# dataframe with more than a month span
df1 = pd.DataFrame({'date':pd.to_datetime(dates1), 'y1':y1})
df1.set_index('date', inplace=True)

# dataframe with less than a month span
df2 = pd.DataFrame({'date':pd.to_datetime(dates2), 'y2':y2})
df2.set_index('date', inplace=True)

Issue Description

fig, axs = plt.subplots(2, 2, figsize=[12, 12])
axs = axs.flat

print('Note the difference in xticks depending on the date span')
df1.plot(ax=axs[0], title='x-axis is incorrect when the dataframe with\nmore than a month of dates is plotted first')
print(f'axs[0]: {axs[0].get_xticks()}')
df2.plot(ax=axs[0], secondary_y=True)
print(f'axs[0]: {axs[0].get_xticks()}')

df2.plot(ax=axs[1], color='tab:orange', title='x-axis is correct when the dataframe with\nless than a month of dates is plotted first')
print(f'axs[1]: {axs[1].get_xticks()}')
df1.plot(ax=axs[1], color='tab:blue', secondary_y=True)
print(f'axs[1]: {axs[1].get_xticks()}')

df1.y1.plot(ax=axs[2], color='tab:blue', title='More than a month of data')
print(f'axs[2]: {axs[2].get_xticks()}')
df2.y2.plot(ax=axs[3], color='tab:orange', title='Less than a month of data')
print(f'axs[3]: {axs[3].get_xticks()}')

plt.tight_layout()
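The printed `get_xticks()` values come out on very different scales depending on which frame is plotted first. The pandas-only sketch below shows why mixing two units on one axis would do that. It assumes, beyond what the thread states, that the weekly dates2 series is drawn on an axis of weekly Period ordinals while the irregular dates1 series is drawn in matplotlib date numbers (whole days since 1970-01-01, matplotlib's default epoch since version 3.3).

```python
import pandas as pd

# The same weekly dates as dates2 in the reproducible example (every Sunday).
dates2 = pd.to_datetime(['2021-08-29', '2021-09-05', '2021-09-12',
                         '2021-09-19', '2021-09-26'])

# Assumed unit 1: weekly Period ordinals, where consecutive weeks differ by 1.
weekly_ordinals = [p.ordinal for p in dates2.to_period('W-SUN')]

# Assumed unit 2: whole days since 1970-01-01, where consecutive weeks differ by 7.
day_numbers = (dates2 - pd.Timestamp('1970-01-01')).days.tolist()

print('weekly ordinals:', weekly_ordinals)
print('day numbers:    ', day_numbers)
```

Under these assumptions, an axis whose limits were set by dates1 in day numbers (around 18,865) would place the weekly series thousands of units away, which is consistent with the broken axs[0] panel.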

Expected Behavior

fig, axs = plt.subplots(2, 2, figsize=[20, 8], sharey=False, sharex=False)
axs = axs.flatten()

axs[0].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[0]: {axs[0].get_xticks()}')
ax4 = axs[0].twinx()
ax4.plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'ax4: {ax4.get_xticks()}')

axs[1].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[1]: {axs[1].get_xticks()}')
ax5 = axs[1].twinx()
ax5.plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'ax5: {ax5.get_xticks()}')

axs[2].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[2]: {axs[2].get_xticks()}')
axs[3].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[3]: {axs[3].get_xticks()}')
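One possible workaround, not confirmed in this thread, is the `x_compat=True` keyword of `DataFrame.plot`, which tells pandas to skip its special time-series x-axis handling. Assuming that handling is what introduces the mismatched units, passing it to both calls should keep both frames in matplotlib date units, like the plain-matplotlib code above. The dates are a shortened subset of dates1 and dates2, and the y values are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Irregular dates spanning more than a month: no frequency can be inferred.
idx1 = pd.to_datetime(['2021-08-26', '2021-08-27', '2021-08-30', '2021-09-07',
                       '2021-09-20', '2021-10-08'])
# Weekly dates (every Sunday): pandas can infer a W-SUN frequency for these.
idx2 = pd.to_datetime(['2021-08-29', '2021-09-05', '2021-09-12',
                       '2021-09-19', '2021-09-26'])

df1 = pd.DataFrame({'y1': range(len(idx1))}, index=idx1)
df2 = pd.DataFrame({'y2': range(len(idx2))}, index=idx2)

fig, ax = plt.subplots()
# x_compat=True skips pandas' own time-series x-axis handling, so both
# frames should be drawn in matplotlib date units on the shared x-axis.
df1.plot(ax=ax, x_compat=True)
df2.plot(ax=ax, secondary_y=True, x_compat=True)

lo, hi = ax.get_xlim()
print(f'x limits: {lo:.1f} to {hi:.1f}')
plt.close(fig)
```

The trade-off is that `x_compat=True` also turns off pandas' adaptive date tick labels, so the labels are matplotlib's defaults.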

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 73c68257545b5f8530b7044f56647bd2db92e2ba
python           : 3.8.11.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19043
machine          : AMD64
processor        : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United States.1252

pandas           : 1.3.3
numpy            : 1.20.3
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.0.1
setuptools       : 58.0.4
Cython           : 0.29.24
pytest           : 6.2.4
hypothesis       : None
sphinx           : 4.2.0
blosc            : None
feather          : None
xlsxwriter       : 3.0.1
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.3
IPython          : 7.27.0
pandas_datareader: 0.10.0
bs4              : 4.10.0
bottleneck       : 1.3.2
fsspec           : 2021.08.1
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : 1.4.22
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : 1.3.0
numba            : 0.53.1

azjps commented 10 months ago

This is the same bug as #52895 (and the other issues linked in https://github.com/pandas-dev/pandas/issues/52895#issuecomment-1859134015). The underlying cause is that pandas infers that dates2 is periodic (every date is a Sunday, exactly one week apart), so plotting it enters a different code path that draws the periodic time series in different x-axis units from the irregular dates1 series. In the second example, where dates2 also contains 2021-09-29, dates2 is no longer periodic, which inadvertently avoids the issue of pandas plotting in different units.
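The periodicity described above can be checked directly with `pd.infer_freq`. This is a minimal sketch, assuming that the plotting code's notion of "periodic" matches what `infer_freq` reports for these indexes. dates1 gets no frequency because of gaps such as the missing 2021-09-06 (Labor Day), while dates2 is a clean weekly series until 2021-09-29 is added.

```python
import pandas as pd

dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26']

# Business days with holiday gaps: no regular frequency.
freq1 = pd.infer_freq(pd.to_datetime(dates1))
# Every Sunday: inferred as weekly, anchored on Sunday.
freq2 = pd.infer_freq(pd.to_datetime(dates2))
# Adding a Wednesday breaks the weekly pattern.
freq3 = pd.infer_freq(pd.to_datetime(dates2 + ['2021-09-29']))

print(freq1)  # None
print(freq2)  # W-SUN
print(freq3)  # None
```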