pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Plotting bug in Pandas 2.2.2 and 2.2.3 #59960

Open jlchang opened 12 hours ago

jlchang commented 12 hours ago

Pandas version checks

Reproducible Example

wget https://github.com/jlchang/cb-python-intro-lesson-template/raw/refs/heads/main/episodes/files/data.zip
unzip data.zip

import pandas as pd
df_long = pd.read_pickle('data/df_long.pkl')
albany = df_long[df_long['branch'] == 'Albany Park']
albany.plot()

Issue Description

FYI, Pandas 2.2.2 seems to have a plotting bug (this does not seem to be specific to Colab). For this tutorial, running albany['circulation'].plot() renders:

Screenshot 2024-10-04 at 5 12 09 AM

Expected Behavior

The expected plot looks like:

Screenshot 2024-10-04 at 5 14 11 AM

Pandas 2.0.3 generates the expected plot. Pandas 2.2.3 shows the same problem as 2.2.2.

Installed Versions

Python 3.10.9 on Mac M1 Max running Sonoma 14.6.1 (23G93)

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.10.9
python-bits           : 64
OS                    : Darwin
OS-release            : 23.6.0
Version               : Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8
pandas                : 2.2.3
numpy                 : 1.23.5
pytz                  : 2022.7.1
dateutil              : 2.8.2
pip                   : 24.0
Cython                : None
sphinx                : None
IPython               : 8.11.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.11.2
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.2
lxml.etree            : None
matplotlib            : 3.6.3
numba                 : 0.56.4
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.10.1
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.2
qtpy                  : N/A
pyqt5                 : None

The same issue happens in Google Colab, which is running Python 3.10.12:

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.10.12
python-bits           : 64
OS                    : Linux
OS-release            : 6.1.85+
Version               : #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : en_US.UTF-8
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8
pandas                : 2.2.3
numpy                 : 1.26.4
pytz                  : 2024.2
dateutil              : 2.8.2
pip                   : 24.1.2
Cython                : 3.0.11
sphinx                : 5.0.2
IPython               : 7.34.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
blosc                 : None
bottleneck            : 1.4.0
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.6.1
html5lib              : 1.1
hypothesis            : None
gcsfs                 : 2024.6.1
jinja2                : 3.1.4
lxml.etree            : 4.9.4
matplotlib            : 3.7.1
numba                 : 0.60.0
numexpr               : 2.10.1
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : 0.23.2
psycopg2              : 2.9.9
pymysql               : None
pyarrow               : 16.1.0
pyreadstat            : None
pytest                : 7.4.4
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.13.1
sqlalchemy            : 2.0.35
tables                : 3.8.0
tabulate              : 0.9.0
xarray                : 2024.9.0
xlrd                  : 2.0.1
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.2
qtpy                  : None
pyqt5                 : None

asishm commented 11 hours ago

Thanks for the report, confirmed in main.

As a workaround, you can run albany['circulation'].sort_index().plot() to get the desired output.
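A minimal sketch of that workaround, for readers without the tutorial data. The Series below is a hypothetical stand-in for albany['circulation'] (three made-up values on an out-of-order date index), and the Agg backend is only selected so the snippet runs without a display:

```python
from datetime import date

import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import pandas as pd

# Hypothetical stand-in for albany['circulation']: the date index is deliberately unsorted.
s = pd.Series(
    [3.0, 2.0, 1.0],
    index=[date(2012, 10, 1), date(2012, 8, 1), date(2012, 9, 1)],
    name="circulation",
)

# Sort by index first so the points are drawn in chronological order.
s_sorted = s.sort_index()
ax = s_sorted.plot()
```

Sorting before plotting does not depend on the pandas version, so the same call should also be harmless on releases that still sort internally.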

This seems to stem from #55906, where the sorting of the index was removed.

cc @jbrockmendel

asishm commented 11 hours ago

Simpler reproducer (slightly modified from the test at pandas/tests/plotting/frame/test_frame.py:test_unordered_ts):

from datetime import date
import pandas as pd
import numpy as np

index = [date(2012, 10, 1), date(2012, 8, 1), date(2012, 9, 1)]
values = [3.0, 2.0, 1.0]
df = pd.DataFrame(
    np.array(values),
    index=index,
    columns=["test"],
)

df.plot()

v2.0.3 image

main image
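For comparing versions without screenshots, one option is to inspect the x-values that actually reach matplotlib. The sketch below is only an illustration: plotted_x_is_sorted is a hypothetical helper, not part of pandas, and it assumes the first line on the axes holds the plotted column.

```python
from datetime import date

import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plotted_x_is_sorted(frame):
    """Return True if the first plotted line's x-values are in ascending order."""
    fig, ax = plt.subplots()
    try:
        frame.plot(ax=ax)
        x = list(ax.get_lines()[0].get_xdata())
    finally:
        plt.close(fig)  # avoid accumulating open figures
    return all(a <= b for a, b in zip(x, x[1:]))


# Same data as the reproducer above: an unsorted date index.
index = [date(2012, 10, 1), date(2012, 8, 1), date(2012, 9, 1)]
df = pd.DataFrame({"test": [3.0, 2.0, 1.0]}, index=index)

print(pd.__version__, plotted_x_is_sorted(df))
print(pd.__version__, plotted_x_is_sorted(df.sort_index()))
```

The second call should report True on any version, since the index is sorted before plotting. The first call is the interesting one: going by the reports in this thread, it would presumably differ between 2.0.3 and the affected releases.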

jlchang commented 10 hours ago

Thank you for the workaround. And thank you to the pandas team for an incredible library. It is vital to my daily work.