pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Automatic change of color when the plot type is "line" but not when it is "scatter" #59846

Open TinoDerb opened 2 weeks ago

TinoDerb commented 2 weeks ago

Pandas version checks

Reproducible Example

import pandas as pd
import matplotlib.pyplot as plt
# I am using Jupyter with the notebook backend, enabled via %matplotlib notebook
data = {'y': [1,2,3,4,5,6,7,8,9,10],
        'x': [1,2,3,4,5,6,7,8,9,10],
        'layer' : ['a','a','a','b','b','b','c','c','c','c']}
df = pd.DataFrame(data)
plt.figure()
df.groupby("layer").plot(x='x', y='y', ax=plt.gca(), kind='line')  # this changes the color automatically

plt.figure()
df.groupby("layer").plot(x='x', y='y', ax=plt.gca(), kind='scatter')  # this does not

Issue Description

I was trying to write an answer on Stack Overflow and I noticed the following behaviour:

Using df.groupby("layer").plot(x='x', y='y', ax= plt.gca(), kind='line') changes the color of every new group automatically.

[Image: output of the kind='line' plot]

However, this does not happen when using kind="scatter".

[Image: output of the kind='scatter' plot]

I think my version is a bit older than the latest one, but I could not find any similar issue on GitHub.

Expected Behavior

I expect the color to change for each group with kind='scatter' as well. The following code produces the output I expect:

plt.figure()
for layer in df['layer'].unique(): # group by layer
    subDf = df[df['layer'] == layer] # get subset of dataframe
    plt.scatter(subDf['x'], subDf['y'], label=layer) # plot
# add labels and such
plt.xlabel('x')
plt.ylabel('y')

[Image: output of the code above]
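As a possible workaround until the behaviour changes, the groups can be iterated manually and each one given a color from matplotlib's default cycle. This is only a sketch under assumptions: the names `cycle_colors`, `fig` and `ax` are illustrative, and it relies on `DataFrame.plot` accepting `color` and `label` for `kind='scatter'`, as in the pandas visualization docs.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})

fig, ax = plt.subplots()

# Colors of the default matplotlib property cycle (illustrative name).
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# Plot each group separately and pick the next cycle color by hand.
for i, (layer, group) in enumerate(df.groupby('layer')):
    group.plot(
        x='x', y='y', kind='scatter', ax=ax,
        color=cycle_colors[i % len(cycle_colors)],
        label=layer,
    )
```

The modulo wraps around when there are more groups than colors in the cycle, which mirrors how matplotlib itself reuses the cycle.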

Installed Versions

commit : 0f437949513225922d851e9581723d82120684a6
python : 3.11.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.0.3
numpy : 1.24.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : 3.0.6
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.15.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.2
numba : 0.57.1
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.4.0
scipy : 1.11.1
snappy :
sqlalchemy : 1.4.39
tables : 3.8.0
tabulate : 0.8.10
xarray : 2023.6.0
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.2.0
pyqt5 : None

abhisin-07 commented 2 weeks ago

df.groupby("layer").plot(x='x', y='y', ax=plt.gca(), kind='line', color='blue')

DEFINE THE COLOR EXPLICITLY BY PASSING THE color PARAMETER TO THE plot FUNCTION, BECAUSE groupby("layer") DIVIDES THE PLOT DATA INTO GROUPS AND APPLIES THE DEFAULT MATPLOTLIB COLOR CYCLE TO THE DIFFERENT GROUPS.

TinoDerb commented 2 weeks ago

@abhisin-07 Hey, it seems you didn't really read the issue. The desired result is to have the automatic matplotlib color cycle for kind='scatter', just like when plotting with kind='line'.

P.S.: no need for all caps :)

yuanx749 commented 1 week ago

Hmm, with the line plot, even though the colors are correct, the legend is problematic.
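
Assuming the legend problem is that every group ends up labeled with the column name `y` rather than its group key, one way to sidestep it is to loop over the groups and pass `label` explicitly. A minimal sketch, not a fix for the underlying behaviour (`fig` and `ax` are illustrative names):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})

fig, ax = plt.subplots()

# One line per group; the group key becomes the legend entry.
for layer, group in df.groupby('layer'):
    group.plot(x='x', y='y', kind='line', ax=ax, label=layer)
```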