pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: `class_column` should be optional in `pandas.plotting.parallel_coordinates` #46372

Open jack89roberts opened 2 years ago

jack89roberts commented 2 years ago

Is your feature request related to a problem?

Currently class_column is a required argument for pandas.plotting.parallel_coordinates, which means a hack/workaround is needed to create a parallel coordinates plot for a single class (i.e. with a single colour).

Describe the solution you'd like

class_column should default to None, in which case a parallel coordinates plot with a single colour will be created.

I've looked at the source code for the relevant function and I think it would be a fairly straightforward modification.

API breaking implications

I don't think this would cause any backwards-incompatible changes: the arguments to the function would stay the same and keep the same order, with class_column simply becoming optional.

It may be preferable to allow the color argument to take a single color value when class_column is None (currently color is optional but expected to be a list of colors if defined).
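To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind, written as a wrapper around the current public function rather than as a patch to pandas. The name parallel_coordinates_optional, the dummy column name, and the rule that a single colour is passed as a plain string are all assumptions for illustration only, not an existing or agreed API.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def parallel_coordinates_optional(frame, class_column=None, color=None, **kwargs):
    """Hypothetical wrapper (not part of pandas) with an optional class_column."""
    if class_column is not None:
        # Multi-class case: defer to the existing function unchanged.
        return pd.plotting.parallel_coordinates(
            frame, class_column, color=color, **kwargs
        )

    # Assumption: a single colour may be given as a plain string.
    if isinstance(color, str):
        color = [color]

    # Choose a dummy column name that does not clash with existing columns.
    dummy = "_single_class"
    while dummy in frame.columns:
        dummy += "_"

    # assign() returns a copy, so the caller's DataFrame is left untouched.
    ax = pd.plotting.parallel_coordinates(
        frame.assign(**{dummy: 0}), dummy, color=color, **kwargs
    )

    # Only one class exists, so the legend carries no information.
    legend = ax.get_legend()
    if legend is not None:
        legend.remove()
    return ax


df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])
ax = parallel_coordinates_optional(df, color="tab:blue")
```

Existing calls that pass class_column go through the unchanged code path, which is why I'd expect the change to be non-breaking. A real implementation inside pandas would presumably skip the dummy column entirely and plot every row with one colour.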

Describe alternatives you've considered

It's currently possible to hack a single-coloured parallel coordinates plot by creating a dummy class_column that has a constant value and then removing the legend, e.g.:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

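# Example data: 10 rows, 3 numeric columns
df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])

# Workaround: add a constant dummy class column so every row is in one class
df["label"] = -1
pd.plotting.parallel_coordinates(df, "label")
# Remove the legend, which would otherwise show only the dummy class
plt.gca().get_legend().remove()
plt.show()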

What I'm proposing is for this to be possible instead:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.uniform(size=(10, 3)), columns=["a", "b", "c"])

pd.plotting.parallel_coordinates(df)
plt.show()

Additional context

I found someone else asking about this in another issue: https://github.com/pandas-dev/pandas/issues/12341#issuecomment-299911662. The response was along the lines of class_column being required because the general use-case for parallel coordinate plots is multivariate data. That may be true, but it's valid to want to create one for a single class, and I don't think the API should force the plot to have a colour scale. Cases where creating a single-class plot may be useful:

jack89roberts commented 2 years ago

These functions also have a required class_column argument:

I'm not as familiar with those styles of plot but the same argument may apply (and making it optional for those should be similarly straightforward and not a breaking change).

cbbcbail commented 1 year ago

There is no reason why parallel coordinate plots must be multi-class.

brurosa commented 1 year ago

I would like to take this

peacheym commented 1 year ago

Has this issue been resolved?