holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License

by argument not working for hist plot #865

Open MarcSkovMadsen opened 2 years ago

MarcSkovMadsen commented 2 years ago

I'm working on updating the docstrings for the hist plot. I would expect the by argument to work similarly to how it works for other hvPlot plots and/or to how it works for pandas .plot: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hist.html.


But it seems to have no effect.

With by


import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"], by="gender")

Without by


import hvplot.pandas # noqa
import pandas as pd
import numpy as np
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
df.hvplot.hist(y=["age"])
hoxbro commented 2 years ago

Looking at the code here:

https://github.com/holoviz/hvplot/blob/17ce0cc18a0393ae82d3a9f6f11b3bc4cad29b30/hvplot/converter.py#L1623-L1637

It seems that by is not supported when y is a list or tuple, so passing y as a string (y="age") "fixes" it.
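A minimal sketch of that workaround, assuming you want to keep passing y as a list elsewhere in your code: the hypothetical helper unwrap_single_y below (not part of hvPlot) unwraps a one-element list or tuple into a plain column name, so that by can take effect.

```python
def unwrap_single_y(y):
    """Hypothetical helper, not part of hvPlot.

    Turns ["age"] or ("age",) into "age". hvplot's hist has no effect for
    `by` when `y` is a list or tuple, so a single column name is passed as a
    plain string instead. Anything else is returned unchanged.
    """
    if isinstance(y, (list, tuple)) and len(y) == 1:
        return y[0]
    return y
```

With the DataFrame from the original report, df.hvplot.hist(y=unwrap_single_y(["age"]), by="gender") is then the same call as df.hvplot.hist(y="age", by="gender").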

One small note: there is no visual indication of where the two histograms overlap.

Turning on alpha makes the overlap visible.
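To see which parts of the plot overlap, here is a small sketch using pandas and NumPy (not hvplot's own binning code). It computes one histogram per gender over shared bin edges and lists the bins where both groups have data. The choice of 4 bins is an assumption made for readability; hvplot's default bin count may differ.

```python
import numpy as np
import pandas as pd

age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})

# Bin edges computed from the full column, so every group shares the same bins.
edges = np.histogram_bin_edges(df["age"], bins=4)

# One histogram per gender: the grouping that by="gender" asks for.
counts = {
    gender: np.histogram(group["age"], bins=edges)[0]
    for gender, group in df.groupby("gender")
}

# Indices of bins where both groups have data; with opaque bars,
# one histogram hides the other in exactly these bins.
overlap = np.flatnonzero((counts["F"] > 0) & (counts["M"] > 0))

print(edges.tolist())   # [8.0, 27.25, 46.5, 65.75, 85.0]
print({g: c.tolist() for g, c in counts.items()})  # {'F': [2, 2, 1, 1], 'M': [4, 0, 0, 4]}
print(overlap.tolist())  # [0, 3]
```

The first and last bins are where opaque bars hide each other. In hvplot, adding a style option such as alpha=0.5 to df.hvplot.hist(y="age", by="gender") should let both groups show through.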

MarcSkovMadsen commented 2 years ago

Thanks @Hoxbro. I'm trying to move more of my work to hvplot and to understand the intention behind it.

  1. Is it fair for me to expect it to work in this case?
  2. Is it a bug? Should it be fixed?
  3. Can I expect .hvplot to be a drop-in replacement for pandas .plot? Or is it "just" something with a similar API, so that I should expect to adjust some of my code when migrating from .plot to .hvplot? What is the vision?
hoxbro commented 2 years ago
  1. I think it is fair to expect it to work in this case, or at least to raise an error saying that by does not work when y is a list/tuple.
  2. See the answer above.
  3. In theory and in general, I would say yes. But in practice, it is not always possible. The pandas.plot API is a moving target and could change. Furthermore, consistency between different plot kinds can take precedence over matching how pandas does it. hvplot also provides more defaults than pandas does.
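The error option from point 1 could look roughly like the sketch below. check_hist_by is a hypothetical name, and this is not how hvPlot's converter is actually structured; it only illustrates failing loudly instead of silently dropping by.

```python
def check_hist_by(y, by):
    """Hypothetical guard, not hvPlot's API.

    Raises instead of silently ignoring `by` when `y` is a list or tuple.
    Returns None when the combination is supported.
    """
    if by is not None and isinstance(y, (list, tuple)):
        raise ValueError(
            f"by={by!r} is not supported when y is a list or tuple "
            f"(got y={y!r}); pass y as a single column name string instead."
        )
```

With this guard, check_hist_by(["age"], "gender") raises a ValueError, while check_hist_by("age", "gender") passes.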

For this example, it is harder to get the desired output. But try making a scatter plot of your DataFrame with the index on the x-axis and age on the y-axis. With hvplot, it is as easy as df.hvplot.scatter(y="age"). It is a bit harder with pandas.plot; I could get it to work with df.reset_index().plot.scatter(x="index", y="age").

If we then want to color the scatter points by gender, we can do it with df.hvplot.scatter(y="age", c="gender") or df.hvplot.scatter(y="age", by="gender"). I couldn't find an easy way to do this with pandas.plot (though there probably is one).
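One possible pandas route, sketched under the assumption that matplotlib is installed for the final plotting call: reset_index() turns the default RangeIndex into a column named "index", and mapping the gender column through a color palette gives pandas' scatter a per-point color sequence. The palette values are an arbitrary choice.

```python
import pandas as pd

age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})

# pandas' scatter needs an explicit x column: reset_index() moves the
# unnamed RangeIndex into a regular column called "index".
flat = df.reset_index()

# One color per row, looked up from the gender column.
# The palette itself is an arbitrary choice for this sketch.
palette = {"M": "tab:blue", "F": "tab:orange"}
colors = flat["gender"].map(palette)
```

Calling flat.plot.scatter(x="index", y="age", c=colors.tolist()) should then draw the points colored by gender, although this route adds no legend on its own.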


For the original question, you can get the same output as pandas by using subplots: df.hvplot.hist(y="age", by="gender", subplots=True).cols(1).
