lux-org / lux

Automatically visualize your pandas dataframe via a single print! 📊 💡
Apache License 2.0

Error Describing a GroupBy #382

Closed. davesgonechina closed this issue 3 years ago

davesgonechina commented 3 years ago

https://github.com/lux-org/lux/blob/65fa234143a57adcf25414713c85deebe3f2c5cd/lux/core/groupby.py#L71

Running df.groupby("some_column")["some_other_column"].describe() in a Jupyter notebook results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-4acee8738d88> in <module>
----> 1 df.groupby("some_column")["some_other_column"].describe()

/opt/conda/lib/python3.9/site-packages/pandas/core/groupby/generic.py in describe(self, **kwargs)
    671     @doc(Series.describe)
    672     def describe(self, **kwargs):
--> 673         result = self.apply(lambda x: x.describe(**kwargs))
    674         if self.axis == 1:
    675             return result.T

/opt/conda/lib/python3.9/site-packages/lux/core/groupby.py in apply(self, *args, **kwargs)
     69 
     70     def apply(self, *args, **kwargs):
---> 71         ret_val = super(LuxDataFrameGroupBy, self).apply(*args, **kwargs)
     72         for attr in self._metadata:
     73             ret_val.__dict__[attr] = getattr(self, attr, None)

TypeError: super(type, obj): obj must be an instance or subtype of type

At first I thought this might be a Jupyter problem, but increasingly I suspect it is related to how LuxSeriesGroupBy is handled.
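For context, Python raises this TypeError when the two-argument form of super() receives an object that is not an instance of the class named in the first argument. The sketch below reproduces that with hypothetical stand-in classes (BaseGroupBy, FrameGroupBy, SeriesGroupBy are illustrative names, not Lux's or pandas' actual classes), and it shows only one way the error can arise; it is not a claim about how Lux's groupby.py is structured.

```python
class BaseGroupBy:
    """Hypothetical stand-in for a shared groupby base class."""

    def apply(self, func):
        return func(self)


class FrameGroupBy(BaseGroupBy):
    def apply(self, func):
        # The class is hard-coded here, so `self` must be a FrameGroupBy.
        return super(FrameGroupBy, self).apply(func)


class SeriesGroupBy(BaseGroupBy):
    # Assumption for illustration: the method is reused from FrameGroupBy
    # even though SeriesGroupBy does not inherit from FrameGroupBy.
    apply = FrameGroupBy.apply


def apply_raises_type_error(obj):
    """Return True if obj.apply(...) fails with a TypeError."""
    try:
        obj.apply(lambda g: type(g).__name__)
    except TypeError:
        return True
    return False


print(apply_raises_type_error(FrameGroupBy()))   # instance matches the hard-coded class
print(apply_raises_type_error(SeriesGroupBy()))  # instance does not match
```

In a sketch like this, the zero-argument super() would not help the reused method either, since it resolves to the class where the method was defined; moving the shared logic into a common base class is one way to sidestep the mismatch.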

dorisjlee commented 3 years ago

Hi @davesgonechina, thanks for reporting this issue! This looks like a problem with how we're handling LuxSeriesGroupBy. I believe Lux handles df.groupby("some_column")["some_other_column"] on its own, but calling describe() on the result seems to fail (the return type there is a pandas DataFrame). We will look into this and get back to you once we pin down the bug.

davesgonechina commented 3 years ago

Related, it appears describe() doesn't work with Lux at all?

Using df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/college.csv"), df yields a Lux button in my Jupyter notebook, but df.describe() is just a plain pandas DataFrame with no Lux button.

dorisjlee commented 3 years ago

Hi @davesgonechina, We previously disabled df.describe since the visualizations were not very meaningful.

I've added the Lux capabilities back to df.describe for consistency. I've also fixed the bug related to groupby describe, so the following should now work:

  import pandas as pd
  import lux  # importing lux enables the Lux widget on pandas DataFrames
  df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/college.csv")
  df.groupby("FundingModel")["AdmissionRate"].describe()

We will look into displaying more useful recommendations (e.g. box plots) for df.describe in the future. These changes will get merged in soon with our latest release. Let us know if this fixes the issue that you're seeing!