yhat / ggpy

ggplot port for python
http://yhat.github.io/ggpy/
BSD 2-Clause "Simplified" License

Facets with discrete values (e.g. geom_bar) do not work #196

Closed jankatins closed 8 years ago

jankatins commented 10 years ago

geom_bar uses arbitrary numbers for the x axis: in a bar plot where '3' occurs twice in the dataset, the bar is not drawn at "3" but starts at an arbitrary position such as 0.2. This fails when the facets contain different values, because the bars are then not arranged consistently: for 1,2,3 - 1,3,4 - 1,4,5, all three facets have their bars at 0.2, 1.2, 2.2 (real x axis, not the labels shown).

This means that when the labels are removed from the subplots during faceting, you can neither see what the real label of each bar is nor get gaps where there are no values (in the second facet there should be a gap at '2'). It gets worse because the current faceting code removes the tick labels and reorganizes them for all facets, so the ticks are labelled with the positions (0.2, 1.2, 2.2, ...) and the grid no longer lines up nicely under the bars.
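To make the difference concrete, here is a minimal sketch (plain pandas, not ggplot's actual code) that contrasts per-facet positions with positions computed once for the whole dataset. The data frame and the names `per_facet` and `shared` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three facets with the value sets 1,2,3 - 1,3,4 - 1,4,5.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 3, 4, 1, 4, 5],
    "facet": ["f1"] * 3 + ["f2"] * 3 + ["f3"] * 3,
})

# Per-facet positions: every facet enumerates only its own values,
# so all facets end up with bars at positions 0, 1, 2.
per_facet = {
    name: {v: i for i, v in enumerate(sorted(group["a"].unique().tolist()))}
    for name, group in df.groupby("facet")
}

# Shared positions: enumerate the values of the complete dataset once
# and reuse that mapping in every facet.
shared = {v: i for i, v in enumerate(sorted(df["a"].unique().tolist()))}

# Positions of the second facet under the shared mapping.
f2_positions = [shared[v] for v in df.loc[df["facet"] == "f2", "a"]]
```

With the shared mapping the second facet leaves position 1 (the value 2) empty, which is the gap described above.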

Code to see the mess:

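import numpy as np
import pandas as pd
from ggplot import *

# cleanup and assert_same_ggplot are helpers from ggplot's test suite.

def _build_testing_df():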
    df = pd.DataFrame({
        "x": np.arange(0, 10),
        "y": np.arange(0, 10),
        "z": np.arange(0, 10),
        "a": [1,1,1,1,1,2,2,2,3,3]
    })

    df['facets'] = np.where(df.x > 4, 'over', 'under')
    df['facets2'] = np.where((df.x % 2) == 0, 'even', 'uneven')
    return df

@cleanup
def test_facet_grid_descrete():
    df = _build_testing_df()
    gg = ggplot(aes(x='a'), data=df)
    assert_same_ggplot(gg + geom_bar() + facet_grid(x="facets", y="facets2"),
                       "faceting_grid_descrete")

@cleanup
def test_facet_wrap_descrete():
    df = _build_testing_df()
    gg = ggplot(aes(x='a'), data=df)
    assert_same_ggplot(gg + geom_bar() + facet_wrap(x="facets"), "faceting_wrap_descrete")

As a short-term measure, I will add a warning at draw() time if faceting and geom_bar are used together.
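Such a check could look roughly like the sketch below. The helper `warn_if_faceted_bars` and its arguments are hypothetical and not part of ggplot; the real check would have to inspect whatever state draw() keeps for its geoms and facets.

```python
import warnings


def warn_if_faceted_bars(geom_names, is_faceted):
    """Hypothetical helper: warn when geom_bar is combined with faceting.

    geom_names is assumed to be a list of geom class names and
    is_faceted a flag saying whether any facet_* was added to the plot.
    Returns True if a warning was issued.
    """
    if is_faceted and "geom_bar" in geom_names:
        warnings.warn(
            "geom_bar with facets: bars may not line up with their labels "
            "because every facet computes its own x positions.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```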

Long term I think this needs some more thought. The current system is designed so that each facet does not know the properties of the other facets, but in this case we would need to compute the labels beforehand and use them in every individual facet. If we do that, it's probably best to turn that around for all types. One idea would be to add a new method to all geoms that does the necessary data transforms, so faceting would be:

for geom in geoms:
   geom.do_faceting_transforms(...)
[old facets code, but pass in the faceting "hints"/transforms from above]
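A rough sketch of that two-pass idea, under the assumption that the "hints" are just the shared labels and their positions. `BarGeomSketch` and `draw_facets` are illustrative stand-ins, not ggplot classes, and `plot_layer` here only returns bar heights instead of drawing anything.

```python
import pandas as pd


class BarGeomSketch:
    """Hypothetical stand-in for a geom; not ggplot's real geom_bar."""

    def __init__(self, x):
        self.x = x  # name of the column mapped to the x aesthetic

    def do_faceting_transforms(self, data):
        # First pass: look at the complete dataset and collect the
        # labels and positions that every facet has to share.
        labels = sorted(data[self.x].unique().tolist())
        return {"labels": labels,
                "positions": {v: i for i, v in enumerate(labels)}}

    def plot_layer(self, frame, hints):
        # Second pass: count rows per label in this facet, keeping the
        # shared label order and filling missing labels with 0.
        counts = frame[self.x].value_counts()
        return counts.reindex(hints["labels"], fill_value=0).tolist()


def draw_facets(data, geom, facet_column):
    hints = geom.do_faceting_transforms(data)
    return {name: geom.plot_layer(frame, hints)
            for name, frame in data.groupby(facet_column)}


# Same 'a' column as in the test data frame above, split into two facets.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    "facets": ["under"] * 5 + ["over"] * 5,
})
heights = draw_facets(df, BarGeomSketch("a"), "facets")
```

Every facet then gets one height per shared label, so a label that is missing from a facet shows up as a zero-height bar rather than shifting the other bars.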

kevindavenport commented 10 years ago

Thanks for working on this Jan.

naught101 commented 10 years ago

Not sure if this is a separate bug or not, but faceting doesn't work with boxplots either at the moment:

gg.ggplot(gg.diamonds, gg.aes(x='color', y='price')) + gg.geom_boxplot() + gg.facet_wrap(x='cut')

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-165-78ae6c837934> in <module>()
----> 1 gg.ggplot(gg.diamonds, gg.aes(x='color', y='price')) + gg.geom_boxplot() + gg.facet_wrap(x='cut')

/usr/lib/python3/dist-packages/IPython/core/displayhook.py in __call__(self, result)
    245             self.start_displayhook()
    246             self.write_output_prompt()
--> 247             format_dict, md_dict = self.compute_format_data(result)
    248             self.write_format_data(format_dict, md_dict)
    249             self.update_user_ns(result)

/usr/lib/python3/dist-packages/IPython/core/displayhook.py in compute_format_data(self, result)
    155 
    156         """
--> 157         return self.shell.display_formatter.format(result)
    158 
    159     def write_format_data(self, format_dict, md_dict=None):

/usr/lib/python3/dist-packages/IPython/core/formatters.py in format(self, obj, include, exclude)
    150             md = None
    151             try:
--> 152                 data = formatter(obj)
    153             except:
    154                 # FIXME: log the exception

/usr/lib/python3/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    478                 type_pprinters=self.type_printers,
    479                 deferred_pprinters=self.deferred_printers)
--> 480             printer.pretty(obj)
    481             printer.flush()
    482             return stream.getvalue()

/usr/lib/python3/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
    361                             if isinstance(meth, collections.Callable):
    362                                 return meth(obj, self, cycle)
--> 363             return _default_pprint(obj, self, cycle)
    364         finally:
    365             self.end_group()

/usr/lib/python3/dist-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    481     if getattr(klass, '__repr__', None) not in _baseclass_reprs:
    482         # A user-provided repr.
--> 483         p.text(repr(obj))
    484         return
    485     p.begin_group(1, '<')

/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/ggplot.py in __repr__(self)
    108     def __repr__(self):
    109         """Print/show the plot"""
--> 110         figure = self.draw()
    111         # We're going to default to making the plot appear when __repr__ is
    112         # called.

/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/ggplot.py in draw(self)
    275                                                     labelbottom='off')
    276                             ax = plt.gca()
--> 277                             callbacks = geom.plot_layer(frame, ax)
    278                             if callbacks:
    279                                 for callback in callbacks:

/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/geoms/geom.py in plot_layer(self, data, ax)
    134             pinfo = deepcopy(self._cache['default_aes_mpl'])
    135             pinfo.update(_data)
--> 136             self._plot_unit(pinfo, ax)
    137 
    138     def _plot_unit(self, pinfo, ax):

/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/geoms/geom_boxplot.py in _plot_unit(self, pinfo, ax)
     34             plt.setp(ax, yticklabels=l)
     35 
---> 36         q = ax.boxplot(x, vert=False)
     37         plt.setp(q['boxes'], color=color)
     38         plt.setp(q['whiskers'], color=color)

/usr/lib/python3/dist-packages/matplotlib/axes.py in boxplot(self, x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap, usermedians, conf_intervals)
   6021 
   6022             # get median and quartiles
-> 6023             q1, med, q3 = mlab.prctile(d, [25, 50, 75])
   6024 
   6025             # replace with input medians if available

/usr/lib/python3/dist-packages/matplotlib/mlab.py in prctile(x, p)
    953         frac[cond] += 1
    954 
--> 955     return _interpolate(values[ai],values[bi],frac)
    956 
    957 def prctile_rank(x, p):

/usr/lib/python3/dist-packages/matplotlib/mlab.py in _interpolate(a, b, fraction)
    927         'fraction' must be between 0 and 1.
    928         """
--> 929         return a + (b - a)*fraction
    930 
    931     scalar = True

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

jankatins commented 10 years ago

The TypeError is a different bug, but this problem probably applies to all discrete scales and facets :-(

jankatins commented 10 years ago

See also https://github.com/pydata/pandas/pull/7217 and https://github.com/pydata/pandas/issues/5313

jankatins commented 10 years ago

Note that pydata/pandas#7217 has landed in pandas, which will bring Categorical and therefore levels, but it will not be available until fall 2014 :-( I would very much like to base future work on this issue on pydata/pandas#7217, but that would mean requiring a really fresh pandas version, and I'm not sure how that works out for others... Comments?
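For illustration, a small sketch of how a Categorical could carry the shared labels into each facet. It assumes a pandas version that ships Categorical and uses the `categories` keyword, which is the name released pandas uses for what this thread calls levels. The data frame is made up.

```python
import pandas as pd

# Hypothetical data: two facets with the value sets 1,2,3 and 1,3,4.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 3, 4],
    "facet": ["f1", "f1", "f1", "f2", "f2", "f2"],
})

# Declare the full set of labels once, for the whole dataset.
df["a"] = pd.Categorical(df["a"], categories=[1, 2, 3, 4])

# Every subset keeps the complete category list, so the integer codes
# can serve as x positions that agree across facets.
labels = df["a"].cat.categories.tolist()
positions = {name: frame["a"].cat.codes.tolist()
             for name, frame in df.groupby("facet")}
```

The second facet gets the codes 0, 2, 3, so the missing value 2 leaves a gap at position 1 without the facet having to know about the other facets.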

CC @glamp @has2k1 @yarikoptic

has2k1 commented 10 years ago

I picked up on it as it is vital for the completeness of #283 and all that follows. Good job on your contributions over there.

We have long needed to set a minimum pandas version. Plus, given the high bug-fixing activity in pandas releases, we shouldn't be lagging so far behind on the minimum version.