jankatins closed this issue 8 years ago.
Thanks for working on this, Jan.
Not sure whether this is a separate bug, but faceting doesn't work with boxplots at the moment either:
import ggplot as gg
gg.ggplot(gg.diamonds, gg.aes(x='color', y='price')) + gg.geom_boxplot() + gg.facet_wrap(x='cut')
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-165-78ae6c837934> in <module>()
----> 1 gg.ggplot(gg.diamonds, gg.aes(x='color', y='price')) + gg.geom_boxplot() + gg.facet_wrap(x='cut')
/usr/lib/python3/dist-packages/IPython/core/displayhook.py in __call__(self, result)
245 self.start_displayhook()
246 self.write_output_prompt()
--> 247 format_dict, md_dict = self.compute_format_data(result)
248 self.write_format_data(format_dict, md_dict)
249 self.update_user_ns(result)
/usr/lib/python3/dist-packages/IPython/core/displayhook.py in compute_format_data(self, result)
155
156 """
--> 157 return self.shell.display_formatter.format(result)
158
159 def write_format_data(self, format_dict, md_dict=None):
/usr/lib/python3/dist-packages/IPython/core/formatters.py in format(self, obj, include, exclude)
150 md = None
151 try:
--> 152 data = formatter(obj)
153 except:
154 # FIXME: log the exception
/usr/lib/python3/dist-packages/IPython/core/formatters.py in __call__(self, obj)
478 type_pprinters=self.type_printers,
479 deferred_pprinters=self.deferred_printers)
--> 480 printer.pretty(obj)
481 printer.flush()
482 return stream.getvalue()
/usr/lib/python3/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
361 if isinstance(meth, collections.Callable):
362 return meth(obj, self, cycle)
--> 363 return _default_pprint(obj, self, cycle)
364 finally:
365 self.end_group()
/usr/lib/python3/dist-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
481 if getattr(klass, '__repr__', None) not in _baseclass_reprs:
482 # A user-provided repr.
--> 483 p.text(repr(obj))
484 return
485 p.begin_group(1, '<')
/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/ggplot.py in __repr__(self)
108 def __repr__(self):
109 """Print/show the plot"""
--> 110 figure = self.draw()
111 # We're going to default to making the plot appear when __repr__ is
112 # called.
/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/ggplot.py in draw(self)
275 labelbottom='off')
276 ax = plt.gca()
--> 277 callbacks = geom.plot_layer(frame, ax)
278 if callbacks:
279 for callback in callbacks:
/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/geoms/geom.py in plot_layer(self, data, ax)
134 pinfo = deepcopy(self._cache['default_aes_mpl'])
135 pinfo.update(_data)
--> 136 self._plot_unit(pinfo, ax)
137
138 def _plot_unit(self, pinfo, ax):
/usr/local/lib/python3.4/dist-packages/ggplot-0.5.9-py3.4.egg/ggplot/geoms/geom_boxplot.py in _plot_unit(self, pinfo, ax)
34 plt.setp(ax, yticklabels=l)
35
---> 36 q = ax.boxplot(x, vert=False)
37 plt.setp(q['boxes'], color=color)
38 plt.setp(q['whiskers'], color=color)
/usr/lib/python3/dist-packages/matplotlib/axes.py in boxplot(self, x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap, usermedians, conf_intervals)
6021
6022 # get median and quartiles
-> 6023 q1, med, q3 = mlab.prctile(d, [25, 50, 75])
6024
6025 # replace with input medians if available
/usr/lib/python3/dist-packages/matplotlib/mlab.py in prctile(x, p)
953 frac[cond] += 1
954
--> 955 return _interpolate(values[ai],values[bi],frac)
956
957 def prctile_rank(x, p):
/usr/lib/python3/dist-packages/matplotlib/mlab.py in _interpolate(a, b, fraction)
927 'fraction' must be between 0 and 1.
928 """
--> 929 return a + (b - a)*fraction
930
931 scalar = True
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
The TypeError is a different bug, but this problem probably applies to all discrete scales and facets :-(
Note that pydata/pandas#7217 has landed in pandas, which will bring Categorical and therefore levels, but it will not be available until fall 2014 :-( I would very much like to base future work in this issue on pydata/pandas#7217, but that would mean requiring a really fresh pandas version, and I'm not sure how that works out for others... comments?
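A rough sketch (not ggplot code) of what Categorical buys for this issue: the full, ordered list of levels survives even in a facet where some values never occur, so every facet can map the same value to the same position. It uses today's pandas API, where the attribute is called categories (it was renamed from levels in pandas 0.15); the level list and sample values are made up for illustration.

```python
import pandas as pd

# Illustrative only: the full, ordered set of x values across *all* facets.
all_levels = ["1", "2", "3", "4", "5"]

# Values that occur in one facet; "2" and "5" never appear here.
facet_values = pd.Series(["1", "3", "4", "3"])

# Declaring the categories up front keeps unused levels and their order.
as_cat = facet_values.astype(pd.CategoricalDtype(categories=all_levels, ordered=True))

# .cat.categories is the current name for what was called "levels".
print(list(as_cat.cat.categories))   # ['1', '2', '3', '4', '5']

# .cat.codes is each value's index in the shared level list, so the same
# value gets the same x position in every facet.
print(as_cat.cat.codes.tolist())     # [0, 2, 3, 2]

# value_counts() on a categorical reports unused levels with a count of 0,
# which is exactly what would leave a visible gap at "2".
counts = as_cat.value_counts(sort=False)
print(dict(zip(counts.index.tolist(), counts.tolist())))
# {'1': 1, '2': 0, '3': 2, '4': 1, '5': 0}
```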
CC @glamp @has2k1 @yarikoptic
I picked up on it because it is vital for completing #283 and everything that follows. Good job on your contributions over there.
We have long needed to set a minimum pandas version. Plus, given how many bugs get fixed in each pandas release, we shouldn't lag so far behind with the minimum version.
geom_bar uses arbitrary numbers for the x axis: in a bar plot where '3' occurs twice in the dataset, the bar is not drawn at "3" but at a position counted from 0.2. This fails when the facets contain different values, because they are not laid out consistently: facets with 1,2,3 - 1,3,4 - 1,4,5 all get their bars at 0.2, 1.2, 2.2 (on the real x axis, not the labels shown).
This means that when the labels are removed from the subplots during faceting, you can neither see the real label of each bar nor get gaps where there are no values (in the second facet there should be a gap at '2'). It gets worse because the current faceting code removes the tick labels and rebuilds them for all facets, so the ticks get named after the positions (0.2, 1.2, 2.2, ...) and the grid no longer lines up nicely under the bars.
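The difference between the two layouts can be shown with a small, hypothetical model in plain Python (not ggplot's actual code): index_positions places bars by their index inside a single facet, as described above, while shared_positions computes one position per level over all facets. With the shared mapping, facet b leaves 1.2 empty, which is the gap where '2' belongs. The facet names and the 0.2 offset are illustrative assumptions.

```python
# Hypothetical per-facet x values, matching the 1,2,3 / 1,3,4 / 1,4,5 example.
facets = {
    "a": ["1", "2", "3"],
    "b": ["1", "3", "4"],
    "c": ["1", "4", "5"],
}


def index_positions(values, offset=0.2):
    """Model of the described behaviour: position = index within one facet."""
    return {v: i + offset for i, v in enumerate(sorted(set(values)))}


def shared_positions(all_facets, offset=0.2):
    """Proposed behaviour: one position per level, computed over all facets.

    Plain string sorting is fine for these single-digit labels; real code
    would use the data's own level order (e.g. a pandas Categorical).
    """
    levels = sorted({v for values in all_facets.values() for v in values})
    return {v: i + offset for i, v in enumerate(levels)}


shared = shared_positions(facets)
for name, values in facets.items():
    local = index_positions(values)
    print(name, [local[v] for v in values], [shared[v] for v in values])
# a [0.2, 1.2, 2.2] [0.2, 1.2, 2.2]
# b [0.2, 1.2, 2.2] [0.2, 2.2, 3.2]
# c [0.2, 1.2, 2.2] [0.2, 3.2, 4.2]
```

Because the shared mapping is computed before any facet is drawn, the same tick labels can also be reused for every subplot, which is the "compute the labels beforehand" step.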
Code to see the mess:
As a short-term measure, I will add a warning at draw() time if faceting and geom_bar are used together. Long-term, I think this needs more thought: the current system is designed so that each facet does not know the properties of the other facets, but in this case we would need to compute the labels beforehand and use them in every individual facet. If we do that, it's probably best to turn this around for all geom types. One idea would be to add a new method to all geoms that does the necessary data transforms, so faceting would be: