yhat / ggpy

ggplot port for python
http://yhat.github.io/ggpy/
BSD 2-Clause "Simplified" License

stat_smooth issues #70

Closed johncollins closed 10 years ago

johncollins commented 10 years ago

First off, I love what you're doing. I've had this on a to-do list for a long time. I would love to contribute.

I ran into some initial issues. I'm running everything on a MacBook and tried these in the python, ipython, and ipython-notebook environments, all running Python 2.7.

Trying to recreate something like the first example here: http://docs.ggplot2.org/0.9.3.1/stat_smooth.html

from ggplot import *
c = ggplot(mtcars, aes('qsec', 'wt'))
c + stat_smooth(se=True)

I got the following:

[image: output_stat_smooth_issue]

Another stat_smooth issue: I could not replicate one of your own examples from the blog post.

from ggplot import *
import pandas as pd
meat_lng = pd.melt(meat, id_vars=['date'])
p = ggplot(aes(x='date', y='value'), data=meat_lng)

gave me the following:

[image: output_image_stat_smooth]

Any ideas?

johncollins commented 10 years ago

Left out the last line of code in the second example. Here it is:

p + geom_point() + stat_smooth(colour="red")

jhaynes commented 10 years ago

Do you get an error when you call stat_smooth? I get the same result in your second example with the traceback below. All the imports in components.smoothers succeed.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-e934d91c1008> in <module>()
      2 meat_lng = pd.melt(meat, id_vars=['date'])
      3 p = ggplot(aes(x='date', y='value'), data=meat_lng)
----> 4 p + geom_point() + stat_smooth(colour="red")

/software/ipython/IPython/core/displayhook.pyc in __call__(self, result)
    245             self.start_displayhook()
    246             self.write_output_prompt()
--> 247             format_dict, md_dict = self.compute_format_data(result)
    248             self.write_format_data(format_dict, md_dict)
    249             self.update_user_ns(result)

/software/ipython/IPython/core/displayhook.pyc in compute_format_data(self, result)
    155 
    156         """
--> 157         return self.shell.display_formatter.format(result)
    158 
    159     def write_format_data(self, format_dict, md_dict=None):

/software/ipython/IPython/core/formatters.pyc in format(self, obj, include, exclude)
    150             md = None
    151             try:
--> 152                 data = formatter(obj)
    153             except:
    154                 # FIXME: log the exception

/software/ipython/IPython/core/formatters.pyc in __call__(self, obj)
    479                 type_pprinters=self.type_printers,
    480                 deferred_pprinters=self.deferred_printers)
--> 481             printer.pretty(obj)
    482             printer.flush()
    483             return stream.getvalue()

/software/ipython/IPython/lib/pretty.pyc in pretty(self, obj)
    369                             if callable(meth):
    370                                 return meth(obj, self, cycle)
--> 371             return _default_pprint(obj, self, cycle)
    372         finally:
    373             self.end_group()

/software/ipython/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    489     if getattr(klass, '__repr__', None) not in _baseclass_reprs:
    490         # A user-provided repr. Find newlines and replace them with p.break_()
--> 491         output = repr(obj)
    492         for idx,output_line in enumerate(output.splitlines()):
    493             if idx:

/software/ggplot/ggplot/ggplot.py in __repr__(self)
    209                 for geom in self.geoms:
    210                     plt.subplot(1, 1, 1)
--> 211                     callbacks = geom.plot_layer(layer)
    212                     if callbacks:
    213                         for callback in callbacks:

/software/ggplot/ggplot/geoms/stat_smooth.pyc in plot_layer(self, layer)
     38             y, y1, y2 = smoothers.mavg(x, y)
     39         else:
---> 40             y, y1, y2 = smoothers.lowess(x, y, span=span)
     41         idx = np.argsort(x)
     42         x = np.array(x)[idx]

/software/ggplot/ggplot/components/smoothers.pyc in lowess(x, y, span)
     48     if _isdate(x[0]):
     49         x = np.array([i.toordinal() for i in x])
---> 50     result = smlowess(np.array(y), np.array(x), frac=span)
     51     x = pd.Series(result[::,0])
     52     y = pd.Series(result[::,1])

/software/statsmodels/statsmodels/nonparametric/smoothers_lowess.pyc in lowess(endog, exog, frac, it, delta, is_sorted, missing, return_sorted)
    163         y = np.array(y[sort_index])
    164 
--> 165     res = _lowess(y, x, frac=frac, it=it, delta=delta)
    166     _, yfitted = res.T
    167 

/software/statsmodels/statsmodels/nonparametric/_smoothers_lowess.so in statsmodels.nonparametric._smoothers_lowess.lowess (statsmodels/nonparametric/_smoothers_lowess.c:2363)()

/software/statsmodels/statsmodels/nonparametric/_smoothers_lowess.so in statsmodels.nonparametric._smoothers_lowess.calculate_residual_weights (statsmodels/nonparametric/_smoothers_lowess.c:4040)()

/software/numpy/numpy/lib/function_base.pyc in median(a, axis, out, overwrite_input)
   2797             part = partition(a, ((sz // 2) - 1, sz // 2), axis=axis)
   2798         else:
-> 2799             part = partition(a, (sz - 1) // 2, axis=axis)
   2800     if part.shape == ():
   2801         # make 0-D arrays work

/software/numpy/numpy/core/fromnumeric.pyc in partition(a, kth, axis, kind, order)
    617     else:
    618         a = asanyarray(a).copy()
--> 619     a.partition(kth, axis=axis, kind=kind, order=order)
    620     return a
    621 

AttributeError: 'numpy.ndarray' object has no attribute 'partition'
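
The last frame fails on `a.partition(...)`. As far as I know, `ndarray.partition` was introduced in NumPy 1.8, so one possible explanation (an assumption, not something confirmed in this thread) is a mixed install where newer NumPy Python files sit next to an older compiled core. A quick sanity check along these lines might help narrow it down:

```python
import numpy as np

# Report which NumPy is actually being imported and from where.
print("numpy version:", np.__version__)
print("loaded from:", np.__file__)

# ndarray.partition is the method the traceback says is missing.
has_partition = hasattr(np.ndarray, "partition")
print("ndarray.partition available:", has_partition)

if has_partition:
    # np.partition places the k-th smallest element at index k,
    # which is what np.median relies on in the traceback above.
    middle = np.partition(np.array([5, 1, 4, 2, 3]), 2)[2]
    print("middle element via partition:", middle)
```

If the version string and the `hasattr` result disagree with what you expect, reinstalling NumPy into a clean environment is probably the first thing to try.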

johncollins commented 10 years ago

Nope, no error is printed. On my Windows machine at work the second plot comes out differently: the stat_smooth layer is drawn this time, albeit with a crazy width / bounding region.

[image: new_figure]

johncollins commented 10 years ago

I see now that, since the second example was a faceting one and I did not select 'beef' or some other option for meat in a single plot, I was seeing the error above. The following code does what we'd want:

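from ggplot import *
import matplotlib.pyplot as plt  # needed for plt.show() below
import pandas as pd

# Reshape to long format, then keep a single series so the smoother fits one curve.
meat_lng = pd.melt(meat, id_vars=['date'])
beef = meat_lng[meat_lng['variable'] == 'beef']
p = ggplot(aes(x='date', y='value'), data=beef)
print(p + geom_point() + stat_smooth(colour="red", se=True))
plt.show()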

My bad for not correctly subsetting. ggplot2 actually gives a similar kind of weird plot as those above.
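
To make the subsetting point concrete, here is a minimal pandas-only sketch. The `meat_demo` frame is a made-up stand-in for the `meat` dataset (the column names and values are illustrative assumptions, not the real data). After `pd.melt`, every non-id column is stacked into a single `value` column, so plotting `value` against `date` without filtering mixes all the series into one layer:

```python
import pandas as pd

# Hypothetical stand-in for the `meat` dataset; values are invented.
meat_demo = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-02-01", "2013-03-01"]),
    "beef": [10.0, 11.0, 12.0],
    "pork": [20.0, 19.0, 21.0],
})

# Long format: one row per (date, variable) pair.
meat_lng = pd.melt(meat_demo, id_vars=["date"])
print(meat_lng["variable"].unique())

# Keep one series so a smoother sees a single curve.
beef_only = meat_lng[meat_lng["variable"] == "beef"]
print(beef_only)
```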

I guess there is still one issue here:

In the first graph, the default settings do not produce a smooth enough curve. Should this be treated as a bug?
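
For context on why the default might look jagged: the traceback earlier in this thread shows stat_smooth passing a `span` value through to a LOWESS smoother as `frac`. The sketch below is not ggplot's or statsmodels' code. It is a simplified, pure-Python local linear smoother with tricube weights, and the names `local_linear_smooth` and `roughness` are hypothetical. It only illustrates the general idea that a larger fraction of points per local fit tends to give a smoother curve:

```python
import math
import random


def local_linear_smooth(xs, ys, frac):
    """Fit a tricube-weighted line around each x using the nearest frac*n points."""
    n = len(xs)
    k = max(2, int(math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest neighbour (x0 itself counts).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def roughness(values):
    """Sum of squared second differences: larger means a more jagged curve."""
    return sum((values[i - 1] - 2 * values[i] + values[i + 1]) ** 2
               for i in range(1, len(values) - 1))


# Synthetic noisy sine wave (invented data, fixed seed for repeatability).
rng = random.Random(0)
xs = [2 * math.pi * i / 49 for i in range(50)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

narrow = local_linear_smooth(xs, ys, frac=0.1)
wide = local_linear_smooth(xs, ys, frac=0.8)
print("roughness with frac=0.1:", roughness(narrow))
print("roughness with frac=0.8:", roughness(wide))
```

If the real smoother behaves similarly, a small default span on a small dataset could explain a curve that tracks the noise rather than the trend.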

glamp commented 10 years ago

Yeah there's definitely an issue. We're currently working on a better implementation of the se bands.

Yep. Definitely a bug. Thanks for investigating. I'll let you know when the fix is live.

In reply to John Collins's comment of Wed, Oct 23, 2013, 6:55 PM.

glamp commented 10 years ago

Seems to be fixed.

[image: screen shot 2013-12-22 at 11 37 50 am]