mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

kdeplot breaks for integer data sets. #118

Closed alanhdu closed 10 years ago

alanhdu commented 10 years ago

When I try to run seaborn.kdeplot([1, 2, 3]), I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/seaborn/distributions.py", line 713, in kdeplot
    gridsize, cut, clip, legend, ax, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/seaborn/distributions.py", line 514, in _univariate_kdeplot
    gridsize, cut, clip)
  File "/usr/local/lib/python2.7/dist-packages/seaborn/distributions.py", line 565, in _statsmodels_univariate_kde
    kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/nonparametric/kde.py", line 142, in fit
    clip=clip, cut=cut)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/nonparametric/kde.py", line 484, in kdensityfft
    binned = fast_linbin(X,a,b,gridsize)/(delta*nobs)
  File "linbin.pyx", line 17, in statsmodels.nonparametric.linbin.fast_linbin (statsmodels/nonparametric/linbin.c:1246)
ValueError: Buffer dtype mismatch, expected 'DOUBLE' but got 'long'

Works perfectly for seaborn.kdeplot([1.0, 2.0, 3.0]) though.
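
A possible workaround, judging from the traceback (the compiled binning routine seems to expect a double buffer and receives integers), is to convert the values to floats before plotting. This is only a sketch: the helper name `as_float_list` is hypothetical and not part of seaborn or statsmodels.

```python
def as_float_list(values):
    """Return the values as Python floats (hypothetical helper, not seaborn API)."""
    return [float(v) for v in values]


data = as_float_list([1, 2, 3])
print(data)  # [1.0, 2.0, 3.0]

# Assumption: passing `data` to seaborn.kdeplot then behaves like the
# float call above, since the input no longer contains integers.
```

With the converted list, the call becomes seaborn.kdeplot(as_float_list([1, 2, 3])), which should be equivalent to the float version that already works.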

mwaskom commented 10 years ago

Well, it could be argued that kernel density estimation isn't appropriate for discrete data (at least with relatively narrow bandwidths), but it's not a hard fix.
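
To illustrate the bandwidth point, here is a rough sketch of a Gaussian kernel density estimate written in plain Python. It is not how seaborn or statsmodels compute the estimate; the function name `gaussian_kde` and the two bandwidth values are chosen only for illustration. It compares the estimated density at an observed integer with the density halfway between two integers.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [1, 2, 3]
narrow = gaussian_kde(samples, bandwidth=0.1)  # illustrative value
wide = gaussian_kde(samples, bandwidth=1.0)    # illustrative value

# Density at an observed value (2.0) versus between two values (2.5).
for name, kde in [("narrow", narrow), ("wide", wide)]:
    print(name, kde(2.0), kde(2.5))
```

With the narrow bandwidth the estimate collapses into separate spikes at 1, 2 and 3 with almost no density in between, which says little beyond what a bar chart of counts would. With the wide bandwidth the curve is smooth, but it assigns substantial density to values such as 2.5 that the data can never take.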