mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

[Bug] seaborn.regplot does not plot lowess or give a warning if there are too few neighbors #598

Closed: carlshan closed this issue 9 years ago

carlshan commented 9 years ago

Seaborn version: 0.5.1
IPython version: 3.0.0

Issue: when the number of unique values in x is smaller than the number of unique neighbors needed to compute LOWESS, no LOWESS curve is plotted and no error or warning is given.

Expected behavior: a warning should be raised saying that LOWESS cannot be plotted because there are too few unique neighbors.

(Workaround: you could jitter all the x values by some epsilon to create more unique values, and then rerun sns.regplot().)
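A minimal sketch of that workaround in plain Python, assuming uniform noise of half-width `eps` is acceptable for the analysis; the helper name `jitter` and its default `eps` are hypothetical. The noise has to be added to the data passed to `sns.regplot()`: according to the current seaborn documentation, regplot's own `x_jitter` option is applied to a copy of the data after the regression is fitted and only changes the look of the scatterplot.

```python
import random


def jitter(values, eps=1e-3, seed=None):
    """Return a copy of `values` with uniform noise in [-eps, eps] added.

    Hypothetical helper (not part of seaborn): it breaks exact ties in x
    so that neighboring points no longer sit on the same x value.
    """
    rng = random.Random(seed)  # seeded for reproducible jitter
    return [v + rng.uniform(-eps, eps) for v in values]


x = [0, 1, 2] * 4  # only 3 unique values
x_jittered = jitter(x, eps=1e-3, seed=0)
print(len(set(x)), len(set(x_jittered)))
```

The helper only breaks the ties in x; whether the resulting curve is meaningful still depends on `eps` relative to the spacing of the original values.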

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # lowess=True also requires statsmodels

print(sns.__version__)  # 0.5.1 in the original report

# Initialize data
x = np.random.choice(range(0, 3), 100)  # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)

# No warning; only the scatterplot is displayed, no LOWESS curve
plt.figure()
sns.regplot(x=x, y=y, lowess=True)

x1 = np.random.choice(range(0, 5), 100)  # 5 unique values
# Correctly displays the LOWESS curve as well as the scatterplot
plt.figure()
sns.regplot(x=x1, y=y, lowess=True)
plt.show()
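One plausible explanation for the 3-vs-5 difference, which is not stated in the thread and is a simplification rather than statsmodels' actual code: LOWESS fits a weighted linear regression around each point, using the nearest `frac * n` points (assuming seaborn uses statsmodels' default `frac` of 2/3) weighted by a tricube kernel that falls to zero at the edge of the neighborhood. With only 3 tied values, every neighborhood ends exactly at the next unique value, so all points at other x values get weight zero and the local regression has no spread in x from which to estimate a slope. The helper names below are hypothetical, and the sketch uses a single pass without statsmodels' robustness iterations.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_x_variance(x, x0, frac=2 / 3):
    """Weighted variance of x in a simplified LOWESS neighborhood of x0.

    Hypothetical helper: the bandwidth h is the distance from x0 to its
    k-th nearest neighbor, with k = int(frac * n). A result of zero means
    the local linear regression at x0 is degenerate (no slope can be fit).
    """
    k = int(frac * len(x))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    # If h == 0, only points exactly at x0 are in the neighborhood.
    w = [tricube((xi - x0) / h) if h > 0 else float(xi == x0) for xi in x]
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    return sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total


x3 = [0] * 34 + [1] * 33 + [2] * 33  # 3 unique values, n = 100
x5 = [v for v in range(5) for _ in range(20)]  # 5 unique values, n = 100
print([local_x_variance(x3, v) for v in (0, 1, 2)])  # [0.0, 0.0, 0.0]
print([round(local_x_variance(x5, v), 3) for v in range(5)])
```

With 5 equally sized groups, the 66-point neighborhood reaches past the adjacent values, so nearby groups receive positive weight and the slope is identifiable, which is consistent with the behavior reported for `x1`.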
mwaskom commented 9 years ago

IMO to the extent that this is a bug, it's a bug in statsmodels. Seaborn delegates all of the lowess fit/predict logic to their functions, so the code there is in a much better position to know whether or not the data can be used with a lowess fit and respond (with a warning, exception, or otherwise) appropriately.
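Since the fit lives in statsmodels, a caller who wants the warning the reporter asked for can check the smoother's output directly. A sketch, assuming the fitted values come back as floats with NaN marking points where the local fit failed (what the missing curve in this report suggests, not verified against current statsmodels); `warn_if_lowess_failed` is a hypothetical name. With statsmodels installed, the fitted values would be the second column of `lowess(y, x)` from `statsmodels.nonparametric.smoothers_lowess`.

```python
import math
import warnings


def warn_if_lowess_failed(fitted, n_unique_x=None):
    """Warn when LOWESS fitted values contain NaN.

    Hypothetical helper: `fitted` is any sequence of floats, e.g. the
    second column of statsmodels' lowess(y, x) output. Returns True if
    no fitted value is NaN, False (after warning) otherwise.
    """
    n_nan = sum(1 for v in fitted if math.isnan(v))
    if n_nan:
        detail = f" (x has only {n_unique_x} unique values)" if n_unique_x else ""
        warnings.warn(
            f"LOWESS returned NaN for {n_nan} of {len(fitted)} points{detail}; "
            "the curve will be missing or incomplete. x may have too few "
            "distinct values for the neighborhood size.",
            RuntimeWarning,
        )
        return False
    return True


ok = warn_if_lowess_failed([0.4, float("nan"), 0.5], n_unique_x=3)
print(ok)
```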

carlshan commented 9 years ago

Gotcha. Thanks @mwaskom. I've submitted this issue to Statsmodels: https://github.com/statsmodels/statsmodels/issues/2449