mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

TypeError with seaborn.kdeplot when using fill=True and categorical hue #3751

Closed mishachada closed 2 months ago

mishachada commented 3 months ago

Description

I get a TypeError when calling seaborn.kdeplot with fill=True and a categorical hue. The error message suggests that data of an unsupported type is being passed to the matplotlib fill function.

Steps to Reproduce

Here is a minimal reproducible example:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Generate synthetic datasets
np.random.seed(42)

data = pd.DataFrame({
    'value': np.concatenate([
        np.random.normal(5, 1, size=100),
        np.random.normal(10, 1, size=100),
        np.random.normal(15, 1, size=100),
    ]),
    'category': ['Group 1']*100 + ['Group 2']*100 + ['Group 3']*100
})

# Plot using kdeplot
sns.kdeplot(data=data, x="value", hue="category", fill=True, palette='coolwarm', alpha=0.7)
plt.show()

Expected Behavior

The KDE plot should render successfully with the filled areas for different categories.

Actual Behavior

The plot does not render, and the following error is thrown:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
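For reference, NumPy raises this kind of TypeError when np.isfinite receives an array whose dtype it cannot treat as numeric, such as object dtype. The sketch below reproduces the message in isolation. That an object-dtype array is what reaches isfinite inside kdeplot is an assumption, not something confirmed in this report.

```python
import numpy as np

# A plain float array: isfinite works element-wise.
numeric = np.array([5.1, 9.8, 15.2])
print(np.isfinite(numeric).all())

# The same values stored with object dtype (assumed trigger, not confirmed
# for this issue): the isfinite ufunc has no loop for object arrays.
as_object = numeric.astype(object)

try:
    np.isfinite(as_object)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Casting back to float makes the check valid again.
print(np.isfinite(as_object.astype(float)).all())
```

If the traceback in a failing environment ends in a call like this, inspecting the dtype of the plotted column is a reasonable next step.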

Suggested Fix or Improvement

It seems that kdeplot with fill=True might need better handling for different data types or clearer documentation on the expected data types. Additionally, improving error messages to guide users to the correct data type could help prevent this issue.
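As a rough sketch of the kind of check suggested above, the column passed as x can be inspected and coerced to a float dtype before plotting. Here ensure_numeric is a hypothetical helper, not part of seaborn, and the object-dtype column in the demo is an assumed cause rather than a confirmed one.

```python
import pandas as pd


def ensure_numeric(frame, column):
    """Return a copy of `frame` with `column` coerced to float64.

    Hypothetical helper for illustration; values that cannot be parsed
    as numbers become NaN (errors="coerce").
    """
    out = frame.copy()
    out[column] = pd.to_numeric(out[column], errors="coerce").astype("float64")
    return out


# Demo frame whose value column was (by assumption) loaded as object dtype.
demo = pd.DataFrame({
    "value": pd.Series([5.0, "10.5", 15.2], dtype=object),
    "category": ["Group 1", "Group 2", "Group 3"],
})

print(demo["value"].dtype)   # dtype before coercion
clean = ensure_numeric(demo, "value")
print(clean["value"].dtype)  # dtype after coercion
```

The cleaned frame can then be passed to kdeplot in place of the original one. The helper returns a copy so the source data stays untouched.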

As a temporary workaround, I computed the densities with scipy and drew the filled curves directly with matplotlib, as shown below:

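# Workaround: draw filled KDE curves with matplotlib and scipy
# (reuses np, plt, and data from the example above)
from scipy.stats import gaussian_kde

fig, ax = plt.subplots()

for category in data['category'].unique():
    # Select the values belonging to this group
    subset = data.loc[data['category'] == category, 'value']
    # Fit a Gaussian KDE to the group
    density = gaussian_kde(subset)
    # Evaluate the density only over the group's observed range
    xs = np.linspace(subset.min(), subset.max(), 200)
    ax.fill_between(xs, density(xs), alpha=0.6, label=category)

ax.legend()
plt.show()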

Additional Notes

It would be helpful to either enhance the kdeplot function to handle this more gracefully or provide a warning if the data type might cause an error.

mwaskom commented 2 months ago

I cannot replicate your issue:

[image attachment]

Maybe you are using an old version of seaborn.

jhncls commented 2 months ago

This could be a numpy issue. See https://github.com/mwaskom/seaborn/issues/3192

mwaskom commented 2 months ago

Closing as not reproducible. Happy to reopen given an example that demonstrates the issue on the latest version.
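When following up on a report like this, it helps to list the installed versions of the relevant packages. Below is a minimal sketch that uses only the standard library. The package list is an assumption based on the imports in the example above, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages imported by the example in this issue (assumed to be the relevant ones).
PACKAGES = ("seaborn", "matplotlib", "numpy", "pandas", "scipy")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into the issue alongside the full traceback makes it easier to tell whether the error comes from an outdated seaborn or from a particular numpy release.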