has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

The Kernel appears to have died message when requesting geom_density() on large data set #332

Open raheems opened 5 years ago

raheems commented 5 years ago

Has anyone experienced a kernel issue when plotting with geom_density()? I have a large data set with 4 million observations, and geom_density() causes the kernel to die.

geom_boxplot is working fine.

There is no error message or traceback; the Jupyter notebook just pops up a message saying "The Kernel appears to have died". It happens only when I run geom_density(), and the message appears after a couple of minutes. At that point I have to restart the kernel.

Please let me know what sort of information would help to debug it.

raheems commented 5 years ago

I was able to reproduce the issue, and the following has solved it for now:

conda install nomkl

This upgraded some packages, including numpy. The following code reproduces the issue:

import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_density

# 5 million normal draws plus a random binary grouping column
d1 = np.random.normal(loc=10, scale=10, size=5000000)
d2 = np.random.randint(2, size=5000000)
d12 = pd.DataFrame({'d1': d1, 'd2': d2})

# One density curve per level of d2
ggplot(d12) + aes(x='d1', color='factor(d2)') + geom_density()

has2k1 commented 4 years ago

Density is computed using statsmodels.api.nonparametric.KDEUnivariate. I think that is where the problem is/was.

christiantillich commented 4 years ago

I'm also experiencing this issue. Installing nomkl did not fix it.
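
For cases where nomkl does not help, a possible workaround (not suggested in the thread, so treat it as an assumption) is to estimate the density from a random subsample instead of the full data set. The helper name subsample and the 100,000-row cap below are hypothetical; a random sample of that size usually gives a density curve visually close to the full one, but that should be checked for the data at hand.

```python
import numpy as np
import pandas as pd


def subsample(df, max_rows=100_000, seed=0):
    """Return at most max_rows randomly chosen rows of df (hypothetical helper)."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


rng = np.random.default_rng(0)
d12 = pd.DataFrame({
    "d1": rng.normal(loc=10, scale=10, size=1_000_000),
    "d2": rng.integers(0, 2, size=1_000_000),
})
small = subsample(d12)

# The reduced frame can then be plotted as in the original report:
# ggplot(small) + aes(x='d1', color='factor(d2)') + geom_density()
```

This does not address the underlying crash; it only reduces the amount of data passed to the density computation.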