uxlfoundation / scikit-learn-intelex

Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
https://intel.github.io/scikit-learn-intelex/
Apache License 2.0

Segmentation fault occurs when fitting GaussianMixture model with n_components=1 #2197

Open NyankoSong opened 3 days ago

NyankoSong commented 3 days ago

Describe the bug

Fitting a sklearn.mixture.GaussianMixture model with n_components=1 fails with a segmentation fault once patch_sklearn() has been applied.

To Reproduce

import numpy as np
from sklearn.mixture import GaussianMixture

from sklearnex import patch_sklearn
patch_sklearn()

sample_points = np.random.multivariate_normal([0, 0, 0], np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), 4000)
gmm_test = GaussianMixture(n_components=1)
gmm_test.fit(sample_points)

# Segmentation fault (core dumped)

Jupyter log

15:34:38.378 [info] Restarted 42378c22-4f85-4969-b74e-0d7b34702612
15:34:51.042 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 

Expected behavior

import numpy as np
from sklearn.mixture import GaussianMixture

# from sklearnex import patch_sklearn
# patch_sklearn()

sample_points = np.random.multivariate_normal([0, 0, 0], np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), 4000)
gmm_test = GaussianMixture(n_components=1)
gmm_test.fit(sample_points)

# Everything works fine.

Output/Screenshots

There are too few outputs to show here.

Environment:

Alexsandruss commented 1 hour ago

The issue appears to be located in one of the memory allocators in the Lloyd algorithm implementation of oneDAL KMeans. The segfault is observed only when n_components/n_clusters is 1; otherwise, patching and computation of KMeans succeed. The temporary solution is to disable sklearnex patching for the KMeans(n_clusters=1) case in the PR above.
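Until a fix is released, a possible user-side workaround is to keep GaussianMixture from calling KMeans at all in the single-component case. GaussianMixture uses KMeans for initialization by default (init_params="kmeans"), which matches the diagnosis above. The sketch below switches to init_params="random" when n_components is 1. With a single component, every sample gets responsibility 1 under either scheme, so the starting point should be the same. The helper names gmm_init_params and fit_gmm are hypothetical, and this has not been verified against the crashing build.

```python
def gmm_init_params(n_components: int) -> str:
    """Pick a GaussianMixture init strategy (hypothetical helper).

    Returns "random" for a single component so that KMeans is never
    invoked, and the scikit-learn default "kmeans" otherwise.
    """
    return "random" if n_components == 1 else "kmeans"


def fit_gmm(X, n_components: int):
    """Fit a GaussianMixture while sidestepping KMeans(n_clusters=1).

    Assumption: the segfault comes only from the patched KMeans used
    during initialization, as suggested in the comment above.
    """
    # Imported lazily so gmm_init_params stays usable without scikit-learn.
    from sklearn.mixture import GaussianMixture

    gmm = GaussianMixture(
        n_components=n_components,
        init_params=gmm_init_params(n_components),
    )
    return gmm.fit(X)
```

With this in place, the reproduction script would call fit_gmm(sample_points, 1) instead of constructing GaussianMixture(n_components=1) directly, leaving patch_sklearn() enabled for everything else.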