hlin117 / mdlp-discretization

An implementation of the minimum description length principle expert binning algorithm by Usama Fayyad
BSD 3-Clause "New" or "Revised" License

Buffer dtype mismatch, expected 'int64_t' but got 'long' #21

Closed: parafac closed this issue 6 years ago

parafac commented 6 years ago

Hello Henry,

I downloaded your mdlp code and managed to compile it with Visual Studio 2015 on Windows 7. It compiled and built without errors, but when I tried the iris data set following your instructions, I got a dtype mismatch error. See the output below. Do you have any suggestions?

Thanks, William

    from discretization import MDLP
    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data
    y = iris.target

    mdlp = MDLP()
    conv_X = mdlp.fit_transform(X, y)

    Traceback (most recent call last):
      File "", line 1, in
        conv_X = mdlp.fit_transform(X, y)
      File "C:\Users\es036b\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\base.py", line 458, in fit_transform
        return self.fit(X, y, **fit_params).transform(X)
      File "C:\Users\es036b\Documents\Code\mdlp-discretization-master\discretization.py", line 142, in fit
        cut_points = MDLPDiscretize(col, y, self.min_depth)
      File "_mdlp.pyx", line 40, in _mdlp.MDLPDiscretize (_mdlp.cpp:1942)
        k = find_cut(y, start, end)
      File "_mdlp.pyx", line 106, in _mdlp.find_cut (_mdlp.cpp:3412)
        def find_cut(np.ndarray[np.int64_t, ndim=1] y, int start, int end):
    ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long'

hlin117 commented 6 years ago

Unfortunately I don't have a windows computer, so I can't help much here... if someone has had success running this library on Windows and can assist, that would be great.

bacalfa commented 6 years ago

I also got this error. Installed it via pip (mdlp-0.32).

bacalfa commented 6 years ago

Does this discussion help?

bacalfa commented 6 years ago

I think I found the culprit. This line,

    y = check_array(y, ensure_2d=False, dtype=int)

casts y to int32 on Windows (I don't know what happens on other platforms). Changing it to the following,

    y = check_array(y, ensure_2d=False, dtype=np.int64)

doesn't result in that type error. Can you check if this "fix" is indeed a fix across platforms, and if it is, push a new version to pip? :)
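For anyone who wants to see the difference on their own machine, here is a minimal sketch using only NumPy. It rests on the assumption that with NumPy releases before 2.0 on Windows, `dtype=int` resolves to the C `long` type, which is 32 bits wide there, while `np.int64` is 64 bits on every platform. The helper name `as_int64_labels` is hypothetical and not part of mdlp-discretization.

```python
import numpy as np


def as_int64_labels(y):
    """Hypothetical helper: return y as a flat, contiguous int64 array.

    This mirrors the idea behind the dtype=np.int64 change above; it is
    not a function from the mdlp-discretization package.
    """
    return np.ascontiguousarray(y, dtype=np.int64).ravel()


labels = [0, 1, 2, 1, 0]

# dtype=int is platform-dependent: it may resolve to a 32-bit integer
# (for example on Windows with NumPy < 2.0) or to a 64-bit integer.
platform_default = np.asarray(labels, dtype=int)

# dtype=np.int64 is 64 bits regardless of platform.
explicit = as_int64_labels(labels)

print("dtype=int      ->", platform_default.dtype, platform_default.itemsize, "bytes")
print("dtype=np.int64 ->", explicit.dtype, explicit.itemsize, "bytes")
```

If the first line prints a 4-byte dtype, that matches the mismatch in the traceback: the Cython signature declares `np.int64_t`, so a 32-bit label array is rejected.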

parafac commented 6 years ago

Thank you bacalfa, I'll give it a try and post my result.

parafac commented 6 years ago

With Bruno's (bacalfa) fix, everything went through without any error. I also tested and compared the iris example using shuffle=False and got the same output as listed in the code in the test folder.

Thank you so much for helping.

hlin117 commented 6 years ago

Thanks for reporting the bug and how to fix it!