Examples failing, or Not Compatible with Windows

Jenders74 commented 11 years ago

Both of the usage examples result in the same error for me:

import numpy
import cProfile
from pyearth import Earth
from matplotlib import pyplot

numpy.random.seed(2)
m = 1000
n = 10
X = 80*numpy.random.uniform(size=(m,n)) - 40
y = numpy.abs(X[:,6] - 4.0) + 1*numpy.random.normal(size=m)
model = Earth(max_degree = 1)
model.fit(X,y)

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\pyearth\earth.py", line 312, in fit
    self.forward_pass(X, y)
  File "C:\Python27\lib\site-packages\pyearth\earth.py", line 383, in forward_pass
    forward_passer = ForwardPasser(X, y, **args)
  File "_forward.pyx", line 67, in pyearth._forward.ForwardPasser.__init__ (pyearth/_forward.c:3146)
  File "_forward.pyx", line 96, in pyearth._forward.ForwardPasser.init_linear_variables (pyearth/_forward.c:3698)
ValueError: Buffer dtype mismatch, expected 'INT_t' but got 'long long'

jcrudy commented 11 years ago

I am guessing the problem is that numpy on your system uses a different data type for integers than the numpy on my system (and, so far, all other users' systems). Would you mind sending me your system information including:

operating system (and is it 32 or 64 bit)
python version (and is it 32 or 64 bit)
numpy version
anything else you think is relevant (such as fancy BLAS implementations or something, if you're aware of them)
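Most of the items above can be gathered with a short script. This is only a sketch: the function name system_report is made up for illustration, and it assumes nothing beyond the standard library (numpy is reported as "not installed" if the import fails).

```python
import platform
import struct


def system_report():
    """Collect OS, Python, and numpy details as a dict of strings."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        # Pointer size of the running interpreter, in bits: 32 or 64.
        "python_bits": str(struct.calcsize("P") * 8),
    }
    try:
        import numpy
        report["numpy_version"] = numpy.__version__
    except ImportError:
        report["numpy_version"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in sorted(system_report().items()):
        print("%s: %s" % (key, value))
```

For the BLAS question, numpy.show_config() prints the libraries numpy was built against, which may help if that turns out to matter.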

I am no expert in the numpy data types, so I will have to do some research. I would like to come up with a solution that works on all systems.

Right now, you may be able to get it to work by doing the following:

delete your current py-earth installation
install cython
reinstall py-earth with the --cythonize argument: sudo python setup.py install --cythonize
try the example again

If you try that, would you please let me know if it works or not? Thanks so much for reporting this. I had suspected that numpy data types might be a problem on some systems, but wasn't really sure. Now that I have an example of the problem I can work on a solution.

jcrudy commented 11 years ago

After some research, I made a change that I think might fix your problem. Unfortunately I can't test it because it has always worked on all of my systems (mac and 64 bit linux). I believe this problem should be fixed by commit ae663871879b28f453ccde189c06194c5f540981, so I am going to close this issue for now. If you test it out and still have the same problem, please reopen it. Thanks!

Jenders74 commented 11 years ago

Thank you for making the changes. Tried uninstalling the module and re-installing the new master. Also followed the cython instructions. Totally possible that I messed up an uninstall/re-install, but I get the same dtype issue on my 64-bit Windows 7 machine with Python 2.7.3 -- EPD 7.3-2 (64-bit) and numpy 1.6.1.

jcrudy commented 11 years ago

I have never tested on Windows, so I'm guessing that's the problem. I'll try to get my hands on a Windows machine this weekend and do some tests. Until then, I'm afraid the best advice I can give you is to either use the earth package in R or use Linux. I'll report here as soon as I've figured out the problem.

BTW, I currently use travis for automated testing. If anyone who reads this knows of a similar service capable of testing in a Windows environment, could you please comment?

rkern commented 11 years ago

The code that is failing is assigning the result of np.argsort() to an INT_t array. np.argsort() returns an array of type cnp.intp_t (in Cython terms), which corresponds to C's ssize_t, an integer the size of a pointer offset. On 64-bit Windows, this is 64 bits, naturally. However, a cnp.int_t or INT_t (which corresponds to a C long) is still 32 bits there (unlike on 64-bit Linux and OS X, where long is 64 bits).

The order array in that method needs to be declared with a type cnp.intp_t instead.
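To see the mismatch on a given machine, something like the following should work. It is a rough check from plain Python rather than the Cython code in question, and the helper name integer_widths is made up for illustration. It reads the C type sizes through ctypes so that it does not depend on how a particular numpy version defines its default integer.

```python
import ctypes

import numpy as np


def integer_widths():
    """Return the width in bits of the integer types discussed above."""
    return {
        "C long": ctypes.sizeof(ctypes.c_long) * 8,
        "C ssize_t": ctypes.sizeof(ctypes.c_ssize_t) * 8,
        "numpy.intp": np.dtype(np.intp).itemsize * 8,
    }


x = np.array([3.0, 1.0, 2.0])
order = np.argsort(x)  # indices that would sort x
widths = integer_widths()

print("argsort dtype:", order.dtype)
for name in sorted(widths):
    print("%s: %d bits" % (name, widths[name]))
```

If "C long" comes out narrower than "numpy.intp", a buffer declared with a long-sized type cannot accept the output of np.argsort() directly, which is the situation described above.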

rkern commented 11 years ago

I can confirm that changing INT_t to cnp.intp_t in the declaration of order lets me run both examples.

There are a few other things that need to be fixed to get it to compile using Visual Studio on Windows. Namely, Visual Studio is not a C99 compiler, so log2() is not implemented and must be emulated. Also, Visual Studio errors out at compile time on the expression 0 / 0. I just replaced that with np.nan.
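For reference, one common way to emulate log2() is the change-of-base identity log2(x) = ln(x) / ln(2). The sketch below shows the idea in plain Python; it is not necessarily the emulation used in the actual patch, and the name log2_emulated is made up for illustration.

```python
import math


def log2_emulated(x):
    """Base-2 logarithm via the change-of-base identity ln(x) / ln(2)."""
    return math.log(x) / math.log(2.0)


# An explicit NaN value, instead of relying on the expression 0 / 0.
nan_value = float("nan")

print(log2_emulated(8.0))
print(math.isnan(nan_value))
```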

jcrudy commented 11 years ago

@rkern, thanks for chiming in! You just saved me at least an hour of dtype guess and check plus a 90 minute drive to my parents' house to use their computer (I should still visit soon, though). I made the changes you suggested and, barring unexpected things, this issue should be repaired as of commit c5385b0f1d736bcf6de75a87dd92c40923bbc015.

I'm closing this issue for now. @Jenders74, if you still have the problem please comment and reopen.