has2k1 / scikit-misc

Miscellaneous tools for data analysis and scientific computing
https://has2k1.github.io/scikit-misc/stable
BSD 3-Clause "New" or "Revised" License

different loess preds on different systems #8

Closed adamgayoso closed 4 years ago

adamgayoso commented 4 years ago

Linux:

Screen Shot 2020-05-06 at 1 56 50 PM

Mac:

Screen Shot 2020-05-06 at 1 57 09 PM

I can give more architecture details, but maybe there's something I'm missing? The input arrays have about 16,000 samples.

def _loess(y, x, span=0.3):

    from skmisc.loess import loess

    model = loess(x, y, span=span, degree=2)
    model.fit()
    y_est = model.predict(x).values

    return y_est

has2k1 commented 4 years ago

Yes please, more details.

While the continuous integration does not run tests on OSX, the wheels for the package are built only if the tests pass, and they have always passed on OSX. So the difference in results is strange.

Are you sure the input arrays that make it to the _loess function are identical across both systems? Also, what is the output of skmisc.show_config()?
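One way to check that the inputs really are identical on both machines is to compare a digest of their exact bit patterns rather than eyeballing printed values. The following is a minimal sketch using only the standard library; the helper name `fingerprint` and the sample values are illustrative and not part of skmisc.

```python
import hashlib
import struct


def fingerprint(values):
    """Return a SHA-256 hex digest of the exact float64 bit patterns in `values`."""
    data = [float(v) for v in values]
    # "<" fixes little-endian byte order so the digest is comparable across machines.
    packed = struct.pack(f"<{len(data)}d", *data)
    return hashlib.sha256(packed).hexdigest()


# Hypothetical sample inputs: two identical copies and one with a changed last digit.
x_linux = [-2.47712125, -2.95424251, -2.73239376]
x_mac = [-2.47712125, -2.95424251, -2.73239376]
x_other = [-2.47712125, -2.95424251, -2.73239377]

print(fingerprint(x_linux) == fingerprint(x_mac))    # True
print(fingerprint(x_linux) == fingerprint(x_other))  # False
```

Running `fingerprint(x)` and `fingerprint(y)` on each system and comparing the hex strings would rule out differences that are hidden by print rounding.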

adamgayoso commented 4 years ago

Linux

In [86]: skmisc.show_config()                                                                                                                                               
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

Mac:

In [45]: skmisc.show_config()                                                                                                                                               
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
accelerate_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]

I'm quite sure the inputs are exactly the same. Indeed, only 4 entries of the output differ noticeably. I could explain more, but the Travis test in the linked issue compares against the inputs and outputs of the same data from an algorithm that uses R's loess.

adamgayoso commented 4 years ago

Ah, weird, I just checked and it's literally the first 4 samples that have different outputs.

Linux: Screen Shot 2020-05-06 at 3 52 35 PM

Mac:

Screen Shot 2020-05-06 at 3 52 21 PM

adamgayoso commented 4 years ago

Linux/mac x/y Screen Shot 2020-05-06 at 3 54 56 PM Screen Shot 2020-05-06 at 3 55 08 PM

I can provide the x and y arrays if that's helpful. Also thanks so much for implementing this in Python!

has2k1 commented 4 years ago

> I can provide the x and y arrays if that's helpful.

That would be helpful. Also, as I do not have access to a Mac, instead of passing along the original two arrays, can you first try bisecting the arrays to the smallest size that still yields different results?
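The bisection being asked for can be sketched roughly as below. This is only an illustration: `shrink` and the `still_differs` predicate are hypothetical names, and in practice the predicate would be a manual check (run `_loess` on the slice on both systems and compare) rather than something automatic.

```python
def shrink(x, y, still_differs, min_size=1):
    """Repeatedly halve paired inputs while `still_differs(x, y)` stays true."""
    x, y = list(x), list(y)
    while len(x) // 2 >= min_size:
        mid = len(x) // 2
        halves = [(x[:mid], y[:mid]), (x[mid:], y[mid:])]
        for hx, hy in halves:
            if still_differs(hx, hy):
                x, y = hx, hy  # keep the half that still reproduces the problem
                break
        else:
            break  # neither half reproduces the problem on its own
    return x, y


# Stand-in predicate for demonstration: the "problem" appears whenever 99 is present.
xs = list(range(100))
ys = list(range(100))
small_x, small_y = shrink(xs, ys, lambda hx, hy: 99 in hx)
print(small_x, small_y)  # [99] [99]
```

Since loess needs a reasonable number of points for a given span, `min_size` would likely have to be set well above 1 for real data.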

adamgayoso commented 4 years ago

Will do. I also just discovered that if I prepend 4 random values, the results are consistent.

adamgayoso commented 4 years ago

Linux output

In [117]: _loess(y_prime, x_prime, span=0.3)                                                                                                                                
Out[117]: 
array([ 2.08802498e-316,  2.85448522e-316, -2.73303739e+000,
       -2.95456134e+000, -3.13049392e+000, -2.00491824e+000,
       -3.13049392e+000, -3.43136381e+000, -7.27399489e-001,
       -2.47937146e+000, -2.58722517e+000, -2.95456134e+000,
       -8.56620272e-001, -2.82978696e+000,  8.30198094e-001,
       -2.52939849e+000, -1.84931491e+000, -1.34903194e+000,
       -8.66516453e-001, -5.96220168e-001, -1.62071137e+000,
       -2.95456134e+000, -3.43136381e+000, -6.68135098e-001,
       -3.43136381e+000, -1.59848311e+000, -1.60567490e+000,
       -7.79845871e-001, -1.51529033e+000, -3.43136381e+000,
       -2.73303739e+000, -2.73303739e+000, -1.07865078e-001,
       -1.20813773e+000, -1.38914177e+000, -1.64693281e-001,
       -3.13049392e+000, -1.82434376e+000, -1.79973731e+000,
       -1.72356384e+000,  1.43946572e-002, -1.79079145e+000,
       -2.47937146e+000, -1.77632140e+000, -9.19561257e-001,
       -1.92401772e+000, -1.60567490e+000, -3.13049392e+000,
       -1.76611749e+000, -1.05588792e+000])

Mac output

In [57]: _loess(y_prime, x_prime, span=0.3)                                                                                                                                 
Out[57]: 
array([ 1.49457748e-154, -2.00000000e+000, -2.73303739e+000,
       -2.95456134e+000, -3.13049392e+000, -2.00491824e+000,
       -3.13049392e+000, -3.43136381e+000, -7.27399489e-001,
       -2.47937146e+000, -2.58722517e+000, -2.95456134e+000,
       -8.56620272e-001, -2.82978696e+000,  8.30198094e-001,
       -2.52939849e+000, -1.84931491e+000, -1.34903194e+000,
       -8.66516453e-001, -5.96220168e-001, -1.62071137e+000,
       -2.95456134e+000, -3.43136381e+000, -6.68135098e-001,
       -3.43136381e+000, -1.59848311e+000, -1.60567490e+000,
       -7.79845871e-001, -1.51529033e+000, -3.43136381e+000,
       -2.73303739e+000, -2.73303739e+000, -1.07865078e-001,
       -1.20813773e+000, -1.38914177e+000, -1.64693281e-001,
       -3.13049392e+000, -1.82434376e+000, -1.79973731e+000,
       -1.72356384e+000,  1.43946572e-002, -1.79079145e+000,
       -2.47937146e+000, -1.77632140e+000, -9.19561257e-001,
       -1.92401772e+000, -1.60567490e+000, -3.13049392e+000,
       -1.76611749e+000, -1.05588792e+000])

Inputs:

In [58]: x_prime                                                                                                                                                            
Out[58]: 
array([-2.47712125, -2.95424251, -2.73239376, -2.95424251, -3.13033377,
       -2.17609126, -3.13033377, -3.43136376, -0.97197128, -2.47712125,
       -2.58626572, -2.95424251, -1.10298416, -2.82930377,  0.07756976,
       -2.52827378, -2.05115252, -1.38214574, -1.11121748, -0.85387196,
       -1.63897207, -2.95424251, -3.43136376, -0.92081875, -3.43136376,
       -1.61845041, -1.62518379, -1.0231238 , -1.53926916, -3.43136376,
       -2.73239376, -2.73239376, -0.43310443, -1.25527251, -1.42276359,
       -0.47136893, -3.13033377, -2.03342376, -1.98420573, -1.72379359,
       -0.35472332, -1.96896577, -2.47712125, -1.85158017, -1.14132915,
       -2.10914447, -1.62518379, -3.13033377, -1.76860593, -1.19331766])

In [59]: y_prime                                                                                                                                                            
Out[59]: 
array([-2.47841044, -2.95456445, -2.73303787, -2.95456445, -3.13049471,
       -2.17883537, -3.13049471, -3.43136376, -0.80049032, -2.47841044,
       -2.58723226, -2.95456445, -0.83769899, -2.82978677,  0.84834422,
       -2.52940161, -2.0548693 , -1.23432489, -0.79089853, -0.80478039,
       -1.59502886, -2.95456445, -3.43136376, -0.89323276, -3.43136376,
       -1.60231633, -1.60849252, -0.17071949, -1.54052111, -3.43136376,
       -2.73303787, -2.73303787, -0.2696592 , -1.24462321, -1.42197869,
        0.19327998, -3.13049471, -0.95878506, -1.98857216, -1.69851433,
       -0.14432653, -1.94422696, -2.47841044, -1.83498874, -1.09092406,
       -2.11237464, -1.60849252, -3.13049471, -1.73909396, -1.17613506])

has2k1 commented 4 years ago

Okay, thank you, I will look at it. But from what you have posted, I do not think this is a Mac vs Linux difference in output. On both systems the first two values look suspicious, i.e. either extremely close to zero or exactly -2!

linux
2.08802498e-316,  2.85448522e-316,

mac
1.49457748e-154, -2.00000000e+000,

I guess if you rerun with the same inputs you will get different results, and if they are the same, then restarting the session should definitely yield different results!
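To make that kind of comparison concrete, here is a small sketch that lists the positions at which two output arrays disagree. It uses the first four values of the Linux and Mac outputs posted above; the function name `differing_indices` is illustrative. The same check could be applied to two runs on a single machine.

```python
import math


def differing_indices(a, b, rel_tol=1e-9):
    """Return the indices at which two equally long outputs are not close."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return [
        i
        for i, (u, v) in enumerate(zip(a, b))
        if not math.isclose(u, v, rel_tol=rel_tol)
    ]


# First four predictions from the outputs posted in this thread.
linux_head = [2.08802498e-316, 2.85448522e-316, -2.73303739, -2.95456134]
mac_head = [1.49457748e-154, -2.00000000, -2.73303739, -2.95456134]

print(differing_indices(linux_head, mac_head))  # [0, 1]
```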

has2k1 commented 4 years ago

I have fixed this and put out a new release. The issue was a memory bug that showed up depending on how the library was used. Without the fix, it could be avoided with y_est = model.predict(x).values.copy()
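The thread does not say what the memory bug was internally, so the following is only an analogy for why taking a copy can help: a view that shares memory with a buffer sees whatever is later written there, while a copy keeps the values it had when it was taken. The sketch uses the standard library's `array` and `memoryview` as stand-ins and makes no claim about how skmisc manages its memory.

```python
from array import array

# Stand-in for a result buffer owned by some other object.
buffer = array("d", [-2.73303739, -2.95456134, -3.13049392])

view = memoryview(buffer)  # shares memory with `buffer`
snapshot = view.tolist()   # independent copy of the current values

# Simulate the owner reusing its memory for something else.
buffer[0] = -2.0

print(view[0])      # -2.0 (the view sees the overwritten value)
print(snapshot[0])  # -2.73303739 (the copy is unaffected)
```

In the workaround above, `.copy()` plays the role of `tolist()` here: it detaches the returned values from memory that the caller does not control.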