Closed adamgayoso closed 4 years ago
Yes please, more details.
While the continuous integration does not run tests on OSX, the wheels for the package are built only if the tests pass, and they have always passed for OSX. So the difference in results is strange.
Are you sure the input arrays that make it to the _loess function are identical across both systems?
Also, what is the output of skmisc.show_config()?
Linux:
In [86]: skmisc.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
Mac:
In [45]: skmisc.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
accelerate_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
I'm quite sure the inputs are exactly the same. Indeed, only 4 entries of the output differ noticeably. I could explain more, but the Travis test in the linked issue compares against the inputs and outputs for the same data from an algorithm that uses R's loess.
Ah, weird, I just checked and it's literally the first 4 samples that have different outputs.
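For anyone comparing outputs from two machines, a small helper that lists the positions where two result sequences disagree makes a pattern like "only the first few samples differ" easy to spot. This is a minimal sketch using only the standard library; the name differing_indices and the sample values are made up for illustration and are not part of skmisc.

```python
import math


def differing_indices(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Return the indices at which two equal-length sequences disagree.

    Values are compared with math.isclose, so NaN never matches anything
    and is always reported as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (u, v) in enumerate(zip(a, b))
        if not math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Made-up values standing in for outputs saved on two systems.
out_a = [1.0, 2.0, 3.0, 4.0]
out_b = [1.0, 2.5, 3.0, 4.0]
print(differing_indices(out_a, out_b))
```

The same function works on NumPy arrays, since it only relies on len() and iteration.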
Linux:
Mac:
Linux/mac x/y
I can provide the x and y arrays if that's helpful. Also thanks so much for implementing this in Python!
I can provide the x and y arrays if that's helpful.
That would be helpful. Also, as I do not have access to a Mac, instead of passing along the original two arrays, can you first try bisecting the arrays to the smallest size that still yields different results?
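The bisection suggested here could be sketched roughly as follows. The predicate still_fails is a hypothetical callable supplied by whoever runs it (for example, one that runs the fit on a slice and compares against a saved reference output); it is not an skmisc function. The loop keeps whichever half still reproduces the problem, so it returns a small reproducing input, not necessarily the smallest possible one.

```python
def bisect_failing_input(x, y, still_fails, min_size=2):
    """Shrink paired inputs by halving while still_fails(x, y) stays true.

    x and y must support len() and slicing (lists or NumPy arrays).
    still_fails is a user-supplied, hypothetical predicate.
    """
    if not still_fails(x, y):
        raise ValueError("the full input does not reproduce the problem")
    while len(x) // 2 >= min_size:
        mid = len(x) // 2
        if still_fails(x[:mid], y[:mid]):
            x, y = x[:mid], y[:mid]
        elif still_fails(x[mid:], y[mid:]):
            x, y = x[mid:], y[mid:]
        else:
            break  # neither half reproduces the problem on its own
    return x, y


# Toy predicate: the "problem" appears whenever the value 7 is in x.
data = list(range(16))
small_x, small_y = bisect_failing_input(
    data, data, lambda xs, ys: 7 in xs, min_size=1
)
print(small_x, small_y)
```

If the failure depends on where a sample sits in the array, as turned out to be the case later in this thread, slicing can make the problem move or vanish, so the reduced input should be rechecked on both systems.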
Will do. I also just discovered that if I prepend 4 random values, the results are consistent.
Linux output
In [117]: _loess(y_prime, x_prime, span=0.3)
Out[117]:
array([ 2.08802498e-316, 2.85448522e-316, -2.73303739e+000,
-2.95456134e+000, -3.13049392e+000, -2.00491824e+000,
-3.13049392e+000, -3.43136381e+000, -7.27399489e-001,
-2.47937146e+000, -2.58722517e+000, -2.95456134e+000,
-8.56620272e-001, -2.82978696e+000, 8.30198094e-001,
-2.52939849e+000, -1.84931491e+000, -1.34903194e+000,
-8.66516453e-001, -5.96220168e-001, -1.62071137e+000,
-2.95456134e+000, -3.43136381e+000, -6.68135098e-001,
-3.43136381e+000, -1.59848311e+000, -1.60567490e+000,
-7.79845871e-001, -1.51529033e+000, -3.43136381e+000,
-2.73303739e+000, -2.73303739e+000, -1.07865078e-001,
-1.20813773e+000, -1.38914177e+000, -1.64693281e-001,
-3.13049392e+000, -1.82434376e+000, -1.79973731e+000,
-1.72356384e+000, 1.43946572e-002, -1.79079145e+000,
-2.47937146e+000, -1.77632140e+000, -9.19561257e-001,
-1.92401772e+000, -1.60567490e+000, -3.13049392e+000,
-1.76611749e+000, -1.05588792e+000])
Mac output
In [57]: _loess(y_prime, x_prime, span=0.3)
Out[57]:
array([ 1.49457748e-154, -2.00000000e+000, -2.73303739e+000,
-2.95456134e+000, -3.13049392e+000, -2.00491824e+000,
-3.13049392e+000, -3.43136381e+000, -7.27399489e-001,
-2.47937146e+000, -2.58722517e+000, -2.95456134e+000,
-8.56620272e-001, -2.82978696e+000, 8.30198094e-001,
-2.52939849e+000, -1.84931491e+000, -1.34903194e+000,
-8.66516453e-001, -5.96220168e-001, -1.62071137e+000,
-2.95456134e+000, -3.43136381e+000, -6.68135098e-001,
-3.43136381e+000, -1.59848311e+000, -1.60567490e+000,
-7.79845871e-001, -1.51529033e+000, -3.43136381e+000,
-2.73303739e+000, -2.73303739e+000, -1.07865078e-001,
-1.20813773e+000, -1.38914177e+000, -1.64693281e-001,
-3.13049392e+000, -1.82434376e+000, -1.79973731e+000,
-1.72356384e+000, 1.43946572e-002, -1.79079145e+000,
-2.47937146e+000, -1.77632140e+000, -9.19561257e-001,
-1.92401772e+000, -1.60567490e+000, -3.13049392e+000,
-1.76611749e+000, -1.05588792e+000])
Inputs:
In [58]: x_prime
Out[58]:
array([-2.47712125, -2.95424251, -2.73239376, -2.95424251, -3.13033377,
-2.17609126, -3.13033377, -3.43136376, -0.97197128, -2.47712125,
-2.58626572, -2.95424251, -1.10298416, -2.82930377, 0.07756976,
-2.52827378, -2.05115252, -1.38214574, -1.11121748, -0.85387196,
-1.63897207, -2.95424251, -3.43136376, -0.92081875, -3.43136376,
-1.61845041, -1.62518379, -1.0231238 , -1.53926916, -3.43136376,
-2.73239376, -2.73239376, -0.43310443, -1.25527251, -1.42276359,
-0.47136893, -3.13033377, -2.03342376, -1.98420573, -1.72379359,
-0.35472332, -1.96896577, -2.47712125, -1.85158017, -1.14132915,
-2.10914447, -1.62518379, -3.13033377, -1.76860593, -1.19331766])
In [59]: y_prime
Out[59]:
array([-2.47841044, -2.95456445, -2.73303787, -2.95456445, -3.13049471,
-2.17883537, -3.13049471, -3.43136376, -0.80049032, -2.47841044,
-2.58723226, -2.95456445, -0.83769899, -2.82978677, 0.84834422,
-2.52940161, -2.0548693 , -1.23432489, -0.79089853, -0.80478039,
-1.59502886, -2.95456445, -3.43136376, -0.89323276, -3.43136376,
-1.60231633, -1.60849252, -0.17071949, -1.54052111, -3.43136376,
-2.73303787, -2.73303787, -0.2696592 , -1.24462321, -1.42197869,
0.19327998, -3.13049471, -0.95878506, -1.98857216, -1.69851433,
-0.14432653, -1.94422696, -2.47841044, -1.83498874, -1.09092406,
-2.11237464, -1.60849252, -3.13049471, -1.73909396, -1.17613506])
Okay, thank you, I will look at it. But from what you have posted, I do not think this is a Mac vs Linux difference. The first two values look suspicious on both systems, i.e. either vanishingly small or exactly -2!
linux
2.08802498e-316, 2.85448522e-316,
mac
1.49457748e-154, -2.00000000e+000,
I guess if you rerun with the same inputs you will get different results, and if they are the same, then restarting the session should definitely yield different results!
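Both checks hinted at here can be sketched with the standard library: rerun the same call and compare the results, and flag values that are nonzero yet smaller than the smallest normal double, which is what 2.08802498e-316 is. The helper names are made up for this sketch. The subnormal check is only a heuristic: it would flag the two Linux values above, but not the Mac values 1.49457748e-154 and -2.0.

```python
import sys


def is_repeatable(func, *args, runs=3):
    """Call func several times with the same arguments and compare results."""
    first = list(func(*args))
    return all(list(func(*args)) == first for _ in range(runs - 1))


def subnormal_indices(values):
    """Indices of nonzero values below the smallest normal double in magnitude."""
    tiny = sys.float_info.min  # smallest positive normal double, about 2.2e-308
    return [i for i, v in enumerate(values) if 0.0 < abs(v) < tiny]


# First three entries of the Linux output posted above.
linux_head = [2.08802498e-316, 2.85448522e-316, -2.73303739e+000]
print(subnormal_indices(linux_head))
```

A deterministic function passes is_repeatable; one that reads stale or uninitialized memory may fail it, although it can also pass by chance within a single session.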
I have fixed this and put out a new release. The issue was a memory bug that showed up depending on how the library was used. Without the fix, it could be avoided with yest = model.predict(x).values.copy()
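Why a .copy() can sidestep this kind of bug is easiest to see with a view versus a copy. The following is a simplified analogy using only the standard library, not a description of the actual skmisc internals (the thread only says it was a memory bug). The names buffer, view and snapshot are hypothetical.

```python
from array import array

# Hypothetical stand-in for a result buffer that a library still owns.
buffer = array("d", [1.0, 2.0, 3.0])

view = memoryview(buffer)      # shares memory with buffer
snapshot = array("d", buffer)  # independent copy of the current contents

# Simulate the owner later reusing that memory for something else.
buffer[0] = -2.0

print(view[0])      # follows the overwrite
print(snapshot[0])  # keeps the original value
```

The same distinction applies to NumPy arrays: a result that still refers to memory managed elsewhere can change underneath you, while a copy taken immediately holds its own data.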
Linux:
Mac:
I can give more architecture details, but maybe there's something I'm missing? The input arrays have about 16,000 samples.