shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

FP tests fail on non-x86 architectures on Fedora 22 (AArch64, PPC64LE) #2844

Closed hrw closed 4 years ago

hrw commented 9 years ago

Hi

Shogun 4.0.0 is part of the Fedora 22 package collection. We also build it for the AArch64 architecture. It compiles fine and passes most of the tests, but some fail:

The following tests FAILED:
    83 - integration-python_modular-SVMOcas_30_1en05_16_05_False (Failed)
    89 - integration-python_modular-SVMOcas_30_1en05_1_05_False (Failed)
    105 - integration-python_modular-tester-kernel_poly_modular (Failed)
    120 - integration-python_modular-tester-kernel_sigmoid_modular (Failed)
    143 - integration-python_modular-tester-classifier_libsvm_modular (Failed)
    161 - integration-python_modular-tester-regression_gaussian_process_modular (Failed)
    168 - integration-python_modular-tester-classifier_averaged_perceptron_modular (Failed)
    184 - integration-python_modular-tester-preprocessor_normone_modular (Failed)
    192 - integration-python_modular-tester-kernel_sparse_poly_modular (Failed)
    198 - integration-python_modular-tester-distance_sparseeuclidean_modular (Failed)
    207 - integration-python_modular-tester-distance_jensen_modular (Failed)
    228 - integration-python_modular-tester-classifier_svmocas_modular (Failed)
    236 - integration-python_modular-tester-kernel_salzberg_word_string_modular (Failed)
    240 - integration-python_modular-tester-classifier_multiclasslibsvm_modular (Failed)
    254 - integration-python_modular-tester-kernel_wave_modular (Failed)
    257 - integration-python_modular-tester-distance_geodesic_modular (Failed)
    260 - integration-python_modular-tester-preprocessor_prunevarsubmean_modular (Failed)
    308 - integration-python_modular-tester-distance_cosine_modular (Failed)
    312 - integration-python_modular-tester-library_fisher2x3_modular (Failed)
    315 - integration-python_modular-tester-regression_libsvr_modular (Failed)
    439 - unit-BAHSIC (Failed)
    453 - unit-CustomKernelTest (Failed)
    458 - unit-Statistics (Failed)

The build is available online at http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2991856 where you can check the whole build log and also see which packages were installed to build it (in root.log).

From what I can see, the problem is with floating-point operations: in some tests the results differ from the expected values in the last two digits.
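
To illustrate the kind of mismatch described above, here is a minimal sketch of comparing results with a tolerance instead of exact equality. The helper name `results_match`, the tolerance values, and the sample numbers are assumptions made for illustration; this is not how the Shogun test suite actually compares results.

```python
import math


def results_match(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if two sequences of floats agree within a tolerance.

    Hypothetical helper: the tolerances are illustrative defaults, not
    values taken from Shogun.
    """
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


# Made-up values standing in for a stored reference result and a result
# that differs only in the last digit.
reference = [0.1 + 0.2, 1.0 / 3.0]
other_arch = [0.3, 1.0 / 3.0]

print(reference == other_arch)               # exact comparison fails
print(results_match(reference, other_arch))  # tolerant comparison passes
```

Whether a tolerance is acceptable depends on the test: it hides last-digit noise, but it would also hide a real numerical bug smaller than the chosen threshold.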

besser82 commented 9 years ago

@hrw: Hi there! Sorry for the late reply… I'm the package owner in Fedora… Have you had a look at the spec file I use yet? It currently has lots of exceptions for failing tests on e.g. %ix86 and %arm, precisely because of this problem…

I'll add the failing tests to the aarch64 excludes in the spec file and ping pbrobinson to have him rebuild it.

Possibly a rhbz# would be good to track this one…

Cheers Björn

hrw commented 9 years ago

I had that feeling that I forgot something...

Bug reported: https://bugzilla.redhat.com/show_bug.cgi?id=1222401

hrw commented 9 years ago

Besser82: the solution is not to skip the failing tests. The solution is to fix those tests so that they work.

I hope to do a build on a Power8 machine (where the FPU follows IEEE 754-2008) to check what kind of results it gives. AArch64 uses IEEE 754 as well. x86 should use it too, but it is x86: an architecture with such a pile of mess that anything can happen there.
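
One possible reason why two IEEE 754 platforms can still disagree in the last digits, offered as an assumption and not as something diagnosed in this thread: AArch64 and Power have fused multiply-add instructions, and a compiler may contract `a * b + c` into a single operation with one rounding, whereas a baseline x86-64 build performs a multiply and an add with two roundings. The sketch below emulates both behaviours in pure Python; the function names and the input values are chosen purely for illustration.

```python
from fractions import Fraction


def mul_add_separate(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def mul_add_fused(a, b, c):
    """Emulate a fused multiply-add: compute exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs picked so that the intermediate product is not representable:
# a * b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(mul_add_separate(a, b, c))  # the tiny term is lost in the first rounding
print(mul_add_fused(a, b, c))     # the tiny term survives the single rounding
```

Both results are valid under IEEE 754; they differ only because the number of rounding steps differs.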

besser82 commented 9 years ago

Should be fixed in packaging now. See RHBZ #1222401.

hrw commented 9 years ago

Besser82: this bug was not about Fedora packaging but about the code. Please reopen.

hrw commented 9 years ago

Similar failures on PPC64LE:

The following tests FAILED:
    1 - integration-python_modular-SVMOcas_30_1en05_16_05_False (Timeout)
    3 - integration-python_modular-SVMOcas_30_1en05_1_05_False (Timeout)
    15 - integration-python_modular-SVMOcas_15_1en05_1_05_False (Timeout)
    16 - integration-python_modular-LibSVR_023_1_15_Gaussian_1en05_001 (Timeout)
    17 - integration-python_modular-LibSVR_30_1_15_Gaussian_00001_0001 (Timeout)
    18 - integration-python_modular-LibSVR_30_1_15_Gaussian_1en05_001 (Timeout)
    114 - integration-python_modular-tester-kernel_poly_modular (Failed)
    119 - integration-python_modular-tester-distance_sparseeuclidean_modular (Failed)
    130 - integration-python_modular-tester-distance_cosine_modular (Failed)
    138 - integration-python_modular-tester-distance_geodesic_modular (Failed)
    147 - integration-python_modular-tester-classifier_multiclasslibsvm_modular (Failed)
    168 - integration-python_modular-tester-converter_locallylinearembedding_modular (Failed)
    173 - integration-python_modular-tester-kernel_sparse_poly_modular (Failed)
    180 - integration-python_modular-tester-regression_gaussian_process_modular (Failed)
    188 - integration-python_modular-tester-preprocessor_prunevarsubmean_modular (Failed)
    193 - integration-python_modular-tester-kernel_sigmoid_modular (Failed)
    195 - integration-python_modular-tester-converter_laplacianeigenmaps_modular (Failed)
    216 - integration-python_modular-tester-kernel_salzberg_word_string_modular (Failed)
    220 - integration-python_modular-tester-distance_jensen_modular (Failed)
    221 - integration-python_modular-tester-converter_localtangentspacealignment_modular (Failed)
    225 - integration-python_modular-tester-preprocessor_dimensionreductionpreprocessor_modular (Failed)
    229 - integration-python_modular-tester-converter_hessianlocallylinearembedding_modular (Failed)
    259 - integration-python_modular-tester-preprocessor_normone_modular (Failed)
    277 - integration-python_modular-tester-classifier_svmocas_modular (Failed)
    297 - integration-python_modular-tester-classifier_libsvm_modular (Failed)
    298 - integration-python_modular-tester-converter_kernellocallylinearembedding_modular (Failed)
    308 - integration-python_modular-tester-regression_libsvr_modular (Failed)
    311 - integration-python_modular-tester-kernel_wave_modular (Failed)
    325 - unit-DirectSparseLinearSolver (SEGFAULT)
    331 - unit-LogDetEstimator (SEGFAULT)
    352 - unit-Statistics (SEGFAULT)
    457 - unit-BAHSIC (Failed)
    574 - libshogun-converter_localtangentspacealignment (SEGFAULT)
    626 - libshogun-converter_hessianlocallylinearembedding (SEGFAULT)
    631 - libshogun-converter_kernellocallylinearembedding (SEGFAULT)
    643 - libshogun-converter_laplacianeigenmaps (SEGFAULT)
    652 - libshogun-converter_locallylinearembedding (SEGFAULT)
    728 - python_modular-mathematics_logdet (OTHER_FAULT)
    744 - python_modular-converter_locallylinearembedding_modular (SEGFAULT)
    771 - python_modular-converter_laplacianeigenmaps_modular (SEGFAULT)
    797 - python_modular-converter_localtangentspacealignment_modular (SEGFAULT)
    801 - python_modular-preprocessor_dimensionreductionpreprocessor_modular (SEGFAULT)
    805 - python_modular-converter_hessianlocallylinearembedding_modular (SEGFAULT)
    873 - python_modular-converter_kernellocallylinearembedding_modular (SEGFAULT)
    1025 - ruby-converter_hessianlocallylinearembedding_modular (OTHER_FAULT)
Errors while running CTest

Built with R support disabled.

vigsterkr commented 8 years ago

@hrw hi! any ideas how i could quickly debug it? shall i use qemu or do you maybe have a better suggestion?

hrw commented 8 years ago

qemu is one way. Getting access to an AArch64 or Power machine is another, but I do not know of any publicly accessible options right now.

hrw commented 8 years ago

https://www.linaro.cloud/ is one way, but I do not know how long you would have to wait.

vigsterkr commented 8 years ago

@hrw i had once access to linaro, but it took a long time to get access :) so qemu it is :P

vigsterkr commented 8 years ago

@hrw ok so finally i've managed to put together a qemu env for this ! \o/ so yeah i can confirm this.... it puzzles me a bit why those specific errors, but looking into it :)

vigsterkr commented 8 years ago

@hrw btw what are the cmake/compiler flags you are using to compile shogun for aarch64? i've just seen that Release had -mfpmath=sse, which is a no-go for anything non-x86 :P
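
As a rough sketch of the point about `-mfpmath=sse`: that GCC option is x86-specific, so a build script should only add it when targeting x86. The helper below is hypothetical (the name `fp_flags`, the machine-name set, and the choice to pair it with `-msse2` are all assumptions for illustration) and is not Shogun's actual CMake logic.

```python
import platform

# Machine names commonly reported for x86 targets (illustrative, not exhaustive).
X86_MACHINES = {"x86_64", "amd64", "i386", "i486", "i586", "i686"}


def fp_flags(machine=None):
    """Return floating-point compiler flags appropriate for the target.

    Hypothetical helper: SSE math flags are only emitted for x86,
    every other architecture gets no extra flags.
    """
    machine = (machine or platform.machine()).lower()
    if machine in X86_MACHINES:
        return ["-msse2", "-mfpmath=sse"]
    return []


print(fp_flags("x86_64"))
print(fp_flags("aarch64"))
print(fp_flags("ppc64le"))
```

The same guard could be expressed directly in the build system by checking the target processor before appending the flag.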

hrw commented 8 years ago

http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=362126 has build logs.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hrw commented 4 years ago

@vigsterkr nowadays you can use Travis CI to have the tests built and run on non-x86 architectures.

stale[bot] commented 4 years ago

This issue is now being closed due to a lack of activity. Feel free to reopen it.