Closed yurivict closed 4 years ago
Hello @yurivict I couldn't reproduce the issue on FreeBSD 11.2, can you share some more information about your setup, OS, clang, armadillo version. Also does ./ensmallen_tests -s -r console
show any intermediate results?
I added the port for ensmallen yesterday, so if you would just type cd /usr/ports/math/ensmallen && make test
it should reproduce the problem.
If you can run with CTEST_OUTPUT_ON_FAILURE=1 ctest
that would be really helpful too. ensmallen has some random tests. We try to keep the random failure probability really low, but we're not always successful, so if you can show us which test failed we can figure out if it's random or an actual problem.
It passes now when I added CTEST_OUTPUT_ON_FAILURE=1
I added this patch to the FreeBSD port to work around this issue:
--- src/libtfhe/fft_processors/nayuki/fft_processor_nayuki.cpp.orig 2019-10-11 03:07:51 UTC
+++ src/libtfhe/fft_processors/nayuki/fft_processor_nayuki.cpp
@@ -12,7 +12,7 @@ FFT_Processor_nayuki::FFT_Processor_nayuki(const int32
tables_reverse = fft_init_reverse(_2N);
omegaxminus1 = (cplx*) malloc(sizeof(cplx) * _2N);
for (int32_t x=0; x<_2N; x++) {
+ omegaxminus1[x]=std::complex<double>(cos(x*M_PI/N)-1., sin(x*M_PI/N));
//exp(i.x.pi/N)-1
}
}
Hm, is this the correct patch? This code isn't part of ensmallen, maybe I missed something?
Even with the "ten billion" issue fixed, I'm getting build failures on some 32-bit architectures. This is Debian version 2.10.4-2
(the -2
includes the ten billion patch).
https://buildd.debian.org/status/package.php?p=ensmallen
On armel it times out.
On i386
Test project /<<PKGBUILDDIR>>/obj-i686-linux-gnu
Start 1: ensmallen_tests
1/1 Test #1: ensmallen_tests ..................***Failed 3509.70 sec
ensmallen version: 2.10.4 (Fried Chicken)
armadillo version: 9.800.1 (Horizon Scraper)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ensmallen_tests is a Catch v2.4.1 host application.
Run with -? for options
-------------------------------------------------------------------------------
RosenbrockFunctionFloatTest
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:41
...............................................................................
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:54: FAILED:
REQUIRE( coords(1) == Approx(1.0).epsilon(1e-7) )
with expansion:
1.0f == Approx( 1.0 )
Parameter 4 to routine SLASCL was incorrect
Parameter 5 to routine SLASCL was incorrect
-------------------------------------------------------------------------------
Johnson844LovaszThetaFMatSDP
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:136
...............................................................................
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:161: FAILED:
REQUIRE( finalValue == Approx(-14.0).epsilon(0.1) )
with expansion:
nanf == Approx( -14.0 )
===============================================================================
test cases: 266 | 264 passed | 2 failed
assertions: 11281 | 11279 passed | 2 failed
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 3509.70 sec
The following tests FAILED:
1 - ensmallen_tests (Failed)
SLASCL is from LAPACK. ensmallen uses LAPACK indirectly via Armadillo. However, no function in Armadillo directly calls SLASCL. This suggests that another function in LAPACK (used by Armadillo) calls SLASCL, which in turn suggests the problem is with LAPACK.
About 6 months ago there was a very messy issue with Fortran "hidden" arguments (which affects a lot of software using LAPACK), but that was resolved as of Armadillo 9.500. For more details and further links see https://gitlab.com/conradsnicta/armadillo-code/issues/123
What version of gcc was used to compile LAPACK (or OpenBLAS) on Debian? Perhaps there's a miscompliation? Recent releases of gcc have a workaround for the Fortran "hidden" arguments snafu.
That sounds plausible, @conradsnicta. But Debian LAPACK was recompiled just two weeks ago: https://tracker.debian.org/pkg/lapack
Maybe they don't have the issue under as much control as they thought.
Another possibility is libarmadillo
, which actually has a bug filed against it for an architecture-specific header issue breaking mlpack
on hppa
: https://bugs.debian.org/912778
@barak To be clear - by libarmadillo do you mean all of armadillo, or only the armadillo run-time library? https://bugs.debian.org/912778 doesn't seem related to the armadillo run-time library (libarmadillo.so). The error comes from armadillo headers (99% of armadillo is in header files), but I suspect it's a compiler bug on hppa.
hppa appears to be a rather obscure and outdated architecture. If it doesn't get much testing/development, a compiler bug is entirely possible. (Why does Debian even bother with hppa?)
Regarding why Debian keep the hppa port going, well it's not an official Debian release architecture, it's just some hppa enthusiasts. (The official ones are the top nine listed on https://buildd.debian.org/status/package.php?p=ensmallen with white background; the remainder with gray background are just for fun.) So hppa won't hold up package migration into the stable distribution. But weird architectures are still useful for sniffing out portability issues and such.
On the present matter, when ensmallen
was compiled it got the libarmadillo
headers extant at the time, which were pretty recent.
The only architectures with testing failures for ensmallen
are 32-bit, which does seem suspicious.
Uploaded 2.11.1 to debian, and the autobuilders are still failing on a couple 32-bit architectures. Including two "release" architectures (i386 and armel) which will block migration of the package to release.
Details: https://buildd.debian.org/status/package.php?p=ensmallen click on the "Build-Attempted".
Digging through the logs here are my takes:
Test project /<<PKGBUILDDIR>>/obj-arm-linux-gnueabi
Start 1: ensmallen_tests
E: Build killed with signal TERM after 150 minutes of inactivity
It's unclear to me whether tests hung or just didn't finish yet because armel is for toasters. You could run ctest with various options to provide output during testing, and this might fix it.
/usr/bin/ctest --force-new-ctest-process -j4
Test project /<<PKGBUILDDIR>>/obj-i686-linux-gnu
Start 1: ensmallen_tests
1/1 Test #1: ensmallen_tests ..................***Failed 3185.66 sec
ensmallen version: 2.11.1 (The Poster Session Is Full)
armadillo version: 9.800.1 (Horizon Scraper)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ensmallen_tests is a Catch v2.4.1 host application.
Run with -? for options
-------------------------------------------------------------------------------
RosenbrockFunctionFloatTest
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:41
...............................................................................
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:54: FAILED:
REQUIRE( coords(1) == Approx(1.0).epsilon(1e-7) )
with expansion:
1.0f == Approx( 1.0 )
Parameter 4 to routine SLASCL was incorrect
Parameter 5 to routine SLASCL was incorrect
-------------------------------------------------------------------------------
Johnson844LovaszThetaFMatSDP
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:136
...............................................................................
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:161: FAILED:
REQUIRE( finalValue == Approx(-14.0).epsilon(0.1) )
with expansion:
nanf == Approx( -14.0 )
===============================================================================
test cases: 279 | 277 passed | 2 failed
assertions: 11309 | 11307 passed | 2 failed
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 3185.66 sec
The following tests FAILED:
1 - ensmallen_tests (Failed)
Errors while running CTest
make[1]: *** [Makefile:111: test] Error 8
make[1]: Leaving directory '/<<PKGBUILDDIR>>/obj-i686-linux-gnu'
dh_auto_test: cd obj-i686-linux-gnu && make -j4 test ARGS\+=-j4 returned exit code 2
make: *** [debian/rules:8: binary-arch] Error 255
dpkg-buildpackage: error: debian/rules binary-arch subprocess returned exit status 2
I see the issue with RosenbrockFunctionFloatTest
and will open a PR momentarily, but the Johnson844LovaszThetaFMatSDP
one might be trickier. I'm going to see if I can get an i386 machine or VM up and running to try and reproduce. The SLASCL
failures might actually not be a problem---we'd need to see if that was connected to the Johnson844LovaszThetaFMatSDP
test. It might not be, and it might not actually be causing a problem.
hppa: https://buildd.debian.org/status/fetch.php?pkg=ensmallen&arch=hppa&ver=2.11.1-1&stamp=1577964053&raw=0
I'm not going to copy paste these errors. I see some things like _ZN4arma8arma_rng5randnIdE4fillEPdj._omp_fn.0' referenced in section ...
... maybe disabling OpenMP here might be helpful?
hurd-i386: https://buildd.debian.org/status/fetch.php?pkg=ensmallen&arch=hurd-i386&ver=2.11.1-1&stamp=1577963591&raw=0 Looks like the same as i386.
I'd say not worth worrying about the non-release architectures like HPPA. At least, not until release architectures are working.
My suspicion is that the armel timeout issue is an actual test hang due to a bug related to the enormous spate of alignment warnings and such spewed out by the compiler.
Should I set CTEST_OUTPUT_ON_FAILURE=1
in the build scripts? Unless it's crazy voluminous, I can do that and maybe we'll find it easier to get a handle on things.
If you don't use ctest, you can get better output:
$ ./ensmallen_tests -d yes
ensmallen version: 2.10.4 (Fried Chicken)
armadillo version: 9.800.1 (Horizon Scraper)
4.609 s: SimpleAdaDeltaTestFunction
0.060 s: AdaDeltaLogisticRegressionTest
0.683 s: SimpleAdaDeltaTestFunctionFMat
0.068 s: AdaDeltaLogisticRegressionTestFMat
3.612 s: SimpleAdaGradTestFunction
2.491 s: AdaGradLogisticRegressionTest
3.651 s: SimpleAdaGradTestFunctionFMat
1.327 s: AdaGradLogisticRegressionTestFMat
0.000 s: AdamSphereFunctionTest
0.000 s: AdamSphereFunctionTestFMat
0.001 s: AdamSphereFunctionTestSpMat
0.000 s: AdamSphereFunctionTestSpMatDenseGradient
0.000 s: AdamStyblinskiTangFunctionTest
0.000 s: AdamMcCormickFunctionTest
0.000 s: AdamMatyasFunctionTest
0.000 s: AdamEasomFunctionTest
0.001 s: AdamBoothFunctionTest
0.427 s: SimpleAdamTestFunction
0.458 s: SimpleAdaMaxTestFunction
0.495 s: SimpleAMSGradTestFunction
0.055 s: AMSGradSphereFunctionTestFMat
0.562 s: AMSGradSphereFunctionTestSpMat
0.086 s: AMSGradSphereFunctionTestSpMatDenseGradient
0.063 s: AdamLogisticRegressionTest
11.835 s: AdaMaxLogisticRegressionTest
1.297 s: AMSGradLogisticRegressionTest
0.602 s: SimpleNadamTestFunction
0.061 s: NadamLogisticRegressionTest
0.386 s: SimpleNadaMaxTestFunction
The order is deterministic, so if it hangs, we can track down pretty easily which one it is.
Okay, just uploaded something with that in debian/rules.
Okay, there's an i386 build failure with more info on https://buildd.debian.org/status/package.php?p=ensmallen, enjoy!
...
8.302 s: KatyushaProximalLogisticRegressionSpMatTest
0.000 s: RosenbrockFunctionTest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ensmallen_tests is a Catch v2.4.1 host application.
Run with -? for options
-------------------------------------------------------------------------------
RosenbrockFunctionFloatTest
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:41
...............................................................................
/<<PKGBUILDDIR>>/tests/lbfgs_test.cpp:54: FAILED:
REQUIRE( coords(1) == Approx(1.0).epsilon(1e-7) )
with expansion:
1.0f == Approx( 1.0 )
0.000 s: RosenbrockFunctionFloatTest
0.000 s: RosenbrockFunctionSpGradTest
0.000 s: RosenbrockFunctionSpMatTest
...
0.928 s: LookaheadAdamSimpleSGDTestFunctionFloat
2.084 s: Johnson844LovaszThetaSDP
Parameter 4 to routine SLASCL was incorrect
Parameter 5 to routine SLASCL was incorrect
-------------------------------------------------------------------------------
Johnson844LovaszThetaFMatSDP
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:136
...............................................................................
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:161: FAILED:
REQUIRE( finalValue == Approx(-14.0).epsilon(0.1) )
with expansion:
nanf == Approx( -14.0 )
0.000 s: Johnson844LovaszThetaFMatSDP
3.405 s: ErdosRenyiRandomGraphMaxCutSDP
...
0.017 s: WNGradStyblinskiTangFunctionSpMatTest
===============================================================================
test cases: 279 | 277 passed | 2 failed
assertions: 11309 | 11307 passed | 2 failed
I can't seem to reproduce the Johnson844LovaszThetaFMatSDP
issue on an i386/debian:unstable
Docker container---has anyone else been able to reproduce that in an easy-to-work with environment? I wonder if it has to do with versions of dependencies or something, but I would expect that I was doing the same thing as the Debian build.
If we can reproduce it then we can open an easy issue to work on. Actually, I guess, we could open issue even if we can't reproduce it, but if we can reproduce it, we significantly increase the probability that it will be solved. :)
The autobuilder blew its cookies in exactly the same way on both linux-i386 and hurd-i386, for whatever that's worth.
Is there a way that I can reproduce the autobuilder environment exactly for either of those cases?
There's /supposed/ to be sufficient information in the logs to know the precise version (in the debian sense, so that means a unique binary package) of everything, and an exhaustive list of all installed packages. So in theory, you should be able to go into a chroot environment and install exactly the same stuff.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:
Just to put it out there, this issue is blocking progression of the package in Debian towards the release, which is in turn blocking the mlpack library's progression. Open to any advice: I could disable testing, I could restrict the architecture to a whitelist, I could blacklist some problematic architectures. But the fact that plain old i386 is having problems makes me loath to do any of that, at least without some better justification than "um, it has a mysterious heisen-build-failure".
I'd love to fix it but I don't have enough information to reproduce---I don't have time to read through the Debian docs to figure out exactly how to reproduce the environment (I'd love to, really! There is just too much other stuff). And I can't reproduce the issue otherwise on i386 containers, even with different random seeds.
Honestly? I'd consider just disabling that particular test with ensmallen_tests ~Johnson844LovaszThetaFMatTest
(I think that's right).
If you can get me a quick and easy way to reproduce the failure in an interactive environment so I can debug it, I can definitely get more information. :+1:
Tests currently pass on FreeBSD fine.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:
Okay I'll try to upload see what happens!
Gah! Uploaded 2.11.5, see https://buildd.debian.org/status/package.php?p=ensmallen
...
0.000 s: LookaheadAdamSphereFunctionTest
0.434 s: LookaheadAdamSimpleSGDTestFunction
0.000 s: LookaheadAdaGradSphereFunction
0.129 s: LookaheadAdamLogisticRegressionTest
0.712 s: LookaheadAdamSimpleSGDTestFunctionFloat
1.598 s: Johnson844LovaszThetaSDP
Parameter 4 to routine SLASCL was incorrect
Parameter 5 to routine SLASCL was incorrect
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ensmallen_tests is a Catch v2.4.1 host application.
Run with -? for options
-------------------------------------------------------------------------------
Johnson844LovaszThetaFMatSDP
-------------------------------------------------------------------------------
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:136
...............................................................................
/<<PKGBUILDDIR>>/tests/lrsdp_test.cpp:161: FAILED:
REQUIRE( finalValue == Approx(-14.0).epsilon(0.1) )
with expansion:
nanf == Approx( -14.0 )
0.000 s: Johnson844LovaszThetaFMatSDP
2.692 s: ErdosRenyiRandomGraphMaxCutSDP
15.001 s: GaussianMatrixSensingSDP
4.566 s: MomentumSGDSpeedUpTestFunction
1.016 s: MomentumSGDGeneralizedRosenbrockTest
...
0.012 s: WNGradStyblinskiTangFunctionSpMatTest
===============================================================================
test cases: 283 | 282 passed | 1 failed
assertions: 11139 | 11138 passed | 1 failed
make[1]: *** [debian/rules:24: override_dh_auto_test] Error 1
...
So, I get that this test is failing and I would be happy to work on the issue, except I can't reproduce it. My suggestion would be to not hold up the package on it, and either disable the i386 architecture (probably overkill), or simply disable that failing test. I have a strong feeling it's not indicative of anything that's actually wrong. What do you think?
I'm going to set it to still run the tests, but to not abort the build if they fail on i386. That way we can still see any test failures, but we'll get a built package and can try to diagnose the issue more directly, like using the package as built in a chroot and such.
:+1: that sounds good to me. I'm sorry that I can't be more helpful on this (I hate not being able to solve an issue). And yeah, if for instance I could use the package in a chroot or something, perhaps I could be able to reproduce the failure. At least it is worth a shot. :)
https://buildd.debian.org/status/package.php?p=ensmallen shows the hurd-i386 build succeeding, the log shows a failed test, so soon we'll have a linux-i386 build and can test it on any chroot debian i386. If you have a debian box, I can walk you through setting up an appropriate chroot - it's pretty much just a matter of installing the right packages, one command to create the chroot, and another to jump into it. Would probably also work on Ubuntu.
Yeah, I'm a Debian user, so if you have a handful of commands I can try, I'd be happy to give it a shot. :+1:
sudo apt install cowbuilder git-buildpackage
env ARCH=i386 git-pbuilder create
now you're set up to
env ARCH=i386 git pbuilder login
might want to periodically update, if you want the latest compiler etc
env ARCH=i386 git pbuilder update
For more details, and a laundry list of ways to speed up particular usages, set up caching, etc, see https://wiki.debian.org/git-pbuilder
Okay, I hot-wired the build scripts to ignore test errors on i386. Well now they're popping up on armel, armhf, and mipsel as well. This is for 2.12.1, see https://buildd.debian.org/status/package.php?p=ensmallen
Hi @barak, thank you so much for those directions. I know it took a long time... but I finally got around to using them! And I was able to (after some fighting) reproduce the issue, and then debug it. I opened #217, and once that is merged, I believe that this issue will be fixed and you can remove the bit of code in packaging that ignores the failed tests. :)
Do you have an easy way to test with the changes from #217 on the various Debian architectures? If you can do that, then we can be sure that the issue is solved before we merge. :+1:
Thanks for the patience! The gears may turn slowly... but they do still turn... :)
It's possible to test on architectures w/o uploading a new version, but it's a hassle. I find it easier to just toss it up on the wall and see if it sticks.
No worries---I already tested using the environment created by pbuilder, so I think at least the i386 issue should be fixed. It may or may not fix everything else; we'll see. If there are more failures, I'll just address them in follow-up PRs. :+1:
I removed the ignore-testing-error stuff and it seems to build okay everywhere!
Awesome! I think we can go ahead and close this issue then. If you have further issues, please, feel free to open another issue. :)
The thought processes are not moving as fast as the fingers. I hit the buttom where "close and comment" usually is, without noticing that the issue was already closed. So now I'll re-close it. :smile:
Yeah, that's why cars have a single pedal which controls the gas or the brakes on alternate depressions. Also weapons systems have a half red / half green button labeled "initiate / cancel global thermonuclear strike (ðŸ™/🙯)".
OS: FreeBSD