Closed hegjon closed 2 years ago
hash-slow-2 now passes.
Excellent, those patches are now in https://github.com/monero-project/monero/pull/4781. xiphon, if you prefer PRing yours separately, let me know and I'll amend. Can you share the new core tests log now please?
xiphon, if you prefer PRing yours separately, let me know and I'll amend.
Ah, I don't care. You included the fix in #4781 and listed me as a co-author, so I'm absolutely fine with how #4781 is done.
Just curious about this define: https://github.com/monero-project/monero/pull/4781#pullrequestreview-171079489. I guess we don't have to change it. It would be better to test the code with the previous version of the define. @nioroso-x3, could you?
It was a debug thing I left in, fixed. Thanks for spotting it.
I fixed the define mentioned by xiphon, slow-hash tests continue to pass with no problems. Core tests always get stuck after the gen_block_is_too_big test, CPU load is 0%, so I just killed it.
cncrypto-tests.log.gz core_tests.log.gz unit_tests.log.gz
Running tests...
Test project /home/jribeiro/Development/monero/build/Linux/master/release
Start 1: hash-target
1/15 Test #1: hash-target ...................... Passed 0.27 sec
Start 2: core_tests
2/15 Test #2: core_tests .......................Exception: Other5436.55 sec
Start 3: cncrypto
3/15 Test #3: cncrypto ......................... Passed 66.04 sec
Start 4: unit_tests
4/15 Test #4: unit_tests .......................Failed 789.18 sec
Start 5: difficulty
5/15 Test #5: difficulty ....................... Passed 0.07 sec
Start 6: hash-fast
6/15 Test #6: hash-fast ........................ Passed 0.06 sec
Start 7: hash-slow
7/15 Test #7: hash-slow ........................ Passed 1.42 sec
Start 8: hash-slow-1
8/15 Test #8: hash-slow-1 ...................... Passed 1.83 sec
Start 9: hash-slow-2
9/15 Test #9: hash-slow-2 ...................... Passed 5.49 sec
Start 10: hash-tree
10/15 Test #10: hash-tree ........................ Passed 0.02 sec
Start 11: hash-extra-blake
11/15 Test #11: hash-extra-blake ................. Passed 0.04 sec
Start 12: hash-extra-groestl
12/15 Test #12: hash-extra-groestl ............... Passed 0.05 sec
Start 13: hash-extra-jh
13/15 Test #13: hash-extra-jh .................... Passed 0.04 sec
Start 14: hash-extra-skein
14/15 Test #14: hash-extra-skein ................. Passed 0.04 sec
Start 15: hash-variant2-int-sqrt
15/15 Test #15: hash-variant2-int-sqrt ........... Passed 1350.47 sec
87% tests passed, 2 tests failed out of 15
Total Test time (real) = 7651.73 sec
The following tests FAILED:
2 - core_tests (OTHER_FAULT)
4 - unit_tests (Failed)
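When comparing runs like this one across machines, it can help to pull the non-passing tests out of a saved ctest log automatically. The following is only a sketch: the regular expression is an assumption based on the line layout seen in this thread (including the missing space in `Exception: Other5436.55 sec` and the `***` prefix that shows up in a later log), and the function names are made up for illustration.

```python
import re

# Matches ctest result lines such as
#   "2/15 Test #2: core_tests .......................Exception: Other5436.55 sec"
# Assumed layout: "Test #N: name ....status time sec", with an optional "***"
# prefix on the status and possibly no space between status and time.
RESULT_RE = re.compile(
    r"Test\s+#(\d+):\s+(\S+)\s+\.+\s*(?:\*\*\*)?(.+?)\s*(\d+\.\d+)\s+sec"
)


def parse_ctest_results(text):
    """Return a list of (number, name, status, seconds) tuples."""
    return [
        (int(num), name, status, float(secs))
        for num, name, status, secs in RESULT_RE.findall(text)
    ]


def failed_tests(results):
    """Return the names of all tests whose status is not 'Passed'."""
    return [name for _, name, status, _ in results if status != "Passed"]


# Three lines copied from the log above.
SAMPLE = """\
1/15 Test #1: hash-target ...................... Passed 0.27 sec
2/15 Test #2: core_tests .......................Exception: Other5436.55 sec
4/15 Test #4: unit_tests .......................Failed 789.18 sec
"""

if __name__ == "__main__":
    print(failed_tests(parse_ctest_results(SAMPLE)))
```

The "Start N: name" lines are skipped because they contain no "Test #" marker.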
The test after is_too_big is a really slow one. It'll take some time, leave it on :)
All tests so far before this one passed, so it's encouraging.
Actually, I see you've run for like an hour and a half, that might be a bit much. Can you get an all-thread stack trace after it's been stuck for a while?
gdb build/release/core_tests/core_tests `pidof core_tests`
thread apply all bt
(s/release/debug/ if you built a debug build, for better debug info)
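The same all-thread trace can also be collected non-interactively, which is handy on a slow machine or VM. This is a minimal sketch, assuming gdb is installed and allowed to attach to the process; it relies on gdb's `-p`, `-batch` and `-ex` options, and the helper names are hypothetical.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints every
    thread's backtrace, then exits (batch mode detaches on exit)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_all_threads(pid):
    """Run gdb and return its combined stdout/stderr as text.

    Needs gdb on PATH and permission to ptrace the target process."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout
```

Calling `dump_all_threads(pid)` with the number reported by `pidof core_tests` returns the trace as a string that can be attached to the issue.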
I get this, compiled with debug-test this time
Attaching to program: /home/jribeiro/Development/monero/build/Linux/master/debug/tests/core_tests/core_tests, process 6500
[New LWP 6502]
[New LWP 6503]
[New LWP 6504]
0x00003fffa44eba14 in ?? ()
(gdb) thread apply all bt
Thread 4 (LWP 6504):
Thread 3 (LWP 6503):
Thread 2 (LWP 6502):
Thread 1 (LWP 6500):
After detaching gdb, core_tests segfaulted.
Something is off, even release should have better trace... Did gdb complain about anything when loading ?
Nope, it loaded all symbols.
Alright, please try with that particular test (invalid_binary_format) disabled by commenting it out in tests/core_tests/chaingen_main.cpp.
Ok, now it crashed after the "gen_bp_tx_invalid_borromean_type" test. Finally the log increased quite a lot.
Nice, that's all of them except the one you commented :) The crash at the end is fixed in #4785, unrelated to endianness. For the remaining (invalid_binary_format), are you able to run with valgrind or ASAN ? With valgrind, you just prepend "valgrind " to your normal command line. With ASAN, you build monero with -D SANITIZE=ON on the cmake command line. ASAN is best if you can (much faster, detects more problems), but might not be available on your particular arch.
BTW, if you want to run just one test, you can use --filter=regexp. So here: --filter=\*invalid_binary_format\*
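To get a feel for which tests a given filter would select, here is a rough sketch. It assumes the filter behaves like a shell-style wildcard, as the escaped asterisks suggest; how core_tests actually interprets the pattern is not confirmed in this thread. The test names are ones mentioned in this issue, and `select_tests` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Test names mentioned in this thread.
TEST_NAMES = [
    "gen_block_is_too_big",
    "invalid_binary_format",
    "gen_bp_tx_invalid_borromean_type",
]


def select_tests(names, pattern):
    """Return the names matching a shell-style wildcard pattern
    (assumed semantics, case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print(select_tests(TEST_NAMES, "*invalid_binary_format*"))
```

A looser pattern such as `*invalid*` would also pick up gen_bp_tx_invalid_borromean_type, so the full test name is the safer choice.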
Valgrind seems to complain a lot about invalid writes and reads in slow-hash. I ran this using the filter, so only the invalid_binary_format test is running.
I also ran it without the filter, it also complains about the same lines.
core_tests_full_valgrind.log.gz
I can make an account on my powermac for a dev, it has 4 cores, 8gb of ram and a SSD, should be a lot faster than running a qemu vm.
Try adding "--max-stackframe=4000000" to the valgrind command line. The CryptoNight stacks need to be large.
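Putting the valgrind suggestions together, the command line could be assembled as below. This is a sketch only: the binary path is a placeholder to be replaced with the local build path, and `valgrind_command` is a hypothetical helper. `--max-stackframe` is a standard valgrind option and `--filter` is the core_tests option mentioned earlier in the thread.

```python
def valgrind_command(binary, test_filter=None, max_stackframe=4000000):
    """Build an argv list that runs `binary` under valgrind with a larger
    maximum stack frame, optionally restricted to one core_tests filter."""
    cmd = ["valgrind", f"--max-stackframe={max_stackframe}", binary]
    if test_filter is not None:
        # No backslash escaping needed: an argv list bypasses the shell.
        cmd.append(f"--filter={test_filter}")
    return cmd


if __name__ == "__main__":
    # "path/to/core_tests" is a placeholder, not a real path from this thread.
    print(" ".join(valgrind_command("path/to/core_tests", "*invalid_binary_format*")))
```

Passing the list to `subprocess.run` avoids the shell expanding the asterisks, which is what the backslashes guard against on an interactive command line.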
Ok, core_tests crashed with segfault inside valgrind now, but much earlier.
Looks like some compiler or lib problem. Try adding "-D STACK_TRACE=OFF" to the cmake command line.
Ubuntu 16, PowerPC BE, 32 bit. PRs: (#4796, #4726, #4689, #4781, #4757, #4755).
core_tests took too long so I bailed on that.
unit_tests failed. Looks like something to do with -fPIC:
unit_tests: error while loading shared libraries: R_PPC_REL24 relocation at 0x010f521c for symbol '_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc' out of range
All other tests passed 👍
Going to run the core_tests again and leave running to see what that reports.
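Some background on the "out of range" message: R_PPC_REL24 is the relocation used for 32-bit PowerPC relative branches, which encode a signed 24-bit word offset, so a call can only reach about 32 MiB either side of the branch instruction. The sketch below just checks that limit; the target addresses are made up for illustration (only the relocation address 0x010f521c comes from the error above), and whether -fPIC is really the culprit is still a guess.

```python
# A PowerPC relative branch holds a signed 24-bit offset counted in
# 4-byte words, i.e. a signed 26-bit byte displacement.
REL24_MIN = -(1 << 25)      # -32 MiB
REL24_MAX = (1 << 25) - 4   # +32 MiB minus one instruction


def rel24_in_range(place, target):
    """Return True if a branch at address `place` can reach `target`."""
    displacement = target - place
    return displacement % 4 == 0 and REL24_MIN <= displacement <= REL24_MAX


if __name__ == "__main__":
    place = 0x010F521C  # relocation address from the error message
    print(rel24_in_range(place, place + 0x1000))      # nearby, hypothetical target
    print(rel24_in_range(place, place + (1 << 25)))   # 32 MiB away, hypothetical target
```

If the symbol ends up further away than that, for example in a shared library mapped far from the executable, the loader cannot patch the branch and reports the relocation as out of range.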
I made 2 builds, one with -D STACK_TRACE=OFF and one with -D SANITIZE=ON. They are both running the invalid_binary_format test, extremely slowly. core_tests_valgrind.log.gz core_tests_asan.log.gz
Are you on a G5 too? I remember Ubuntu 16.04 being quite buggy, that's why I switched to Gentoo.
@nioroso-x3 No. Annoyingly I got rid of my G5 last year (it was gathering serious dust!). Thus my tests have been using qemu-system-ppc. It's been pretty stable with Ubuntu on it, although of course, very slow.
Can you paste your public SSH key? I can try to get access to a PPC machine for you on the Fedora infrastructure.
Here is the failing core_tests log (Ubuntu 16 PowerPC BE 32bit). LastTest.log.tar.gz
core_tests also gets stuck in Fedora 25 ppc64. That also uses gcc 6.4, I'll test the newest gcc just in case
Core tests passes completely when using llvm3.9 on fedora 25 and llvm (clang) 7.0 in gentoo, looks like gcc is buggy for ppc64 lol
First log is for gentoo in release, second for fedora in debug, looks like at the end there is a double free error, but everything passes for core_tests.
core_tests_llvm7_release.log.gz
And now... does it sync the blockchain ? :)
Nope, it's not syncing. bitmonero_gentoo.log.gz
New bitmonero log after 4866. What does that patch fix?
It fixes values read/written from/to the network differently on little endian and big endian archs.
And I see at least another one that needs fixing.
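To illustrate the class of bug being fixed here: if a value is written in the host's native byte order, a little-endian and a big-endian node disagree on its meaning. The sketch below assumes the wire format is fixed little-endian, which this thread does not spell out; the function names are hypothetical and this is not Monero's actual serialization code.

```python
import struct


def write_u32_le(value):
    """Serialize a 32-bit unsigned integer as little-endian wire bytes."""
    return struct.pack("<I", value)


def read_u32_le(data):
    """Correct reader: decodes little-endian regardless of host order."""
    return struct.unpack("<I", data)[0]


def read_u32_as_big_endian(data):
    """Simulates the bug: a big-endian host reinterpreting the wire bytes
    in its native order without swapping."""
    return struct.unpack(">I", data)[0]


if __name__ == "__main__":
    wire = write_u32_le(0x01020304)
    print(wire.hex())
    print(hex(read_u32_le(wire)))
    print(hex(read_u32_as_big_endian(wire)))
```

Every field that goes over the network or into a hash needs the explicit conversion, which is why fixes of this kind tend to arrive in several rounds.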
I updated 4866.
New log bitmonero.tar.gz
I found more places that need endian fixing. I'll post when I've fixed all I see.
4866 updated again.
New log, also unit_tests is getting stuck after mnemonics test, core_tests passes.
We can receive packet :) Looks like the payload is also endian dependent though. Not fun.
Will this bug be fixed? I'm willing to provide ssh access to a machine for testing.
I can debug as a background task if I have access to such a machine.
Post a ssh public key, I can give you access to my G5 with gentoo. It has clang-8 and gcc-8.2. You'll have access at monerodevs@nerv-la.ddns.net:223
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDEGd0x3Tkn/Ht1gKZlQY2T0oEpPEenGGPqzPMHMvHJ8S/PLbkVAFfNLDuBdshnm3r/4eMYspBO8/Pa55ICrURwhLk/aQ5vuNwvoReSib5omItheNM5ALWZpVfNTBZct1raryBIaDOUn9SvfLhZzhKojRSrFF4P5Nitn4aMjcGiKklIdFluQ0cIOmA4yY2DY8x6NPECVtPsJrwc89CMlPtlXNd8TgAWy8PvEQb7H9T6XaW4Mn1fGwT52+70q/Eyo4iNrGuLx74obvtAd3nCugTJykE1dXIiQQ3FtmtPqZCOQfaAVteKWvUPWYs4yc+b7LCqf06YvFhw+FfkS04F0gV user@host
You should have access now.
PPC64le (little endian) is failing some tests:
Test project /monero/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.23 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................Failed 12970.89 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 19.66 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 1210.97 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................Failed 896.62 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.09 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty ..................***Failed 0.03 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 111.12 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.06 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 0.62 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 0.69 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ...................... Passed 1.71 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ...................... Passed 5.99 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.02 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.03 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.02 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 473.87 sec
I couldn't find the .log for the wide_difficulty test, what is the filename? core_and_unit_tests.zip
hash-slow-2 and hash-slow-4 are failing in big endian ppc64
Test project /home/jribeiro/Development/monero-ori/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.34 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................Failed 686.95 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 41.94 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 2062.62 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................Failed 609.90 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.25 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty .................. Passed 38.04 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 184.81 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.23 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 1.37 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 1.80 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ......................Failed 6.17 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ......................Failed 10.52 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.20 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.04 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.04 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 1222.28 sec
It should all be in LastTest.log
Thanks much for the G5 access. The patch above fixes most issues. There's still a failure in serialization unit tests, which I think is due to using boost code that's not endianness nice (not 100% sure). I think all the rest is fixed (but it takes massive amounts of time to build/test on that G5 so I've not run a full test run).
The serialization test failure is now also fixed, same PR.
Seems like this is the cause:
Full log: https://kojipkgs.fedoraproject.org//work/tasks/3387/27023387/build.log