monero-project / monero

Monero: the secure, private, untraceable cryptocurrency
https://getmonero.org

Fail to build on PPC64 #3826

Closed hegjon closed 2 years ago

hegjon commented 6 years ago

Seems like this is the cause:

cc1: error: unrecognized command line option '-maes'
cc1: error: unrecognized command line option '-march=native'

Full log: https://kojipkgs.fedoraproject.org//work/tasks/3387/27023387/build.log
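For context: `-maes` is an x86-only GCC option, and GCC's PowerPC back end selects the CPU with `-mcpu` rather than `-march`, which is why both flags are rejected here. On the source side, AES-NI code paths are usually guarded by predefined compiler macros. The snippet below is a minimal sketch of such a guard; `aes_backend` is a hypothetical helper written for illustration and is not taken from Monero's code.

```cpp
#include <string>

// Hypothetical helper for illustration; not part of Monero's code base.
// Reports which AES code path this translation unit was compiled for.
inline std::string aes_backend() {
#if defined(__AES__) && (defined(__x86_64__) || defined(__i386__))
  // GCC and Clang define __AES__ only when AES-NI is enabled (for example via -maes).
  return "aesni";
#else
  // PowerPC, ARM, or x86 without AES-NI: fall back to a portable software path.
  return "portable";
#endif
}
```

On a PPC64 build this would select the portable branch, provided the build system also stops passing the x86-only flags.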

nioroso-x3 commented 6 years ago

hash-slow-2 now passes.

moneromooo-monero commented 6 years ago

Excellent, those patches are now in https://github.com/monero-project/monero/pull/4781. xiphon, if you prefer PRing yours separately, let me know and I'll amend. Can you share the new core tests log now, please?

xiphon commented 6 years ago

Ah, I don't care. You included the fix in #4781 and listed me as a co-author, so I'm absolutely fine with how #4781 is done.

Just curious about this define: https://github.com/monero-project/monero/pull/4781#pullrequestreview-171079489. I guess we don't have to change it. It would be better to test the code with the previous version of the define. @nioroso-x3, could you?

moneromooo-monero commented 6 years ago

It was a debug thing I left in, fixed. Thanks for spotting it.

nioroso-x3 commented 6 years ago

I fixed the define mentioned by xiphon; the slow-hash tests continue to pass with no problems. core_tests always gets stuck after the gen_block_is_too_big test with CPU load at 0%, so I just killed it.

cncrypto-tests.log.gz core_tests.log.gz unit_tests.log.gz

Running tests...
Test project /home/jribeiro/Development/monero/build/Linux/master/release
Start 1: hash-target
1/15 Test #1: hash-target ...................... Passed 0.27 sec
Start 2: core_tests
2/15 Test #2: core_tests .......................***Exception: Other 5436.55 sec
Start 3: cncrypto
3/15 Test #3: cncrypto ......................... Passed 66.04 sec
Start 4: unit_tests
4/15 Test #4: unit_tests .......................***Failed 789.18 sec
Start 5: difficulty
5/15 Test #5: difficulty ....................... Passed 0.07 sec
Start 6: hash-fast
6/15 Test #6: hash-fast ........................ Passed 0.06 sec
Start 7: hash-slow
7/15 Test #7: hash-slow ........................ Passed 1.42 sec
Start 8: hash-slow-1
8/15 Test #8: hash-slow-1 ...................... Passed 1.83 sec
Start 9: hash-slow-2
9/15 Test #9: hash-slow-2 ...................... Passed 5.49 sec
Start 10: hash-tree
10/15 Test #10: hash-tree ........................ Passed 0.02 sec
Start 11: hash-extra-blake
11/15 Test #11: hash-extra-blake ................. Passed 0.04 sec
Start 12: hash-extra-groestl
12/15 Test #12: hash-extra-groestl ............... Passed 0.05 sec
Start 13: hash-extra-jh
13/15 Test #13: hash-extra-jh .................... Passed 0.04 sec
Start 14: hash-extra-skein
14/15 Test #14: hash-extra-skein ................. Passed 0.04 sec
Start 15: hash-variant2-int-sqrt
15/15 Test #15: hash-variant2-int-sqrt ........... Passed 1350.47 sec

87% tests passed, 2 tests failed out of 15

Total Test time (real) = 7651.73 sec

The following tests FAILED:
2 - core_tests (OTHER_FAULT)
4 - unit_tests (Failed)

moneromooo-monero commented 6 years ago

The test after is_too_big is a really slow one. It'll take some time, leave it on :)

All tests so far before this one passed, so it's encouraging.

moneromooo-monero commented 6 years ago

Actually, I see you've run it for about an hour and a half; that might be a bit much. Can you get an all-thread stack trace after it's been stuck for a while?

gdb build/release/core_tests/core_tests `pidof core_tests`
thread apply all bt

(s/release/debug/ if you built a debug build, which gives better debug info)

nioroso-x3 commented 6 years ago

I get this, compiled with debug-test this time

Attaching to program: /home/jribeiro/Development/monero/build/Linux/master/debug/tests/core_tests/core_tests, process 6500
[New LWP 6502]
[New LWP 6503]
[New LWP 6504]
0x00003fffa44eba14 in ?? ()
(gdb) thread apply all bt

Thread 4 (LWP 6504):
#0 0x00003fffa399165c in ?? ()
#1 0x00003fffa399163c in ?? ()
#2 0x00003fffa48245c0 in ?? ()
#3 0x00003fffa4827c88 in ?? ()
#4 0x00003fffa3d759e0 in ?? ()
#5 0x00003fffa398820c in ?? ()
#6 0x00003fffa38c6e30 in ?? ()

Thread 3 (LWP 6503):
#0 0x00003fffa399165c in ?? ()
#1 0x00003fffa399163c in ?? ()
#2 0x00003fffa48245c0 in ?? ()
#3 0x00003fffa4827c88 in ?? ()
#4 0x00003fffa3d759e0 in ?? ()
#5 0x00003fffa398820c in ?? ()
#6 0x00003fffa38c6e30 in ?? ()

Thread 2 (LWP 6502):
#0 0x00003fffa399165c in ?? ()
#1 0x00003fffa399163c in ?? ()
#2 0x00003fffa48245c0 in ?? ()
#3 0x00003fffa4827c88 in ?? ()
#4 0x00003fffa3d759e0 in ?? ()
#5 0x00003fffa398820c in ?? ()
#6 0x00003fffa38c6e30 in ?? ()

Thread 1 (LWP 6500):
#0 0x00003fffa44eba14 in ?? ()
#1 0x00003fffa44ec59c in ?? ()
#2 0x00003fffa4b16114 in ?? ()
#3 0x00003fffa4b34a4c in ?? ()
#4 0x000000010088d024 in ?? ()
#5 0x0000000100891af0 in ?? ()
#6 0x00000001008733c0 in ?? ()
#7 0x0000000100875a78 in ?? ()
#8 0x00000001008abac8 in ?? ()
#9 0x00003fffa37cd188 in ?? ()
#10 0x00003fffa37cd3b0 in ?? ()
#11 0x0000000000000000 in ?? ()

nioroso-x3 commented 6 years ago

After detaching gdb, core_tests segfaulted.

moneromooo-monero commented 6 years ago

Something is off, even release should have better trace... Did gdb complain about anything when loading ?

nioroso-x3 commented 6 years ago

Nope, it loaded all symbols.

moneromooo-monero commented 6 years ago

Alright, please try with that particular test (invalid_binary_format) disabled by commenting it out in tests/core_tests/chaingen_main.cpp.

nioroso-x3 commented 6 years ago

Ok, now it crashed after the "gen_bp_tx_invalid_borromean_type" test. Finally the log increased quite a lot.

core_tests.log.gz

moneromooo-monero commented 6 years ago

Nice, that's all of them except the one you commented :) The crash at the end is fixed in #4785, unrelated to endianness. For the remaining (invalid_binary_format), are you able to run with valgrind or ASAN ? With valgrind, you just prepend "valgrind " to your normal command line. With ASAN, you build monero with -D SANITIZE=ON on the cmake command line. ASAN is best if you can (much faster, detects more problems), but might not be available on your particular arch.

moneromooo-monero commented 6 years ago

BTW, if you want to run just one test, you can use --filter=regexp. So here: --filter=\*invalid_binary_format\*

nioroso-x3 commented 6 years ago

Valgrind seems to complain a lot about invalid writes and reads in slow-hash. I ran this using the filter, so only the invalid_binary_format test is running.

core_tests_valgrind.log.gz

I also ran it without the filter, it also complains about the same lines.

core_tests_full_valgrind.log.gz

I can make an account on my Power Mac for a dev. It has 4 cores, 8 GB of RAM and an SSD, so it should be a lot faster than running a qemu VM.

moneromooo-monero commented 6 years ago

Try adding "--max-stackframe=4000000" to the valgrind command line. The CryptoNight stacks need to be large.

nioroso-x3 commented 6 years ago

Ok, core_tests crashed with segfault inside valgrind now, but much earlier.

core_tests_full_valgrind.log.gz

moneromooo-monero commented 6 years ago

Looks like some compiler or lib problem. Try adding "-D STACK_TRACE=OFF" to the cmake command line.

jtgrassie commented 6 years ago

Ubuntu 16, PowerPC BE, 32 bit. PRs: (#4796, #4726, #4689, #4781, #4757, #4755).

core_tests took too long so I bailed on that.

unit_tests failed. Looks like something to do with -fPIC:

unit_tests: error while loading shared libraries: R_PPC_REL24 relocation at 0x010f521c for symbol '_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc' out of range

All other tests passed 👍

Log

Going to run the core_tests again and leave running to see what that reports.

nioroso-x3 commented 6 years ago

I made two builds, one with -D STACK_TRACE=OFF and one with -D SANITIZE=ON. They are both running the invalid_binary_format test, extremely slowly.

core_tests_valgrind.log.gz core_tests_asan.log.gz

nioroso-x3 commented 6 years ago

Are you on a G5 too? I remember Ubuntu 16.04 being quite buggy; that's why I switched to Gentoo.

jtgrassie commented 6 years ago

@nioroso-x3 No. Annoyingly I got rid of my G5 last year (it was gathering serious dust!). Thus my tests have been using qemu-system-ppc. It's been pretty stable with Ubuntu on it, although of course, very slow.

hegjon commented 6 years ago

Can you paste your public SSH key? I can try to get access to a PPC machine for you on the Fedora infrastructure.

jtgrassie commented 6 years ago

Here is the failing core_tests log (Ubuntu 16 PowerPC BE 32bit). LastTest.log.tar.gz

nioroso-x3 commented 6 years ago

core_tests also gets stuck on Fedora 25 ppc64. That also uses gcc 6.4; I'll test the newest gcc just in case.

nioroso-x3 commented 6 years ago

core_tests passes completely when using LLVM 3.9 on Fedora 25 and LLVM (clang) 7.0 on Gentoo. Looks like gcc is buggy for ppc64, lol.

The first log is for Gentoo in release, the second for Fedora in debug. It looks like there is a double free error at the end, but everything passes for core_tests.

core_tests_llvm7_release.log.gz

core_tests_llvm39_debug.log.gz

make_f25_llvm39.log.gz

moneromooo-monero commented 5 years ago

And now... does it sync the blockchain ? :)

nioroso-x3 commented 5 years ago

Nope, it's not syncing.

bitmonero_gentoo.log.gz

moneromooo-monero commented 5 years ago

https://github.com/monero-project/monero/pull/4866

nioroso-x3 commented 5 years ago

New bitmonero log after 4866. What does that patch fix?

bitmonero.tar.gz

moneromooo-monero commented 5 years ago

It fixes values read/written from/to the network differently on little endian and big endian archs.
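As an illustration of that class of bug, here is a minimal sketch, assuming a little-endian wire format. The helper names (`store_le32`, `load_le32`, `store_host32`) are hypothetical and are not the functions touched by the PR. Copying the in-memory representation of an integer yields different bytes on little-endian and big-endian hosts, while shifting the value out byte by byte yields the same bytes everywhere.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; not Monero's actual API.

// Endian-dependent: copies the host's in-memory representation, so a
// big-endian host emits the bytes in the opposite order.
inline void store_host32(std::uint8_t* out, std::uint32_t v) {
  std::memcpy(out, &v, sizeof v);
}

// Endian-independent: always emits the least significant byte first.
inline void store_le32(std::uint8_t* out, std::uint32_t v) {
  out[0] = static_cast<std::uint8_t>(v);
  out[1] = static_cast<std::uint8_t>(v >> 8);
  out[2] = static_cast<std::uint8_t>(v >> 16);
  out[3] = static_cast<std::uint8_t>(v >> 24);
}

// Endian-independent: rebuilds the value from little-endian wire bytes.
inline std::uint32_t load_le32(const std::uint8_t* in) {
  return static_cast<std::uint32_t>(in[0])
       | (static_cast<std::uint32_t>(in[1]) << 8)
       | (static_cast<std::uint32_t>(in[2]) << 16)
       | (static_cast<std::uint32_t>(in[3]) << 24);
}
```

Because the shift-based versions operate on values rather than on memory layout, they behave identically on x86 and on a big-endian G5.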

moneromooo-monero commented 5 years ago

And I see at least another one that needs fixing.

moneromooo-monero commented 5 years ago

I updated 4866.

nioroso-x3 commented 5 years ago

New log bitmonero.tar.gz

moneromooo-monero commented 5 years ago

I found more places that need endian fixing. I'll post when I've fixed all I see.

moneromooo-monero commented 5 years ago

4866 updated again.

nioroso-x3 commented 5 years ago

New log. Also, unit_tests is getting stuck after the mnemonics test; core_tests passes.

bitmonero.log.gz unit_tests.log.gz

moneromooo-monero commented 5 years ago

We can receive packets :) Looks like the payload is also endian dependent though. Not fun.
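For payload fields that have already been copied into an integer, a common approach is to swap bytes only on big-endian hosts. The following is a rough sketch under the assumption that the payload stores integers in little-endian order; `host_is_little_endian`, `bswap64` and `le64_to_host` are hypothetical names chosen for this example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; not Monero's actual API.

// Detect host byte order at run time by inspecting the first byte of 1.
inline bool host_is_little_endian() {
  const std::uint16_t probe = 1;
  std::uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reverse the byte order of a 64-bit value.
inline std::uint64_t bswap64(std::uint64_t v) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xff);
    v >>= 8;
  }
  return r;
}

// Convert a value whose bytes were copied verbatim from a little-endian
// payload into the host's native representation.
inline std::uint64_t le64_to_host(std::uint64_t wire) {
  return host_is_little_endian() ? wire : bswap64(wire);
}
```

On a little-endian host the conversion is a no-op, so this kind of fix leaves x86 behaviour unchanged.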

nioroso-x3 commented 5 years ago

Will this bug be fixed? I'm willing to provide ssh access to a machine for testing.

moneromooo-monero commented 5 years ago

I can debug as a background task if I have access to such a machine.

nioroso-x3 commented 5 years ago

Post an SSH public key and I can give you access to my G5 with Gentoo. It has clang-8 and gcc-8.2. You'll have access at monerodevs@nerv-la.ddns.net:223

moneromooo-monero commented 5 years ago

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDEGd0x3Tkn/Ht1gKZlQY2T0oEpPEenGGPqzPMHMvHJ8S/PLbkVAFfNLDuBdshnm3r/4eMYspBO8/Pa55ICrURwhLk/aQ5vuNwvoReSib5omItheNM5ALWZpVfNTBZct1raryBIaDOUn9SvfLhZzhKojRSrFF4P5Nitn4aMjcGiKklIdFluQ0cIOmA4yY2DY8x6NPECVtPsJrwc89CMlPtlXNd8TgAWy8PvEQb7H9T6XaW4Mn1fGwT52+70q/Eyo4iNrGuLx74obvtAd3nCugTJykE1dXIiQQ3FtmtPqZCOQfaAVteKWvUPWYs4yc+b7LCqf06YvFhw+FfkS04F0gV user@host

nioroso-x3 commented 5 years ago

You should have access now.

nioroso-x3 commented 5 years ago

PPC64le (little endian) is failing some tests:

Test project /monero/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.23 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................***Failed 12970.89 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 19.66 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 1210.97 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................***Failed 896.62 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.09 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty ..................***Failed 0.03 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 111.12 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.06 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 0.62 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 0.69 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ...................... Passed 1.71 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ...................... Passed 5.99 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.02 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.03 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.02 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 473.87 sec

I couldn't find the .log for the wide_difficulty test; what is the filename?

core_and_unit_tests.zip

nioroso-x3 commented 5 years ago

hash-slow-2 and hash-slow-4 are failing in big endian ppc64:

Test project /home/jribeiro/Development/monero-ori/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.34 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................***Failed 686.95 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 41.94 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 2062.62 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................***Failed 609.90 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.25 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty .................. Passed 38.04 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 184.81 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.23 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 1.37 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 1.80 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ......................***Failed 6.17 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ......................***Failed 10.52 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.20 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.04 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.04 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 1222.28 sec

core_and_unit_tests_be.zip

moneromooo-monero commented 5 years ago

It should all be in LastTest.log

moneromooo-monero commented 5 years ago

https://github.com/monero-project/monero/pull/5544

moneromooo-monero commented 5 years ago

Thanks much for the G5 access. The patch above fixes most issues. There's still a failure in serialization unit tests, which I think is due to using boost code that's not endianness nice (not 100% sure). I think all the rest is fixed (but it takes massive amounts of time to build/test on that G5 so I've not run a full test run).

moneromooo-monero commented 5 years ago

The serialization test failure is now also fixed, same PR.