libbitcoin / libbitcoin-system

Bitcoin Cross-Platform C++ Development Toolkit
https://libbitcoin.info/
Other
1.28k stars 377 forks source link

Test failures on big endian architecture. #446

Open xnox opened 7 years ago

xnox commented 7 years ago

s390x is 64bit big endian. test.txt

pmienk commented 7 years ago

Assume fixed by #445, thanks!

xnox commented 7 years ago

Nope, it is not =) #445 is compile failure. This is the test-suite failure that one observes after applying #445 fix.

narodnik commented 7 years ago

Is it possible to get safe access to a VPS with this arch to fix this issue? This looks like an endianness issue. See the errors like [7f8000 != ff8000] and [817f != 017f] Thanks.

pmienk commented 7 years ago

I assume it's in large part due to a lack of conditionality in the endian functions. I didn't see any simple way to get travis-ci to cover big endian systems and don't have one myself. Was hoping to make a checkin in the next day or two and see if it addresses the issue.

On Sun, Aug 21, 2016, 1:19 AM RojavaCrypto notifications@github.com wrote:

Is it possible to get safe access to a VPS with this arch to fix this issue? This looks like an endianness issue. See the errors like [7f8000 != ff8000] and [817f != 017f] Thanks.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/libbitcoin/libbitcoin/issues/446#issuecomment-241244988, or mute the thread https://github.com/notifications/unsubscribe-auth/AGL5OG1Ca46M74ox5N9QXMBDrpbmeJYwks5qiAn9gaJpZM4Jof3Y .

xnox commented 7 years ago

@RojavaCrypto i am not aware of any public access to any big endian systems. E.g. powerpc (32bit, big endian) or the s390x (64bit, big endian). Whilst arm64 is wide spread, and hardware has support for arm64be - nobody has a linux port for arm64be, everyone does little endian only.

pmienk commented 7 years ago

It appears I was wrong as to the involvement of the to and from endian functions as for the most part they rely on bit shift operators which should provide platform independence. I've corrected what looks to be an oversight I made there w.r.t. endianness, but doubt it has much to do with the failures seen.

evoskuil commented 7 years ago

Looking at the test failures, these are the most interesting:

test/utility/endian.cpp(98): fatal error: in "endian_tests/from_little_endian_stream_unsafe_eof_stream_partial_read": critical check expected == value has failed
test/utility/endian.cpp(128): fatal error: in "endian_tests/from_little_endian_stream_unsafe_valid": critical check expected == value has failed

Notice that all of the BE tests succeed, as do some of the LE tests. Unfortunately Boolean boost macros are used in these two tests, instead of integer value macros, so we can't see detail of the failures. I've created a PR to address this issue and better factor and standardize the endianness tests.

One of the three LE stream tests succeeds but it is a negative test with no value comparison - the output of its from_little_endian_stream_unsafe call is unused. Therefore the common denominator of the endian test failures is from_little_endian_stream_unsafe on BE architecture. Given that LE streaming is used extensively it could be the cause of some (or all) of the other failures.

evoskuil commented 7 years ago

Looking at recent history, it looks like the immediately-above issue has been addressed by 8c895228f3e7ef971119ab5dfd29c71c3aa7e2fc. This should resolve the two endian test failures and certainly a number of others. @xnox could you please re-execute the test pass with latest code?

evoskuil commented 7 years ago

Based on the objectives of simplicity and full test coverage, the SHA1 and RIPEMD160 implementations have been replaced with architecture-independent implementations. Given that tests in these implementations were failing in this platform, this should also provide resolution for that part of the issue.

https://github.com/libbitcoin/libbitcoin/pull/462

evoskuil commented 7 years ago

A problem of invalid (overflowing) test data in the script_number implementation has been resolved by https://github.com/libbitcoin/libbitcoin/pull/459. The implementation has been modified to handle the full int64_t domain and to throw when that domain is over/under-flowed. The previous behavior was assertion-based and is documented to vary by architecture. It is likely that a large number of test failures resulted from this issue on this platform.

evoskuil commented 7 years ago

There are no known outstanding test failures or identified issues pertaining to endianness. Given that we do not have a BE test platform we await feedback.

xnox commented 7 years ago

Testing with 4beb7cc64189633460b892902ab1c9c4024c3df6 i still see thousands of failures, and there are still failures in e.g. script tests... the en/decoding still doesn't appear to take into account endiannes, either the testsuite expected output (false negatives) or the code itself (true failures) Eg. test/math/script_number.cpp(82): error: in "script_number_tests/check_operators": check encode_base16((add).bytes) == encode_base16((scriptnum1 + num2).data()) has failed [7e01 != fe01

Will attach test.log in a moment.

xnox commented 7 years ago

test.txt

evoskuil commented 7 years ago

@xnox Thanks for re-running the BE tests.

The number of failures is not indicative of the number of code errors. The script_number failures all results from one test, whereby the test iterates through a large amount of reference data. We try to avoid these types of non-isolating tests but many remain.

The code does (and previously did) take into account endiness. The problem is that the BE implementations are not regularly tested, and untested code is broken code. There are two ways to account for endinaness:

(1) Make architecture-aware code for each supported architecture, detect the architecture at compile time and switch code paths. (2) Make architecture independent code.

The previous implementation was mostly architecture independent, but had two hash functions that were not. It was clear that, at least in this case, the architecture detection was not working. Given that this technique implies the code will not be tested on our current CI system, I changed out the two hash functions for architecture independent implementations. This was successful, there are no longer failures in test/math/hash.cpp , and all of the dependent test/wallet/ failures from the previous run are fixed.

There was also unintended architecture dependence in one of our endian functions, which was changed to architecture independent. These test/utility/endian.cpp test failures have been fixed in the latest run.

Consequently all of the previous test/chain/script.cpp failures have also been resolved. But we have somehow introduced a failure in sha256:

test/math/hash.cpp(66): fatal error: in "hash_tests/sha256_hash_test": ...

This is causing failures in all serializable types (including different failures than before in script.cpp).

I'm fairly certain that all of the remaining failures result from the (new) sha256 error and an error in script_number. We were not confident that the change to script_number in the last attempt would resolve the issue. We may have to alter its test cases in order to get more information.

evoskuil commented 7 years ago

It turns out that the sha256 test is dependent upon header.hash() and is therefore failing for the same reason as the serializable types. I've replaced the test with a clean test of functionality, which will resolve this. I believe that @pmienk has isolated the cause of the failures in the serializable types. This should just leave us with the issue in script_number.

xnox commented 7 years ago

Over at https://launchpad.net/ubuntu/+source/libbitcoin/2.11.0-1build1 the test-suite fails on arm64, armhf, ppc64el, s390x - which is a mix of big/little endian and 32bit/64bit platforms that are all non-x86. Seems like there is some x86-ish assumption somewhere. I'll figure out how to test/integrate these. I thought there were CI things that can hook into github and build stuff on non-x86.

evoskuil commented 7 years ago

@xnox The results show a test failure (not a compile failure) on three of the five little-endian (ppc64el) and bi-endian (arm64, armhf) platforms . This is a distinct issue from this compile issue, and probably not endianess-related. Without the test log we cannot resolve the issue, but I am aware of people building successfully for ARM. Again, untested means broken, and we don't currently have ARM or PPC in our test matrix. Please create a new issue for these.

The (remaining) powerpc and s390x failures are big-endian-related compile failures, but given these are in version 2.x that is unsurprising. We have been applying fixes to v3.x (master) only. In fact the error that you patched is the one exposed:

 #if SHA1_BIG_ENDIAN

Once we have the endianess issues resolved we may decide to backport to v2.x but presently all work is in 3.x.

It would be most helpful if you could provided s390x build output from the latest master. I am expecting to see only script_number failures at this point.

evoskuil commented 7 years ago

I just resolved an endian bug in master (v3). Conversion of hash values to uint256_t used a byte copy from the hash buffer to/from an array<8> of uint32_t. On big-endian architecture the assumed byte ordering is incorrect. This has been resolved by performing the conversions using our endian convertors.

Having reviewed the entire code base at this point I do not believe there to be any remaining endian bugs, but I have no platform to test on, so will leave this open.

evoskuil commented 7 years ago

@xnox Any change you could give us another test run on big endian. We made a last fix but haven't been able to validate it and are tagging the v3 release this week. Thanks!

xnox commented 7 years ago

@evoskuil a lot of tests still fail test.txt

evoskuil commented 7 years ago

@xnox Thanks for this feedback. We'll give it another look. This would be simple if we had a machine of our own to test on :/

pmienk commented 7 years ago

@xnox I augmented the logging. If you wouldn't mind, the output could be helpful and much appreciated.

xnox commented 7 years ago

@evoskuil i have none to give out =/ i'm not sure if there are any big endian emulators / virtual machines that one can run.

evoskuil commented 7 years ago

No worries, once we get this I think we'll be able to maintain it.

xnox commented 7 years ago

ce91d4f5edcbe88e918cf41ca1025d103bf18cbe

test.txt

pmienk commented 7 years ago

Recent check-ins should reduce failures to those exposed by elliptic_curve.cpp. They appear to be an issue within secp256k1. I've opened the following issue: bitcoin-core/secp256k1#442

evoskuil commented 7 years ago

Given the comments on the referenced issue it would be a good idea to execute the full libsecp256k1 test suite on the problem platform.

evoskuil commented 3 years ago

TODO: add a compiler warning due to lack of CI testing on BE architecture.

/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html

evoskuil commented 3 years ago

Add CI support via Travis: https://github.com/syscoin/syscoin/commit/8a4438f2173eafbc653900de35ed62d868ed9115

evoskuil commented 3 years ago

Note that the bug opened against libsecp256k1 was closed as 'spurious' given that the values passed in the independent tests are opaque. In other words, it's a test issue, which makes sense, and should be easily resolvable.

evoskuil commented 3 years ago

https://github.com/libbitcoin/libbitcoin-build/issues/250