Open gongsu832 opened 8 years ago
Forgot to mention the code version:
commit 35326c25f99b038286a58330fdef87d23fe5f473
Merge: 0d05bf3 a626e43
Author: Binh Q Nguyen <binhn@us.ibm.com>
Date: Sat Jun 18 08:57:17 2016 -0400
Merge pull request #1877 from jyellick/keep-state-if-can-execute
Stabilize PBFT under stress with periodic viewchange
Same issue observed in #1886
I can't reproduce this on my vagrant box. @gongsu832 could you run behave -D logs=y and attach the container logs?
@tuand27613 The latest commit 72a7cbf9d3f49ee79d71c494f8aef916b7376251 now adds an additional dependency behave-grpc to target behave-deps, which runs the command sudo pip install -q 'grpcio==0.13.1'. It fails on zLinux (both Debian 8 and RHEL 7.2) with a message similar to the following:
# pip install -q 'grpcio==0.13.1'
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-UkqENb/grpcio/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-86V1pw-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-UkqENb/grpcio/
Can I temporarily skip this dependency (by touching build/behave/.grpc-dummy)? Thanks.
@tuand27613 Removing "-q" in the pip install command yields more output. When compiling grpcio-0.13.1.tar.gz, it failed with:
In file included from ./third_party/boringssl/include/openssl/asn1.h:68:0,
from ./third_party/boringssl/include/openssl/rsa.h:62,
from ./src/core/security/json_token.h:38,
from ./src/core/security/credentials.h:43,
from src/core/security/client_auth_filter.c:43:
./third_party/boringssl/include/openssl/bn.h:161:2: error: #error "Must define either OPENSSL_32_BIT or OPENSSL_64_BIT"
#error "Must define either OPENSSL_32_BIT or OPENSSL_64_BIT"
^
./third_party/boringssl/include/openssl/bn.h:222:44: error: unknown type name 'BN_ULONG'
Looks like s390x (unsurprisingly) isn't recognized as OPENSSL_64_BIT.
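To illustrate the failure mode above: a header that derives its word size from a fixed list of architecture macros stops with #error on any architecture missing from that list. The sketch below is not BoringSSL's actual header. The SKETCH_WORD_BITS macro and the sketch_word_bits function are hypothetical names; only the compiler-predefined macros (such as __s390x__, which GCC defines on 64-bit z) are real. Adding a branch like this gets past the compile error, but it says nothing about byte order.

```c
#include <assert.h>

/* Hypothetical word-size detection in the style of a crypto library's base
 * header. SKETCH_WORD_BITS is an illustrative name, not a BoringSSL macro. */
#if defined(__x86_64__) || defined(_M_X64) || defined(__aarch64__) || \
    defined(__powerpc64__) || defined(__ppc64__)
#define SKETCH_WORD_BITS 64
#elif defined(__s390x__)
/* The branch a 64-bit z (s390x) build needs; without it the #error fires. */
#define SKETCH_WORD_BITS 64
#elif defined(__i386__) || defined(_M_IX86) || defined(__arm__)
#define SKETCH_WORD_BITS 32
#else
#error "Must define either a 32-bit or a 64-bit word size for this architecture"
#endif

/* Returns the word size selected at compile time. */
static int sketch_word_bits(void) {
  assert(SKETCH_WORD_BITS == 32 || SKETCH_WORD_BITS == 64);
  return SKETCH_WORD_BITS;
}
```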
@tuand27613 I reverted to 35326c25f99b038286a58330fdef87d23fe5f473 so I can run the behave tests. Here is the behave run log along with the container logs.
@gongsu832 is there any way to get this to install properly on s390x? FYI @rameshthoomu
@jeffgarratt I looked at this briefly. I downloaded boringssl. Fixing the OPENSSL_64_BIT definition and getting it to compile is easy. The problem is that when I run the tests that come with boringssl, several fail (most likely due to endianness issues). So fixing boringssl will probably take a nontrivial amount of time. After that, getting grpcio to pick up the fixed boringssl is another hurdle.
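A quick way to confirm that byte order is what differs between a passing x86 machine and a failing s390x one is a small probe like the following. This is a generic sketch, not code from boringssl; host_is_little_endian and load_u64_native are hypothetical helper names. The second function shows what a native 64-bit load of a byte array yields, which is where little-endian assumptions usually surface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a multi-byte integer holds its
 * least significant byte (little endian), 0 otherwise (e.g. on s390x). */
static int host_is_little_endian(void) {
  const uint32_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Loads 8 bytes using the host's native byte order. The result depends on
 * the CPU: the same input gives different values on x86_64 and s390x. */
static uint64_t load_u64_native(const unsigned char *in) {
  uint64_t v;
  assert(in != NULL);
  memcpy(&v, in, sizeof v);
  return v;
}
```

On a little-endian host the bytes 01 00 00 00 00 00 00 00 load as 1; on a big-endian host the same bytes load as 0x0100000000000000.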
@vpaprots also observed the same issue while installing grpcio package.
I am encountering issues installing grpcio on both z systems. I will continue to look at it but wanted to update the issue with the current results.
pip install grpcio
On 148.100.105.200:
File "build/bdist.linux-s390x/egg/setuptools/command/build_ext.py", line 187, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/lib64/python2.7/distutils/command/build_ext.py", line 498, in build_extension
depends=ext.depends)
File "/usr/lib64/python2.7/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib64/python2.7/distutils/unixccompiler.py", line 132, in _compile
raise CompileError, msg
CompileError: command 'gcc' failed with exit status 1
On 148.100.107.97:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ivQZoN/grpcio/
@jkirke : Below is the command you have to use to install grpcio package:
pip install -U 'grpcio==0.13.1'
Same result using that command.
Created new issue #1956. Please log your comments in the new issue.
As I mentioned above in response to @jeffgarratt , boringssl (which grpcio depends on) is just not written with big endian in mind. I tracked down one of the ec_test failures to the file crypto/ec/p256-64.c. This is an excerpt from the beginning of the file:
/* bin32_to_felem takes a little-endian byte array and converts it into felem
* form. This assumes that the CPU is little-endian. */
static void bin32_to_felem(felem out, const u8 in[32]) {
out[0] = *((const u64 *)&in[0]);
out[1] = *((const u64 *)&in[8]);
out[2] = *((const u64 *)&in[16]);
out[3] = *((const u64 *)&in[24]);
}
/* smallfelem_to_bin32 takes a smallfelem and serialises into a little endian,
* 32 byte array. This assumes that the CPU is little-endian. */
static void smallfelem_to_bin32(u8 out[32], const smallfelem in) {
*((u64 *)&out[0]) = in[0];
*((u64 *)&out[8]) = in[1];
*((u64 *)&out[16]) = in[2];
*((u64 *)&out[24]) = in[3];
}
As you can see, the code specifically assumes the CPU is little endian, which is rather strange. You'd expect something better from Google. I fixed this particular case, so the ec_test now proceeds further, but it still fails in other places.
Until boringssl is properly fixed for big endian (and assuming no other package that grpcio depends on has similar endian problems), even if you manage to get grpcio installed (i.e., compiled) on big endian, it's not going to work properly.
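For reference, one common way to remove this kind of assumption is to assemble each 64-bit limb from bytes with shifts instead of casting the byte pointer. The sketch below is not the fix applied in the fork mentioned in this thread (the thread does not show that change); the function names are hypothetical, and plain u64[4] arrays stand in for the felem and smallfelem types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint64_t u64;

/* Reads 8 bytes as a little-endian value regardless of host byte order. */
static u64 load_le64(const u8 *in) {
  u64 v = 0;
  for (int i = 7; i >= 0; i--) {
    v = (v << 8) | in[i];
  }
  return v;
}

/* Writes a 64-bit value as 8 little-endian bytes regardless of host order. */
static void store_le64(u8 *out, u64 v) {
  for (int i = 0; i < 8; i++) {
    out[i] = (u8)(v & 0xff);
    v >>= 8;
  }
}

/* Hypothetical portable counterpart of bin32_to_felem. */
static void bin32_to_limbs_portable(u64 out[4], const u8 in[32]) {
  assert(out != NULL && in != NULL);
  for (size_t i = 0; i < 4; i++) {
    out[i] = load_le64(in + 8 * i);
  }
}

/* Hypothetical portable counterpart of smallfelem_to_bin32. */
static void limbs_to_bin32_portable(u8 out[32], const u64 in[4]) {
  assert(out != NULL && in != NULL);
  for (size_t i = 0; i < 4; i++) {
    store_le64(out + 8 * i, in[i]);
  }
}
```

Modern compilers typically reduce these loops to a single load or store (plus a byte swap on big-endian targets), so the portable form usually costs little.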
OK I fixed boringssl so it passes all tests on zLinux and I managed to install grpcio 0.13.1. The behave tests now fail on a different scenario:
peer_basic.feature:1097 verify reconnect of disconnected peer, issue #1851 -- @1.1 Composition options
All tests pass on 2 CPUs. Logs attached.
PS. @tuand27613 This is in the behave run log:
['Starting', 'vp0', '...']
['ESC[1AESC[2K']
['Starting', 'vp0', '...', 'done']
['ESC[1B']
Containers started:
['bddtests_vp0_1']
It appears that one of the containers failed to start. But I thought you fixed this problem a while ago.
Good news on getting past the grpcio 0.13.1 issue. What do I need to do to get this updated on the zLinux build systems?
@gongsu832, it looks like both containers are up, or at least behave thinks [vp1, vp0] are up. Can you try turning on DoNotDecompose, run only the @issue_1851 test, and see what docker says?
['Starting', 'vp0', '...']
['[1A[2K']
['Starting', 'vp0', '...', 'done']
['[1B']
Containers started:
['bddtests_vp0_1']
dockerComposeService = vp0
container bddtests_vp0_1 has env = ['CORE_PEER_ID=vp0', 'CORE_LOGGING_LEVEL=DEBUG', 'CORE_PEER_DISCOVERY_TOUCHPERIOD=1s', 'CORE_VM_ENDPOINT=http://172.17.0.1:2375', 'CORE_PEER_ADDRESSAUTODETECT=true', 'CORE_PEER_DISCOVERY_PERIOD=1s', 'PATH=/opt/go/bin:/opt/gopath/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'LD_LIBRARY_PATH=/opt/rocksdb:', 'GOROOT=/opt/go', 'GOPATH=/opt/gopath']
After starting, the container service list is = ['vp1', 'vp0']
Requesting path = http://172.17.0.3:5000/network/peers
@gongsu832 @tuand27613 Hey @gongsu832, please let me know if I can assist. Perhaps we can do a hangout and I can help you troubleshoot.
@ghaskins @rameshthoomu @jkirke @gongsu832 I think we are good to go on accepting this PR as it appears @gongsu832 and @jkirke will be able to resolve z related issues. @rameshthoomu agrees with this assessment, please let me know if any other concerns @ghaskins . Thanks.
@tuand27613 With DoNotDecompose uncommented for @issue_1851:
Requesting path = http://172.17.0.2:5000/network/peers
After stoping, the container service list is = ['vp1']
Requesting path = http://172.17.0.3:5000/network/peers
']
['Starting', 'vp0', '...', 'done']
['']
Containers started:
['bddtests_vp0_1']
dockerComposeService = vp0
container bddtests_vp0_1 has env = ['CORE_PEER_ID=vp0', 'CORE_LOGGING_LEVEL=DEBUG', 'CORE_PEER_DISCOVERY_TOUCHPERIOD=1s', 'CORE_VM_ENDPOINT=http://172.17.0.1:2375', 'CORE_PEER_ADDRESSAUTODETECT=true', 'CORE_PEER_DISCOVERY_PERIOD=1s', 'PATH=/opt/go/bin:/opt/gopath/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'LD_LIBRARY_PATH=/opt/rocksdb:', 'GOROOT=/opt/go', 'GOPATH=/opt/gopath']
After starting, the container service list is = ['vp1', 'vp0']
Requesting path = http://172.17.0.3:5000/network/peers
And indeed both containers are running:
root@debian2:/opt/openchain/src/github.com/hyperledger/fabric/bddtests# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fa119e85b9d6 hyperledger/fabric-peer "peer node start" About a minute ago Up About a minute bddtests_vp1_1
a55f3a38a3f6 hyperledger/fabric-peer "peer node start" 2 minutes ago Up About a minute bddtests_vp0_1
Yet the scenario still fails with the same message:
2016/06/25 04:30:10 grpc: ClientConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 172.17.0.2:30303: getsockopt: connection refused"; Reconnecting to "vp0:30303"
in the vp1 log. If you want to take a look yourself, you can ssh root@debian2.watson.ibm.com without a password. The machine is the one you used before and has your public key. The Hyperledger code is under /opt/openchain/src/github.com/hyperledger/fabric.
@jkirke I'm maintaining a fork of grpc so that it can pick up the fixed boringssl. To install on zLinux, pick a directory where you want to clone and do the following:
# git clone https://github.com/gongsu832/grpc.git
# cd grpc
# git submodule update --init
# pip install -rrequirements.txt
# git checkout tags/release-0_13_1
# GRPC_PYTHON_BUILD_WITH_CYTHON=1 pip install .
Thank you. I tried this out this morning on both build systems. They both failed to install with the following:
python_build/temp.linux-s390x-2.7/third_party/boringssl/crypto/bytestring/asn1_compat.o -fvisibility=hidden -pthread -std=gnu99
gcc: error: third_party/boringssl/crypto/bytestring/asn1_compat.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
creating tmp
creating tmp/tmp47U7QF
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -march=z196 -mtune=zEC12 -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -march=z196 -mtune=zEC12 -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c /tmp/tmp47U7QF/a.c -o tmp/tmp47U7QF/a.o
Traceback (most recent call last):
File "build_ext
step:\n{}".format(formatted_exception))
commands.CommandError: Failed build_ext
step:
Traceback (most recent call last):
File "/tmp/pip-2eIdKr-build/src/python/grpcio/commands.py", line 254, in build_extensions
build_ext.build_ext.build_extensions(self)
File "/usr/lib64/python2.7/distutils/command/build_ext.py", line 448, in build_extensions
self.build_extension(ext)
File "build/bdist.linux-s390x/egg/setuptools/command/build_ext.py", line 187, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/lib64/python2.7/distutils/command/build_ext.py", line 498, in build_extension
depends=ext.depends)
File "/usr/lib64/python2.7/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib64/python2.7/distutils/unixccompiler.py", line 132, in _compile
raise CompileError, msg
CompileError: command 'gcc' failed with exit status 4
----------------------------------------
Command "/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-2eIdKr-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-a2YFnO-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-2eIdKr-build/
[root@dmlinux1rhel72 grpc]#
@jkirke Did you do git checkout tags/release-0_13_1?
I thought I did but I must have missed that step. Sorry for the false alarm. Both build systems now have grpc. Thank you for your help.
@jeffgarratt @ramesh see gongsu's comment above.
Is this a docker config issue?
@tuand27613 I recreated all the fabric-* images and now the test passes! Grrrh, I hate docker! :-)
Well, that is good news, sort of.
Description
On a zLinux guest (debian 8) with 4 CPUs, behave test fails for either "peer_basic.feature:846 chaincode example02 with 4 peers and 1 membersrvc, test crash fault -- @1.1 Consensus Options" or "peer_basic.feature:968 chaincode example02 with 4 peers, two stopped". All tests pass with 2 CPUs (after taking 2 CPUs offline). Full logs for the two failures attached.
behave.zip
Describe How to Reproduce
make behave with 4 CPUs.