vielmetti closed this issue 7 years ago.
The test suite fails; looking into this:
root@docker-build-test:~/src/nats-io/gnatsd# go build
root@docker-build-test:~/src/nats-io/gnatsd# go test ./...
? nats-io/gnatsd [no test files]
? nats-io/gnatsd/auth [no test files]
ok nats-io/gnatsd/conf 0.018s
ok nats-io/gnatsd/logger 0.618s
ok nats-io/gnatsd/server 24.935s
ok nats-io/gnatsd/server/pse 0.104s
--- FAIL: TestServerRestartReSliceIssue (10.01s)
panic: Unable to start NATS Server in Go Routine [recovered]
panic: Unable to start NATS Server in Go Routine
goroutine 44 [running]:
panic(0x8154a0, 0x482000b7e0)
/usr/lib/go-1.6/src/runtime/panic.go:481 +0x384
testing.tRunner.func1(0x4820250870)
/usr/lib/go-1.6/src/testing/testing.go:467 +0x168
panic(0x8154a0, 0x482000b7e0)
/usr/lib/go-1.6/src/runtime/panic.go:443 +0x4b4
nats-io/gnatsd/test.RunServerWithAuth(0x482027c3c0, 0x0, 0x0, 0xffff9c66e110)
/root/src/nats-io/gnatsd/test/test.go:102 +0x180
nats-io/gnatsd/test.RunServerWithConfig(0x9c72f0, 0x14, 0x0, 0x482027c3c0)
/root/src/nats-io/gnatsd/test/test.go:79 +0x2a4
nats-io/gnatsd/test.runServers(0x4820250870, 0x0, 0x0, 0x0, 0x0)
/root/src/nats-io/gnatsd/test/cluster_test.go:66 +0x4c
nats-io/gnatsd/test.TestServerRestartReSliceIssue(0x4820250870)
/root/src/nats-io/gnatsd/test/client_cluster_test.go:17 +0x3c
testing.tRunner(0x4820250870, 0xbff288)
/usr/lib/go-1.6/src/testing/testing.go:473 +0xbc
created by testing.RunTests
/usr/lib/go-1.6/src/testing/testing.go:582 +0x65c
FAIL nats-io/gnatsd/test 11.477s
? nats-io/gnatsd/util [no test files]
? nats-io/gnatsd/vendor/github.com/nats-io/nuid [no test files]
? nats-io/gnatsd/vendor/golang.org/x/crypto/bcrypt [no test files]
? nats-io/gnatsd/vendor/golang.org/x/crypto/blowfish [no test files]
? nats-io/gnatsd/vendor/golang.org/x/sys/windows [no test files]
? nats-io/gnatsd/vendor/golang.org/x/sys/windows/registry [no test files]
Running from the command line works fine; at least the server comes up.
Is there a particularly good client you'd recommend to exercise the server, @kozlovic? Happy to bash on it to see if I can trigger whatever this issue is.
Just realized that it worked for the server package.
Could you make sure that there is no gnatsd running in the background, and then run this just to check:
go test -race -v -p=1 ./...
The -p=1 flag makes go test run the test binaries for different packages one at a time rather than in parallel. I am wondering whether there could be port conflicts between tests in different packages. We normally try to use different ports, and it works fine on Travis, but that could just be luck.
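If port conflicts between packages are the suspect, one generic way to rule them out is to let the OS pick a free port by binding to port 0. This is not how the gnatsd tests choose ports (per the comment above, they use distinct fixed ports); `freePort` is a hypothetical helper, and the approach narrows rather than eliminates the race, because the port is released before the server under test binds it.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused TCP port on loopback by binding to
// port 0, then releases it so a server under test can bind it.
// Hypothetical helper: another process could still grab the port between
// Close and the server's own bind, so this reduces conflicts, not removes them.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("free port:", port)
}
```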
go test -race is not available in Go 1.6.x on arm64 on Ubuntu.
With -p=1, many more tests pass, but a few still fail, all related to TLS:
root@docker-build-test:~# grep FAIL gnats-test.out
--- FAIL: TestTLSConnz (1.12s)
--- FAIL: TestPingSentToTLSConnection (0.71s)
--- FAIL: TestTLSConnection (1.19s)
--- FAIL: TestTLSBadAuthError (1.11s)
FAIL
FAIL nats-io/gnatsd/test 73.991s
root@docker-build-test:~# go version
go version go1.6.3 linux/arm64
Looking in a little more detail, here are all of the error messages:
root@docker-build-test:~# grep "version 4552" gnats-test.out
monitor_test.go:337: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
test.go:128: Error writing command to conn: tls: received record with version 4552 when expecting version 303
tls_test.go:44: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
tls_test.go:252: Excpected and auth violation, got tls: received record with version 4552 when expecting version 303
You may want to try a newer version of Go, just to be sure.
Oh, that's because the timeouts are too short.
Let me find where you would need to increase the timeout, to confirm that this is the only problem.
Two things you could try:
go test -v -run=TestTLSConnz ./test
In server/server.go:591:
//ttl := secondsToDuration(s.opts.TLSTimeout)
ttl := 10*time.Second
And in server/client.go:1244:
// c.atmr = time.AfterFunc(d, func() { c.authTimeout() })
c.atmr = time.AfterFunc(10*time.Second, func() { c.authTimeout() })
Could you please try and report back?
Single test still fails:
root@docker-build-test:~/src/github.com/nats-io/gnatsd# go test -v -run=TestTLSConnz ./test
=== RUN TestTLSConnz
--- FAIL: TestTLSConnz (1.11s)
monitor_test.go:337: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
FAIL
exit status 1
FAIL github.com/nats-io/gnatsd/test 1.134s
My version of Go is 1.6.3, which is older than the one you recommend; I'll report back separately after testing under Go 1.8.
root@docker-build-test:~/src/github.com/nats-io/gnatsd# go version
go version go1.6.3 linux/arm64
When you ran the test, did you override the timeouts? For that test specifically, if you do not want to tweak the code, you can modify the config file it uses:
test/configs/tls.conf
Change both timeout values in this file to 10, instead of 2 and 1.
With longer timeouts (the 10-second values patched into client.go and server.go above), the test passes:
=== RUN TestTLSConnz
--- PASS: TestTLSConnz (2.25s)
PASS
ok github.com/nats-io/gnatsd/test 2.269s
minio has some accelerated crypto routines, which should speed up TLS if the timeout is due to slow performance.
OK, the problem with re-running the whole test suite with the override is that some tests may then fail because they expect the timeout to occur within, say, 2 seconds. But we should be able to tell whether that is the case from the test name.
All the TLS tests now pass, but there's one test that fails:
=== RUN TestAuthClientNoConnect
--- FAIL: TestAuthClientNoConnect (3.03s)
test.go:128: Error reading from conn: read tcp 127.0.0.1:43868->127.0.0.1:10422: i/o timeout
2 - /root/src/github.com/nats-io/gnatsd/test/auth_test.go:80
3 - /usr/lib/go-1.6/src/testing/testing.go:473
4 - /usr/lib/go-1.6/src/runtime/asm_arm64.s:975
The code at auth_test.go:80 reads:
// This is timing dependent..
time.Sleep(server.AUTH_TIMEOUT)
Yes, as I said. So the only failures you got were due to timeouts. What surprises me is that you got the failures in the first place. Even with the current values (as low as 0.5 in some config files), the suite passes on Travis, which is sometimes much slower than our personal laptops. So it is a bit surprising, considering the spec of your machine.
The timeouts are very surprising given the spec of the machine. I'm going to rebuild with Go 1.8 next, because I know I've seen speed improvements overall with that, and maybe that is enough to help.
With one failed test, I get this as an overall test time:
FAIL github.com/nats-io/gnatsd/test 80.181s
and it looks like the last log on Travis runs the same tests in
ok github.com/nats-io/gnatsd/test 68.013s
With Go 1.8 it fails a little faster
FAIL github.com/nats-io/gnatsd/test 78.930s
still failing in
--- FAIL: TestAuthClientNoConnect (3.03s)
I'm sure that's because Go is using software crypto on arm64, rather than the hardware instructions on the chip. The minio code is at https://github.com/minio/sha256-simd, which might help.
Nope, gnatsd doesn't use the sha256 code, but I was able to benchmark Go's crypto/tls and found it wanting on arm64. The open issue is https://github.com/golang/go/issues/19840
I'll chase this upstream. For the moment, let's mark this issue as "on hold", and I'll work on getting a performance improvement.
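For anyone wanting a quick comparison across machines or Go versions, a rough throughput check of AES-GCM (a common TLS bulk cipher) using only the standard library is sketched below. It is an assumption that bulk symmetric encryption is where the time goes; this does not exercise the TLS handshake, and the buffer size and iteration count are arbitrary. A `Benchmark...` function run with `go test -bench` is the more idiomatic tool; this standalone version just avoids needing a test file.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"time"
)

// gcmThroughput encrypts a bufSize-byte buffer iters times with AES-128-GCM
// and returns the achieved rate in MB/s (1 MB = 1e6 bytes).
func gcmThroughput(bufSize, iters int) (float64, error) {
	key := make([]byte, 16) // all-zero key: fine for timing, never for real use
	block, err := aes.NewCipher(key)
	if err != nil {
		return 0, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return 0, err
	}
	nonce := make([]byte, aead.NonceSize())
	plaintext := make([]byte, bufSize)
	dst := make([]byte, 0, bufSize+aead.Overhead())

	start := time.Now()
	for i := 0; i < iters; i++ {
		// Reusing a nonce is insecure; acceptable here only because nothing is sent.
		dst = aead.Seal(dst[:0], nonce, plaintext, nil)
	}
	elapsed := time.Since(start).Seconds()
	if elapsed <= 0 {
		return 0, fmt.Errorf("timer resolution too coarse")
	}
	return float64(bufSize*iters) / elapsed / 1e6, nil
}

func main() {
	mbps, err := gcmThroughput(16*1024, 2000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("AES-128-GCM: %.1f MB/s\n", mbps)
}
```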
Go 1.9 beta 1 is out and has a binary build for ARM64 (yay).
According to the referenced issue https://github.com/golang/go/issues/19840, the opportunity for this particular performance issue to be resolved in Go for ARM will come in the Go 1.10 timeframe. However, there may be other performance improvements in Go 1.9, so that's worth a quick test.
Could you summarise the state of the aarch64 server build? We are very interested in using it on our embedded aarch64 platform as a control plane enabler.
@salerio - what are the specs for your aarch64 platform? The concern expressed above was that some of the crypto routines in Go on aarch64 are not hardware-accelerated, and that the software versions of the algorithms perform poorly on one system (Cavium ThunderX).
It's a Xilinx UltraScale+ MPSoC, which has a 4 x Cortex-A53 CPU complex. Although there are crypto accelerators in the SoC, I doubt anyone (any standard software, that is) will make use of them yet, as the part is very new.
See https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html
@ghost does this MPSoC from Xilinx have an FPGA in it?
Go 1.11beta1 is out, I would like to test performance with it.
We would be interested in what you find, keep us posted.
Thanks @derekcollison I have opened up #695 to address the question of "how do you test performance".
Thanks.
OS/Container environment:
ARMv8 server is a Packet 2A (Cavium ThunderX, 96 cores at 2 GHz)
Steps or code to reproduce the issue:
Expected result:
linux-arm64 supported release
Actual result:
No files found.
As of 2017-04-04, the build works fine, tests fail until timeouts are extended, and we've identified a performance issue with crypto/tls in Go 1.8 on ARMv8. Further work is pending Go performance improvements on ARMv8.
Feature Requests
Use Case:
Two use cases: one for ARMv8 single-board computers (e.g. Raspberry Pi 3, Odroid C2, Pine64); another for ARMv8 in the data center (e.g. Cavium ThunderX).
Proposed Change:
Build and test for arm64, validate that it works, add as supported release.
Who Benefits From The Change(s)?
Users of arm64 (ARMv8) platforms as listed above.
Alternative Approaches
Planning to build from source and see how that goes; I'll use this issue to identify anything that comes up.