Open GoogleCodeExporter opened 9 years ago
Does the issue go away if you use the regular Oracle JDK?
Original comment by hearn@google.com
on 30 Oct 2013 at 8:14
Just for the record: I don't see this issue on
- openjdk-6-jdk:amd64 6b27-1.12.6-1ubuntu2
- maven 3.0.4-6
(that's on Ubuntu Saucy which should be very similar to Debian testing)
Original comment by andreas....@gmail.com
on 30 Oct 2013 at 8:36
Tried with proprietary Sun JDK (did export JAVA_HOME=~/path/to/jdk) without
success. Also tried OpenJDK 6b27-1.12.6-1 and OpenJDK 7u21-2.3.9-5, same result.
I'm doing this at home and at work, different PCs, different hardware and even
architecture (amd64 at home, i686 at work). I'll check the test with debugger
but it would help to have a basic case for this particular assert.
Original comment by radioano...@gmail.com
on 30 Oct 2013 at 8:59
This test looks inherently suspicious, there's a Thread.sleep(15) call just
before the assert. Matt, is that right? It looks like you're trying to test
socket connect timeouts there, in which case I'd expect 15 seconds rather than
15 milliseconds.
Original comment by hearn@google.com
on 30 Oct 2013 at 10:58
The socket timeout set is 10 milliseconds, so it sleeps 15 to test the timeout.
That said, the test also fails for me unless I set it higher. It works for me
at 25ms, so I'll just set it to 100ms.
Original comment by BlueMatt...@gmail.com
on 30 Oct 2013 at 5:32
Doesn't work for me even with 100ms sleep. However, it passes without a problem
with 15ms sleep if I set the timeout for the client, too, in line 181:
}, Protos.TwoWayChannelMessage.getDefaultInstance(), 1000, 10), 0);
Though that's probably not the point of the test, I think it should test
closing the server-side socket on timeout even if some malicious client doesn't
have a timeout at all. I run the test with JUnit from Eclipse and that's what I
get in the output:
30.10.2013 21:39:59 com.google.bitcoin.protocols.niowrapper.ProtobufParser
timeoutOccurred
WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest$4@29e97f9f
30.10.2013 21:39:59 com.google.bitcoin.protocols.niowrapper.ProtobufParser
timeoutOccurred
WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest$3$1@9c0ec97
30.10.2013 21:39:59 com.google.bitcoin.protocols.niowrapper.ProtobufParser
timeoutOccurred
WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest$5@25fa1bb6
30.10.2013 21:39:59 com.google.bitcoin.protocols.niowrapper.ConnectionHandler
handleKey
SEVERE: Error handling SelectionKey
java.lang.IllegalStateException: Message too large or length underflowed
at com.google.bitcoin.protocols.niowrapper.ProtobufParser.receiveBytes(ProtobufParser.java:153)
at com.google.bitcoin.protocols.niowrapper.ConnectionHandler.handleKey(ConnectionHandler.java:116)
at com.google.bitcoin.protocols.niowrapper.NioServer.handleKey(NioServer.java:56)
at com.google.bitcoin.protocols.niowrapper.NioServer.access$1(NioServer.java:47)
at com.google.bitcoin.protocols.niowrapper.NioServer$1.run(NioServer.java:85)
30.10.2013 21:39:59 com.google.bitcoin.protocols.niowrapper.NioClient$1 run
SEVERE: Error trying to open/read from connection
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:218)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
at com.google.bitcoin.protocols.niowrapper.NioClient$1.run(NioClient.java:68)
But all 4 tests in NioWrapperTest pass...
Original comment by radioano...@gmail.com
on 30 Oct 2013 at 5:45
It also passes if I set 0 timeout on the server (line 158) and 10 on the client
(line 179). So it seems like the server is passive for some reason and doesn't
close the connection or respect timeout at all. But if the client closes it
then it does the same, too. Weird.
My Eclipse also messed up the line numbers a bit, but only in this test file.
Original comment by radioano...@gmail.com
on 30 Oct 2013 at 5:51
Can you disable the last three of the NioWrapperTests and provide the logs from
those without changing the test (except for maybe a really huge timeout)?
Original comment by BlueMatt...@gmail.com
on 30 Oct 2013 at 6:59
Which tests exactly should I disable? If I leave only basicClientServerTest()
it completes without an error.
I also noticed that the server really doesn't close the connection, I've set a
breakpoint on that assert and after it stopped there I looked at netstat and
the connection to the port 4243 was there in ESTABLISHED state. So it's not
some state detection error, it stayed connected for real. The only suspended
thread was the main and client/server handlers are in their own threads so it
shouldn't be the cause.
Original comment by radioano...@gmail.com
on 30 Oct 2013 at 7:07
Leave only basicTimeoutTest.
Yeah, lsof shows a few connections: one localhost->localhost that I can't
identify, one that listens, and two that connect from/to the listen port (i.e.
the incoming connection and the outgoing connection). The two die after the
timeout as expected but the rest remain.
Original comment by BlueMatt...@gmail.com
on 30 Oct 2013 at 7:50
Ok, commented all @Test annotations except that near basicTimeoutTest. Still
the same:
NioWrapperTest
com.google.bitcoin.protocols.niowrapper.NioWrapperTest
basicTimeoutTest(com.google.bitcoin.protocols.niowrapper.NioWrapperTest)
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at com.google.bitcoin.protocols.niowrapper.NioWrapperTest.basicTimeoutTest(NioWrapperTest.java:184)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
It's line 184: assertTrue(clientConnection1Closed.isDone() &&
serverConnection1Closed.isDone());
Original comment by radioano...@gmail.com
on 30 Oct 2013 at 7:56
No "WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest" lines printed?
Original comment by BlueMatt...@gmail.com
on 30 Oct 2013 at 8:16
Yes, there's one:
31.10.2013 8:55:13 com.google.bitcoin.protocols.niowrapper.ProtobufParser
timeoutOccurred
WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest$3$1@159b5217
Original comment by radioano...@gmail.com
on 31 Oct 2013 at 4:55
Any test that has a 5 msec failure window is never going to be stable. Many OSes
have timeslices up in the 20-30 msec range. Sleeps in unit tests are so often
flaky. It'd be better to rewrite the test so there are no sleeps, and the
thread instead blocks until the timeout close occurs, and maybe (!) measure the
amount of time it actually took to ensure a general zone of sanity.
Original comment by hearn@google.com
on 31 Oct 2013 at 10:39
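To illustrate the suggestion above, here is a minimal sketch of "block until the close happens, with a generous upper bound" instead of sleeping for a fixed interval. It is not code from the bitcoinj tree: the class name `BlockUntilClosed`, the helper `awaitClose`, and the use of `CompletableFuture` (in place of whatever future type the real test uses) are assumptions made for illustration, and the scheduled task merely stands in for the 10 ms socket timeout firing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockUntilClosed {
    /**
     * Hypothetical helper: blocks until the "closed" future completes.
     * Returns the elapsed time in milliseconds, or -1 if the upper bound expired first.
     */
    static long awaitClose(CompletableFuture<Void> closed, long upperBoundMillis) {
        long start = System.nanoTime();
        try {
            closed.get(upperBoundMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return -1; // the close never happened within the bound
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Void> connectionClosed = new CompletableFuture<>();
        // Stand-in for the connection handler closing the socket after its 10 ms timeout.
        timer.schedule(() -> { connectionClosed.complete(null); }, 10, TimeUnit.MILLISECONDS);

        // The upper bound is deliberately far larger than the timeout under test.
        long elapsed = awaitClose(connectionClosed, 5000);
        timer.shutdown();
        System.out.println(elapsed >= 0 ? "closed after ~" + elapsed + " ms" : "never closed");
    }
}
```

With this shape the test no longer depends on scheduler timeslices: it passes as soon as the close is observed, and a hang shows up as a bounded failure rather than a flaky assert.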
Accidentally replied to the email instead of here...
I swapped out two of the sleeps for plain waits, and
made the one that checks that the connection doesn't close when there is
no timeout sleep for 10x the amount of time it takes to close the
socket. As to why this test is still failing for the reporter, I'm not sure.
Can you retest with the fixes at
https://code.google.com/r/bluemattme-bitcoinj/source/list ?
Original comment by BlueMatt...@gmail.com
on 31 Oct 2013 at 8:14
As I expected, it hangs forever on basicTimeoutTest. Here's the output:
01.11.2013 0:17:31 com.google.bitcoin.protocols.niowrapper.ProtobufParser
timeoutOccurred
WARNING: Timeout occurred for
com.google.bitcoin.protocols.niowrapper.NioWrapperTest$3$1@6c3c9c31
Threads are running and nothing happens (screenshot:
http://i.imgur.com/jaMwpSs.png). Here's a netstat:
tcp6 0 0 127.0.0.1:4243 :::* LISTEN 1232/java
tcp6 0 0 127.0.0.1:4243 127.0.0.1:38380 ESTABLISHED -
tcp6 0 0 127.0.0.1:38380 127.0.0.1:4243 ESTABLISHED 1232/java
Original comment by radioano...@gmail.com
on 31 Oct 2013 at 8:20
I was able to reproduce in a Debian-testing VM.
Indeed it appears the call to channel.close() in ConnectionHandler does not
actually close the underlying connection so the client just sits, waiting on
TCP to read data.
AFAICT, this is a JVM bug.
The good news is this bug disappears with the same test on my nonetty branch
(a rewrite of the P2P network stack to use the niowrapper stack as a base),
though I can't figure out why. That said, there are still a few bugs to iron
out of the nonetty branch as well as plenty of review before it can be merged.
Original comment by BlueMatt...@gmail.com
on 1 Nov 2013 at 3:26
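For reference, this is the behaviour the test relies on, shown with plain blocking sockets rather than the niowrapper classes: once one side closes its end, a blocked `read()` on the other side should return -1. The sketch below is only an illustration under that assumption; the class and method names are made up, and it says nothing about why `channel.close()` misbehaves in the reported setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseVisibleToPeer {
    /** Returns what the client's read() yields after the server closes its end (-1 = end of stream). */
    static int readAfterServerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {
            // Fail with SocketTimeoutException instead of hanging if the close is never seen.
            client.setSoTimeout(2000);
            Socket serverSide = listener.accept();
            // Nothing is unread on the server side, so the peer should see a clean end of stream.
            serverSide.close();
            return client.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client read() returned " + readAfterServerClose());
    }
}
```

In the failing case described in this thread, the equivalent read on the client never returns and netstat keeps showing the connection as ESTABLISHED.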
That's odd. I thought we established already that it's not JVM related,
radioanonzoi said he tried the binary Oracle JVM from java.com and it failed in
the same way.
Original comment by hearn@google.com
on 1 Nov 2013 at 9:49
AFAIK, OpenJDK and Oracle JVM share the same code mostly except some very
specific classes. It's really odd that it's not reproducible on Ubuntu with the
same OpenJDK version. Probably it's a bug in some underlying Debian libs. There
are differences in ProtobufServer and ProtobufClient. Server has two channels
in vars named "channel" and "sc" and client only has "sc" and it's much simpler
overall. It also seems that sc.close() in ProtobufServer never gets called in
basicTimeoutTest() so I've added it to closeConnection() after channel.close()
but nothing changed, the connection stays in established state anyway.
Original comment by radioano...@gmail.com
on 1 Nov 2013 at 11:03
They differ in more ways than you might expect. You *did* try the real binary
Oracle JVM, right? I just want to get to the bottom of the inconsistency here.
The other possibility is that Debian has been patching OpenJDK and broken it.
They have a long track record of such things.
Original comment by hearn@google.com
on 1 Nov 2013 at 11:52
Yes, doublechecked. I've downloaded jdk-7u45-linux-i586.tar.gz, unpacked it and
set JAVA_HOME envvar to the path of the unpacked version. Maven supports this
as said in /usr/bin/mvn script. mvn -version showed that exact path in "Java
home:" line and java version is "Java version: 1.7.0_45, vendor: Oracle
Corporation". When I unset that var I see "Java version: 1.6.0_27, vendor: Sun
Microsystems Inc." So I did "mvn test" having that var set and get the failure.
BlueMatt.me confirmed this and he can also download the proprietary JDK to test
against it just to be sure.
Original comment by radioano...@gmail.com
on 1 Nov 2013 at 12:08
OK. Then I'm going to rename this bug to reflect the fact that it seems to be
Debian specific somehow. Once we merge nonetty, hopefully we can close it. For
now just skipping the tests is the right fix.
Original comment by hearn@google.com
on 1 Nov 2013 at 12:38
fyi: An easy way to disable a test for some time is annotate it with
@org.junit.Ignore
You could keep this on your local git branch until the issue is resolved.
Original comment by andreas....@gmail.com
on 1 Nov 2013 at 12:53
Original comment by hearn@google.com
on 1 Nov 2013 at 12:55
This issue was closed by revision f2678463be95.
Original comment by hearn@google.com
on 1 Nov 2013 at 2:41
Not fixed, IIUC.
Original comment by hearn@google.com
on 1 Nov 2013 at 2:42
Correct, not fixed. Now it hangs indefinitely on line 184 for me.
Original comment by radioano...@gmail.com
on 1 Nov 2013 at 2:46
For the record, I've successfully compiled the library after adding @Ignore to
basicTimeoutTest. No more issues.
Original comment by radioano...@gmail.com
on 1 Nov 2013 at 2:49
Could you try with git master and see if the issue persists?
Original comment by hearn@google.com
on 12 Dec 2013 at 3:06
It now freezes on some test. The last messages from stdout are at
http://pastebin.com/fX3WnhHr. I waited for a solid 30 minutes and nothing
happened after this point.
Original comment by radioano...@gmail.com
on 12 Dec 2013 at 3:37
I'm just starting to build the project from the 0.11 branch on both Ubuntu and
Windows and I'm seeing similar "Address already in use" failures when running
the tests.
In both cases it's JDK7, Maven 3.0.x. I'll post more once I've collected
further information and isolated the problem. I'll also switch to
master and try there as well.
Original comment by jerry.br...@gmail.com
on 26 Mar 2014 at 6:33
Thanks!
Original comment by mh.in.en...@gmail.com
on 26 Mar 2014 at 6:37
On master, the test results were improved, but most of the tests in PeerTest
are still seeing errors. They all appear to be nio bind errors - "Address
already in use".
Attaching the test results while I dig in to see if I can get it resolved.
Others have posted that nio has issues on some distros but I think there may be
something else going on that we could clean up.
Original comment by jerry.br...@gmail.com
on 26 Mar 2014 at 11:24
Attaching file.
Original comment by jerry.br...@gmail.com
on 26 Mar 2014 at 11:24
FWIW, I'm getting the same test failures on Windows.
Original comment by jerry.br...@gmail.com
on 27 Mar 2014 at 1:31
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Ubuntu 14.04
master / commit 408bca3 (the most recent)
mvn test
...
02:15:17 1,106 PeerGroup.connectToAnyPeer: Waiting 1000 msec before next
connect attempt to [127.0.0.1]:2001
02:15:17 1,113 BlockingClient$1.run: Error trying to open/read from connection:
localhost/127.0.0.1:2001
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.google.bitcoin.net.BlockingClient$1.run(BlockingClient.java:77)
...
Tests in error:
testSimpleChannel(com.google.bitcoin.protocols.channels.ChannelConnectionTest): Address already in use
...
Tests in error:
testSimpleChannel(com.google.bitcoin.protocols.channels.ChannelConnectionTest): Address already in use
Tests run: 390, Failures: 0, Errors: 1, Skipped: 1
Original comment by qert...@gmail.com
on 11 May 2014 at 12:18
I wonder if there's a race condition or some kind of kernel timing issue, such
that sockets don't close quickly enough once they're requested to do so (or our
tests are not properly waiting for the nio worker thread to shut down).
I see it was reported also on Windows. Without being able to reproduce it, it's
hard to fix this, but I'll read through the code again at some point and see if
I can spot the problem through visual inspection.
Original comment by mh.in.en...@gmail.com
on 11 May 2014 at 1:17
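One common way to keep tests from colliding on a listen port, independent of how quickly the kernel releases a closed socket, is to bind to port 0 and let the OS pick a free port instead of hard-coding one (the logs in this thread show fixed ports such as 4243 and 2001). The sketch below only illustrates that technique with `java.net.ServerSocket`; the class and method names are hypothetical, and whether it fits the bitcoinj tests is an open question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    /** Binds a loopback listener on an OS-assigned port; the caller must close it. */
    static ServerSocket bindAnyFreePort() {
        try {
            // Port 0 asks the OS for any currently unused port.
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens two listeners at once and returns the ports the OS assigned to them. */
    static int[] twoPorts() {
        try (ServerSocket a = bindAnyFreePort(); ServerSocket b = bindAnyFreePort()) {
            return new int[] { a.getLocalPort(), b.getLocalPort() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = twoPorts();
        System.out.println("listeners bound on ports " + ports[0] + " and " + ports[1]);
    }
}
```

A test written this way would pass `getLocalPort()` to the client under test. This does not answer the separate question of whether the nio worker thread has finished shutting down.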
Issue 569 has been merged into this issue.
Original comment by mh.in.en...@gmail.com
on 13 Jul 2014 at 2:14
In issue 569, I was having a similar issue with PeerTest failing. I am running
on Ubuntu with OpenJDK 7 and Maven. I suppose that test needs to pass for the
library to be functional?
Also, what other platforms/tools can we try it on to get it to build?
Original comment by omar...@gmail.com
on 14 Jul 2014 at 2:52
No, it's probably just a race in the tests that I don't see for some reason.
Just use "mvn install -DskipTests" to get past it. The library should still
work.
Original comment by mh.in.en...@gmail.com
on 14 Jul 2014 at 3:25
I ran the ForwardService example to transfer between two wallets in my setup.
From the console output I can see the block chains were downloaded successfully
and a bitcoin address was generated, but the peer-connections died afterwards.
This seemed to be an artifact of the issue we are having with the tests.
Can you describe exactly what the NIO client manager does, and also what
platform you are using to run this build?
In the meantime, maybe I can move away from Debian to an environment
where this runs more stably.
Original comment by omar...@gmail.com
on 14 Jul 2014 at 4:49
Could you provide logs?
NioClientManager manages sockets in a non-blocking way using e.g. epoll or
select, depending on what the platform has.
I use MacOS X for development.
Original comment by mh.in.en...@gmail.com
on 14 Jul 2014 at 4:54
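As a rough picture of what "managing sockets in a non-blocking way" means in Java, the sketch below registers a non-blocking listener with a `java.nio.channels.Selector` and waits for it to become acceptable; the JDK chooses the underlying platform mechanism behind `Selector`. This is not NioClientManager's code: the class and method names are made up, and a real manager would loop over many keys and handle reads and writes as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    /** Accepts one loopback connection via a Selector; true if it was accepted within the timeout. */
    static boolean acceptOne(long timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(loopback, 0)); // OS-assigned port
            server.configureBlocking(false);                 // required before registering
            server.register(selector, SelectionKey.OP_ACCEPT);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            try (SocketChannel client = SocketChannel.open(new InetSocketAddress(loopback, port))) {
                // Wait until at least one registered channel is ready, or the timeout expires.
                if (selector.select(timeoutMillis) == 0) {
                    return false;
                }
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        try (SocketChannel accepted = server.accept()) {
                            return accepted != null;
                        }
                    }
                }
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted via selector: " + acceptOne(2000));
    }
}
```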
Thanks, here are the logs.
Original comment by omar...@gmail.com
on 14 Jul 2014 at 6:57
I built it and ran the example on Mac OS X. It worked just fine.
Original comment by omar...@gmail.com
on 14 Jul 2014 at 10:25
Original issue reported on code.google.com by
radioano...@gmail.com
on 30 Oct 2013 at 7:04