StrongestNumber9 opened this issue 5 months ago
Reproducing this should be trivial with:

```sh
for i in {1..30}; do tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y & export PID="$!"; sleep 0.5; kill "${PID}"; done;
```
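The same loop, split across lines and commented (functionally identical to the one-liner above):

```sh
for i in {1..30}; do
  # start tcpflood in the background: 10 connections, RELP plaintext to port 4600
  tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 \
    -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" \
    -c 10 -Y &
  export PID="$!"
  # let it run for half a second, then kill it mid-transmission so the
  # connections are torn down uncleanly
  sleep 0.5
  kill "${PID}"
done
```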
Example run:

```
$ for i in {1..30}; do tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y & export PID="$!"; sleep 0.5; kill "${PID}"; done;
[1] 97924
00010 open connections
starting run 1
00002500[2] 97956
00010 open connections
starting run 1
00001100[1]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[3] 97968
00010 open connections
starting run 1
00001700[2]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[4] 97980
00010 open connections
starting run 1
00001700[3]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[5] 97993
00010 open connections
starting run 1
00001800[4]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[6] 98006
00010 open connections
starting run 1
00001600[5]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[7] 98018
00000[6]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[8] 98020
00000[7]- Terminated tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
[9] 98022
```

Trace log when in the failed state:

```
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] readyKeys <1>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] selectionKeys <[channel=sun.nio.ch.ServerSocketChannelImpl[/[0:0:0:0:0:0:0:0]:4600], selector=sun.nio.ch.EPollSelectorImpl@4316db06, interestOps=16, readyOps=16]>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] selectionKey <channel=sun.nio.ch.ServerSocketChannelImpl[/[0:0:0:0:0:0:0:0]:4600], selector=sun.nio.ch.EPollSelectorImpl@4316db06, interestOps=16, readyOps=16>: isValid <true>, isConnectable <false>, isAcceptable <true>, isReadable <false>, isWritable <false>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.ListenContext] [Thread-0] ServerSocket </[0:0:0:0:0:0:0:0]:4600> accepting ClientSocket </127.0.0.1:46608>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.InterestOpsImpl] [Thread-0] Adding op <1> to currentOps <0>, newOps <1>, keyOps <0>, validOps <13>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.InterestOpsImpl] [Thread-0] Added op <1>, currentOps <1>, keyOps <0>, validOps <13>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] readyKeys <1>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] selectionKeys <[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:4600 remote=/127.0.0.1:46608], selector=sun.nio.ch.EPollSelectorImpl@4316db06, interestOps=1, readyOps=1]>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] selectionKey <channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:4600 remote=/127.0.0.1:46608], selector=sun.nio.ch.EPollSelectorImpl@4316db06, interestOps=1, readyOps=1>: isValid <true>, isConnectable <false>, isAcceptable <false>, isReadable <true>, isWritable <false>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.EstablishedContextImpl] [Thread-0] handleEvent taking read
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.InterestOpsImpl] [Thread-0] Removing op <1> from currentOps <1>, newOps <0>, keyOps <1>, validOps <13>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.InterestOpsImpl] [Thread-0] Removed op <1>, currentOps <0>, keyOps <0>, validOps <13>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.EstablishedContextImpl] [Thread-0] handleEvent submitting new runnable for read
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.channel.context.EstablishedContextImpl] [Thread-0] handleEvent exiting read
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] readyKeys <0>
01.07.2024 17:22:45.047 [DEBUG] [com.teragrep.net_01.eventloop.EventLoop] [Thread-0] selectionKeys <[]>
```

Maybe related: https://github.com/teragrep/net_01/issues/10
Extra logs:

```
02.07.2024 13:28:06.386 [WARN] [com.teragrep.net_01.channel.context.IngressImpl] [pool-2-thread-1] End of stream for PeerAddress </127.0.0.1> PeerPort <35154>. Closing Connection.
02.07.2024 13:28:11.379 [WARN] [com.teragrep.net_01.channel.context.EstablishedContextImpl] [pool-2-thread-1] FrameDelegate close threw exception <java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException>
02.07.2024 13:28:11.380 [WARN] [com.teragrep.net_01.channel.context.ConnectContext] [Thread-0] socketChannel <java.nio.channels.SocketChannel[closed]> finishConnect threw, closing java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973)
at com.teragrep.net_01.channel.context.ConnectContext.handleEvent(ConnectContext.java:103)
at com.teragrep.net_01.eventloop.EventLoop.poll(EventLoop.java:144)
at com.teragrep.net_01.eventloop.EventLoop.run(EventLoop.java:198)
at java.base/java.lang.Thread.run(Thread.java:1583)
02.07.2024 13:28:11.380 [WARN] [com.teragrep.net_01.channel.context.ConnectContext] [Thread-0] socketChannel <java.nio.channels.SocketChannel[closed]> finishConnect threw, closing java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973)
at com.teragrep.net_01.channel.context.ConnectContext.handleEvent(ConnectContext.java:103)
at com.teragrep.net_01.eventloop.EventLoop.poll(EventLoop.java:144)
at com.teragrep.net_01.eventloop.EventLoop.run(EventLoop.java:198)
at java.base/java.lang.Thread.run(Thread.java:1583)
```

Running

```sh
tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 123 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
```

seems to cause that finishConnect to throw, but it does not if I use `-m 10000`.
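For context, finishConnect() surfacing a ConnectException like this is standard java.nio behavior when the peer refuses or drops a pending non-blocking connect. A minimal sketch of the pattern the ConnectContext log lines correspond to (plain NIO, not the net_01 source; the address is arbitrary):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public final class NonBlockingConnect {

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        channel.connect(new InetSocketAddress("127.0.0.1", 4600));
        channel.register(selector, SelectionKey.OP_CONNECT);

        selector.select(); // blocks until the connect attempt resolves
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isConnectable()) {
                try {
                    // throws e.g. java.net.ConnectException: Connection refused
                    // when the peer is gone, exactly as in the log above
                    ((SocketChannel) key.channel()).finishConnect();
                    System.out.println("connected");
                }
                catch (IOException e) {
                    System.out.println("connect failed: " + e);
                    key.channel().close();
                }
            }
        }
        selector.close();
        channel.close();
    }
}
```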
Fairly sure this is fixed with:

```diff
diff --git a/src/main/java/com/teragrep/cfe_35/router/ParallelTargetRouting.java b/src/main/java/com/teragrep/cfe_35/router/ParallelTargetRouting.java
index 848211c..71606ed 100644
--- a/src/main/java/com/teragrep/cfe_35/router/ParallelTargetRouting.java
+++ b/src/main/java/com/teragrep/cfe_35/router/ParallelTargetRouting.java
@@ -179,14 +179,15 @@ public class ParallelTargetRouting implements TargetRouting {
     @Override
     public void close() {
         for (CompletableFuture<RelpClient> futureClient : outputMap.values()) {
-            try {
-                RelpClient client = futureClient.get();
+            try (RelpClient client = futureClient.get()) {
+                if (client.isStub()) {
+                    continue;
+                }
                 RelpFrame closeFrame = relpFrameFactory.create("close", "");
                 client.transmit(closeFrame);
-                client.close();
             }
             catch (Exception e) {
-                throw new RuntimeException(e);
+                LOGGER.warn("Failed to close: {}", e.getMessage(), e);
             }
         }
     }
```
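Read back out of the diff, the patched close() looks like this (reconstructed from the hunk; outputMap, relpFrameFactory, and LOGGER are the class's existing members):

```java
@Override
public void close() {
    for (CompletableFuture<RelpClient> futureClient : outputMap.values()) {
        // try-with-resources guarantees client.close() runs even if
        // transmit() throws, replacing the explicit close() call
        try (RelpClient client = futureClient.get()) {
            if (client.isStub()) {
                continue; // placeholder client, nothing to say goodbye to
            }
            RelpFrame closeFrame = relpFrameFactory.create("close", "");
            client.transmit(closeFrame);
        }
        catch (Exception e) {
            // warn instead of rethrowing so one failed connection cannot
            // abort closing the remaining ones
            LOGGER.warn("Failed to close: {}", e.getMessage(), e);
        }
    }
}
```

The key changes: closing is now guaranteed by try-with-resources, stub clients are skipped outright, and a failure on one target is logged instead of aborting the whole loop with a RuntimeException.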
Maybe related: teragrep/net_01#10
This is expected when a connection is interrupted uncleanly; only beautification of the exception is possible, nothing else.
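A sketch of what that beautification could look like: catch the expected ConnectException and log one terse line instead of a full stack trace (this is not the net_01 code; the helper and its SLF4J logger are hypothetical):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.nio.channels.SocketChannel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class QuietFinishConnect {

    private static final Logger LOGGER = LoggerFactory.getLogger(QuietFinishConnect.class);

    static void finishQuietly(SocketChannel socketChannel) throws IOException {
        try {
            socketChannel.finishConnect();
        }
        catch (ConnectException e) {
            // expected when the peer vanished mid-connect: one line, no stack trace
            LOGGER.warn("finishConnect failed for <{}>: {}", socketChannel, e.getMessage());
            socketChannel.close();
        }
    }
}
```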
Create a pull request from the patch.
Seems like that patch does fix this issue. However, I managed to cause a lockup on the connections while closing them through tcpflood.
```
[to-9]$ git remote -v
origin https://github.com/kortemik/cfe_35.git (fetch)
origin https://github.com/kortemik/cfe_35.git (push)
[to-9]$ git pull
Already up to date.
```

Results for

```sh
tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10000 -Y
```

followed by `^C`. A worker thread gets stuck closing the connections:

```
"pool-2-thread-7" #46 [1856731] prio=5 os_prio=0 cpu=21540.78ms elapsed=983.26s tid=0x00007f715c1f1fb0 nid=1856731 waiting on condition [0x00007f721ebf5000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.4/Native Method)
- parking to wait for <0x00000006a96af4e8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@21.0.4/LockSupport.java:221)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21.0.4/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.4/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.4/ForkJoinPool.java:3725)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@21.0.4/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@21.0.4/CompletableFuture.java:2072)
at com.teragrep.cfe_35.router.ParallelTargetRouting.close(ParallelTargetRouting.java:182)
at com.teragrep.cfe_35.router.MessageParser.close(MessageParser.java:202)
at com.teragrep.rlp_03.frame.delegate.EventDelegate.close(EventDelegate.java:83)
at com.teragrep.rlp_03.frame.delegate.SequencingDelegate.close(SequencingDelegate.java:94)
at com.teragrep.rlp_03.frame.delegate.DefaultFrameDelegate.close(DefaultFrameDelegate.java:81)
at com.teragrep.rlp_03.frame.FrameDelegationClock.close(FrameDelegationClock.java:122)
at com.teragrep.net_01.channel.context.IngressImpl.close(IngressImpl.java:299)
at com.teragrep.net_01.channel.context.EstablishedContextImpl.close(EstablishedContextImpl.java:101)
at com.teragrep.net_01.channel.context.IngressImpl.run(IngressImpl.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@21.0.4/Executors.java:572)
at java.util.concurrent.FutureTask.run(java.base@21.0.4/FutureTask.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.4/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.4/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@21.0.4/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.4/Thread.java:1583)
```
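The dump shows the worker parked indefinitely in CompletableFuture.get() at ParallelTargetRouting.close (line 182). A common mitigation, sketched here under the assumption that a bounded shutdown is acceptable (this is not the project's code), is to wait with a timeout and cancel connections that never completed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound the wait on each pending client so close() cannot park forever.
// The generic T stands in for the project's RelpClient type.
final class BoundedClose {

    static <T extends AutoCloseable> void closeBounded(CompletableFuture<T> futureClient) {
        try {
            T client = futureClient.get(5, TimeUnit.SECONDS); // bounded wait
            client.close();
        }
        catch (TimeoutException e) {
            // the connection never completed; stop waiting for it
            futureClient.cancel(true);
        }
        catch (Exception e) {
            // report and continue closing the rest
            System.err.println("close failed: " + e);
        }
    }
}
```

Whether cancelling is safe depends on how the connection futures are produced, so treat the timeout and cancel as a starting point rather than a fix.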
Server command and logs:

```
[to-9]$ java -DserverThreads=8 -Dcfe07LookupPath=src/test/resources/cfe_07 -Dcfe16LookupPath=src/test/resources/cfe_16 -Dkin02LookupPath=src/test/resources/kin_02 -DroutingTargetsConfig=src/test/resources/targetsPerformanceTest.json -DlistenPort=4600 -jar target/cfe_35.jar
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
12:25:52.993 [pool-2-thread-2] ERROR com.teragrep.net_01.channel.context.IngressImpl - run() threw
java.net.SocketException: Connection reset
at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:401) ~[?:?]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:478) ~[?:?]
at java.nio.channels.SocketChannel.read(SocketChannel.java:630) ~[?:?]
at com.teragrep.net_01.channel.socket.PlainSocket.read(PlainSocket.java:65) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at com.teragrep.net_01.channel.context.IngressImpl.readData(IngressImpl.java:237) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at com.teragrep.net_01.channel.context.IngressImpl.run(IngressImpl.java:103) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
12:25:56.973 [pool-2-thread-1] ERROR com.teragrep.net_01.channel.context.IngressImpl - run() threw
java.net.SocketException: Connection reset
at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:401) ~[?:?]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:478) ~[?:?]
at java.nio.channels.SocketChannel.read(SocketChannel.java:630) ~[?:?]
at com.teragrep.net_01.channel.socket.PlainSocket.read(PlainSocket.java:65) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at com.teragrep.net_01.channel.context.IngressImpl.readData(IngressImpl.java:237) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at com.teragrep.net_01.channel.context.IngressImpl.run(IngressImpl.java:103) ~[cfe_35.jar:0.0.1-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
```

**Describe the bug**

cfe_35 stops serving connections; see the cfe_35 output above.
**Expected behavior**

Clients can connect normally.
**How to reproduce**

Launch the server:

```sh
java -DserverThreads=8 -Dcfe07LookupPath=src/test/resources/cfe_07 -Dcfe16LookupPath=src/test/resources/cfe_16 -Dkin02LookupPath=src/test/resources/kin_02 -DroutingTargetsConfig=src/test/resources/targetsPerformanceTest.json -DlistenPort=4600 -jar target/cfe_35.jar
```

Spam tcpflood continuously and cancel it with `^C`:

```sh
tcpflood -t 127.0.0.1 -T relp-plain -p 4600 -m 1000000 -M "<14>1 2020-05-15T13:24:03.603Z performance-test-host performance-test-tag - - - hello" -c 10 -Y
```
**Software version**

commit 2eba9c65400b1496ca23d4a295378b2c59891d00 (HEAD -> to-9, origin/to-9)
https://github.com/kortemik/cfe_35/tree/to-9
**Additional context**

This could be an rlp_03 issue, but the same version of the server is implemented in rlp_08, and that does not suffer from the same issue.