renaissance-benchmarks / renaissance

The Renaissance Benchmark Suite
https://renaissance.dev
GNU General Public License v3.0

finagle-chirper fails on x86-64 Linux #231

Open piyush286 opened 4 years ago

piyush286 commented 4 years ago

Problem Description

I get the following errors when running finagle-chirper on x86-64 Linux with OpenJDK 11 (OpenJ9). I was previously able to run this benchmark successfully on this platform, as mentioned in https://github.com/renaissance-benchmarks/renaissance/issues/211.

Errors

12:01:16  Resetting master, feed map size: 5000
12:01:21  ====== finagle-chirper (twitter-finagle), iteration 14 completed (9824.963 ms) ======
12:01:21  ====== finagle-chirper (twitter-finagle), iteration 15 started ======
12:01:21  Resetting master, feed map size: 5000
12:01:26  Exception in thread "Thread-1140" Exception in thread "Thread-1084" Exception in thread "Thread-1087" Exception in thread "Thread-1147" Exception in thread "Thread-1148" Exception in thread "Thread-1119" Exception in thread "Thread-1130" Exception in thread "Thread-1136" Exception in thread "Thread-1102" Exception in thread "Thread-1122" Exception in thread "Thread-1155" Exception in thread "Thread-1123" Exception in thread "Thread-1134" Exception in thread "Thread-1129" Exception in thread "Thread-1108" Exception in thread "Thread-1121" Exception in thread "Thread-1124" Exception in thread "Thread-1106" Exception in thread "Thread-1101" Exception in thread "Thread-1135" Exception in thread "Thread-1138" Exception in thread "Thread-1086" Exception in thread "Thread-1104" Exception in thread "Thread-1146" Exception in thread "Thread-1114" Exception in thread "Thread-1095" Exception in thread "Thread-1141" Exception in thread "Thread-1153" Exception in thread "Thread-1098" Exception in thread "Thread-1150" Exception in thread "Thread-1131" Exception in thread "Thread-1117" Exception in thread "Thread-1091" Exception in thread "Thread-1105" Exception in thread "Thread-1093" Exception in thread "Thread-1139" Failure(connection timed out: localhost/127.0.0.1:37255 at remote address: localhost/127.0.0.1:37255. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:37255, Downstream label: :37255, Trace Id: 7ae0eb8657c17c60.7ae0eb8657c17c60<:7ae0eb8657c17c60 with Service -> :37255Failure(connection timed out: localhost/127.0.0.1:46501 at remote address: localhost/127.0.0.1:46501. 
Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:46501, Downstream label: :46501, Trace Id: 5f55821328870c8f.5f55821328870c8f<:5f55821328870c8f with Service -> :46501Failure(connection timed out: localhost/127.0.0.1:38355 at remote address: localhost/127.0.0.1:38355. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:38355, Downstream label: :38355, Trace Id: 4bfaf6661fa74a2f.4bfaf6661fa74a2f<:4bfaf6661fa74a2f with Service -> :38355Failure(connection timed out: localhost/127.0.0.1:41496 at remote address: localhost/127.0.0.1:41496. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:41496, Downstream label: :41496, Trace Id: 46eb25346382bfc4.46eb25346382bfc4<:46eb25346382bfc4 with Service -> :41496Failure(connection timed out: localhost/127.0.0.1:37794 at remote address: localhost/127.0.0.1:37794. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:37794, Downstream label: :37794, Trace Id: 076c59f94b627628.076c59f94b627628<:076c59f94b627628 with Service -> :37794Failure(connection timed out: localhost/127.0.0.1:43615 at remote address: localhost/127.0.0.1:43615. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:43615, Downstream label: :43615, Trace Id: 49d171f379962e7c.49d171f379962e7c<:49d171f379962e7c with Service -> :43615Failure(connection timed out: localhost/127.0.0.1:42070 at remote address: localhost/127.0.0.1:42070. 
Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:42070, Downstream label: :42070, Trace Id: 1c70dfc606542b87.1c70dfc606542b87<:1c70dfc606542b87 with Service -> :42070
12:01:26  
12:01:26  
12:01:26  Caused by: Caused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:41496 at remote address: localhost/127.0.0.1:41496. Remote Info: Not AvailableCaused by: com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:43615 at remote address: localhost/127.0.0.1:43615. Remote Info: Not Available
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:37794 at remote address: localhost/127.0.0.1:37794. Remote Info: Not AvailableCaused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:42070 at remote address: localhost/127.0.0.1:42070. Remote Info: Not Available
12:01:26    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26  
12:01:26    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)   at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)   at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26  
12:01:26    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)   at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
12:01:26  Exception in thread "Thread-1152" 
12:01:26  
12:01:26    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)Failure(connection timed out: localhost/127.0.0.1:44631 at remote address: localhost/127.0.0.1:44631. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:44631, Downstream label: :44631, Trace Id: 1440fc2c6d90890a.1440fc2c6d90890a<:1440fc2c6d90890a with Service -> :44631
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)   at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)Caused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:44631 at remote address: localhost/127.0.0.1:44631. Remote Info: Not Available
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)  at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)  at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)  at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
12:01:26    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
12:01:26    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)   at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26  
12:01:26    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)   at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)Exception in thread "Thread-1137" 
12:01:26  
12:01:26    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)Failure(connection timed out: localhost/127.0.0.1:42545 at remote address: localhost/127.0.0.1:42545. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:42545, Downstream label: :42545, Trace Id: fad5076b6173754c.fad5076b6173754c<:fad5076b6173754c with Service -> :42545  at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
12:01:26    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)   at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
12:01:26  
12:01:26    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)Caused by: 
12:01:26    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)   at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)   at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:42545 at remote address: localhost/127.0.0.1:42545. Remote Info: Not Available    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26  
12:01:26    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)Exception in thread "Thread-1144"   at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
12:01:26    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
12:01:26  
12:01:26    at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)   at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)Failure(connection timed out: localhost/127.0.0.1:34803 at remote address: localhost/127.0.0.1:34803. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:34803, Downstream label: :34803, Trace Id: 758e432f8c8caca7.758e432f8c8caca7<:758e432f8c8caca7 with Service -> :34803
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)   at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
12:01:26  Caused by: 
12:01:26  
12:01:26    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:34803 at remote address: localhost/127.0.0.1:34803. Remote Info: Not Available at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
12:01:26  
12:01:26    at java.base/java.lang.Thread.run(Thread.java:834)  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
12:01:26  
12:01:26  
12:01:26  
12:01:26    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)  at 

To Reproduce

ceresek commented 4 years ago

I'm running finagle-chirper on x86_64 (Fedora Linux) with OpenJDK 11 and see no obvious issues. Could you please provide a bit more info to help us reproduce the error? Did you use Renaissance HEAD or the 0.10.0 release? Is the machine you are using special in any way, e.g. a high number of cores or lots of RAM? Thanks.

piyush286 commented 4 years ago

I used the 0.9.0 release from here: https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.9.0/renaissance-mit-0.9.0.jar

OpenJ9 JDK: https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.7%2B10_openj9-0.20.0/OpenJDK11U-jdk_x64_linux_openj9_11.0.7_10_openj9-0.20.0.tar.gz

Here's the info about the machine:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1869.451
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4599.93
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

ceresek commented 4 years ago

I've just finished running Renaissance at commit 393adff09982986550ea888470119b329315a8f6 with an OpenJ9 11.0.8 x86_64 build from May 5, on a machine with 80 processors, both with and without forced GC between iterations, using the default number of iterations (90). There were no errors, so this looks a bit more difficult to reproduce.

Can you please try with Renaissance built from the current HEAD?

Also, is it possible that the machine where you see the problem is also loaded by other workloads?

farquet commented 3 years ago

I can reproduce this problem on a specific machine. I'll have a look, but my guess is that the benchmark either can't open some ports or can't reach localhost for some reason.
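
One way to check the localhost hypothesis outside the benchmark is a small standalone probe. The following is only a minimal sketch, not part of Renaissance or Finagle: the class name `LoopbackProbe`, the attempt count, and the 1000 ms timeout are all made up for illustration. It repeatedly binds a listening socket to an OS-assigned ephemeral port on the loopback address and connects to it with a timeout, which roughly mirrors the kind of connection that times out in the log above. It uses plain `java.net` sockets rather than Netty's epoll transport, so a clean result here would only rule out basic loopback problems, not transport-specific ones.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LoopbackProbe {

    /**
     * Tries `attempts` times to bind an ephemeral port on the loopback
     * address and connect to it. Returns how many connects succeeded.
     */
    public static int probe(int attempts, int timeoutMillis) {
        int succeeded = 0;
        InetAddress loopback = InetAddress.getLoopbackAddress();
        for (int i = 0; i < attempts; i++) {
            // Port 0 asks the OS to pick any free ephemeral port.
            try (ServerSocket server = new ServerSocket(0, 50, loopback);
                 Socket client = new Socket()) {
                InetSocketAddress target =
                        new InetSocketAddress(loopback, server.getLocalPort());
                // The TCP handshake completes via the listen backlog,
                // so no explicit accept() is needed for this check.
                client.connect(target, timeoutMillis);
                succeeded++;
            } catch (IOException e) {
                // Covers bind failures as well as connect timeouts.
                System.err.println("attempt " + i + " failed: " + e);
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        int attempts = 100;       // hypothetical value, adjust as needed
        int timeoutMillis = 1000; // hypothetical value, adjust as needed
        int ok = probe(attempts, timeoutMillis);
        System.out.println(ok + "/" + attempts + " loopback connections succeeded");
    }
}
```

If some attempts fail on the affected machine but not elsewhere, that would point toward the machine's network configuration (for example firewall rules or ephemeral port exhaustion) rather than the benchmark itself.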