hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io
257 stars 68 forks source link

neo4j.het.io down as of 2018-06-13 #9

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

I noticed https://neo4j.het.io/browser/ is currently not working with ERR_CONNECTION_REFUSED. I ran docker logs hetionet-container to see what happened. Below is the tail of the docker logs:

``` 2018-06-12 13:51:13.428+0000 INFO Transaction with id 14886 has been automatically rolled back due to transaction timeout. 2018-06-12 14:12:37.082+0000 WARN Illegal character 0x1B in state=START for buffer HeapByteBuffer@9be262e8[p=1,l=114,c=8192,r=113]={\x1b<<<$J\xB3\x00U\xBa\x05\xD4\xD4\x1f[U\xE8\xCf\xF0\xAb_...\xE7\x8b{\xE1\xE7\x8b{\xE5\xE7\x8b{\xE5\xE7\x04\x00>>>0 (Windows NT 6.1...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 2018-06-12 14:12:37.082+0000 WARN badMessage: 400 Illegal character 0x1B for HttpChannelOverHttp@4fd9e509{r=0,c=false,a=IDLE,uri=-} 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14896 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14887 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14888 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14889 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14890 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14891 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.427+0000 INFO Transaction with id 14892 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.428+0000 INFO Transaction with id 14893 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.428+0000 INFO Transaction with id 14894 has been automatically rolled back due to transaction timeout. 2018-06-12 16:46:43.428+0000 INFO Transaction with id 14895 has been automatically rolled back due to transaction timeout. 2018-06-12 18:55:05.557+0000 WARN Illegal character 0x1C in state=START for buffer HeapByteBuffer@ab89f633[p=1,l=114,c=8192,r=113]={\x1c<<>>leWebKit/537.36 (...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 2018-06-12 18:55:05.557+0000 WARN badMessage: 400 Illegal character 0x1C for HttpChannelOverHttp@4304652d{r=0,c=false,a=IDLE,uri=-} 2018-06-13 01:32:29.278+0000 WARN Illegal character 0x5 in state=START for buffer HeapByteBuffer@d297ec0b[p=1,l=3,c=8192,r=2]={\x05<<<\x01\x00>>> / HTTP/1.0\r\n\r\ns/...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 2018-06-13 01:32:29.279+0000 WARN badMessage: 400 Illegal character 0x5 for HttpChannelOverHttp@2bfbccf0{r=0,c=false,a=IDLE,uri=-} 2018-06-13 01:41:10.661+0000 WARN Illegal character 0x4 in state=START for buffer HeapByteBuffer@feae0e47[p=1,l=9,c=8192,r=8]={\x04<<<\x01\x00P.\xA1\t\x1f\x00>>>TP/1.0\r\nHost: 127...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 2018-06-13 01:41:10.661+0000 WARN badMessage: 400 Illegal character 0x4 for HttpChannelOverHttp@a555b23{r=0,c=false,a=IDLE,uri=-} 2018-06-13 07:40:17.648+0000 WARN /browser/app-a5e172cd9d95a990973f.js org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException java.io.IOException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:234) at org.eclipse.jetty.server.HttpOutput.sendContent(HttpOutput.java:572) at org.eclipse.jetty.servlet.DefaultServlet.sendData(DefaultServlet.java:955) at org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:533) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at org.neo4j.server.web.StaticContentFilter.doFilter(StaticContentFilter.java:52) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.neo4j.server.rest.dbms.AuthorizationDisabledFilter.doFilter(AuthorizationDisabledFilter.java:49) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:216) ... 33 more 2018-06-13 07:40:17.649+0000 WARN /browser/app-a5e172cd9d95a990973f.js org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException java.io.IOException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:234) at org.eclipse.jetty.server.HttpOutput.sendContent(HttpOutput.java:572) at org.eclipse.jetty.servlet.DefaultServlet.sendData(DefaultServlet.java:955) at org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:533) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at org.neo4j.server.web.StaticContentFilter.doFilter(StaticContentFilter.java:52) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.neo4j.server.rest.dbms.AuthorizationDisabledFilter.doFilter(AuthorizationDisabledFilter.java:49) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:216) ... 33 more 2018-06-13 11:06:13.516+0000 WARN Exception thrown when submitting maintenance task unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486) at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517) at java.util.concurrent.ForkJoinPool.signalWork(ForkJoinPool.java:1634) at java.util.concurrent.ForkJoinPool.externalPush(ForkJoinPool.java:2414) at java.util.concurrent.ForkJoinPool.execute(ForkJoinPool.java:2648) at com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:967) at com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:807) at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1946) at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:113) at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:54) at org.neo4j.cypher.internal.compiler.v3_2.LFUCache.getOrElseUpdate(LFUCache.scala:30) at org.neo4j.cypher.internal.ExecutionEngine.preParseQuery(ExecutionEngine.scala:126) at org.neo4j.cypher.internal.ExecutionEngine.parseQuery(ExecutionEngine.scala:112) at org.neo4j.cypher.internal.ExecutionEngine.isPeriodicCommit(ExecutionEngine.scala:204) at org.neo4j.cypher.internal.javacompat.ExecutionEngine.isPeriodicCommit(ExecutionEngine.java:86) at org.neo4j.bolt.v1.runtime.TransactionStateMachineSPI.isPeriodicCommit(TransactionStateMachineSPI.java:120) at org.neo4j.bolt.v1.runtime.TransactionStateMachine$State$1.run(TransactionStateMachine.java:188) at org.neo4j.bolt.v1.runtime.TransactionStateMachine.run(TransactionStateMachine.java:74) at org.neo4j.bolt.v1.runtime.BoltStateMachine$State$2.run(BoltStateMachine.java:395) at org.neo4j.bolt.v1.runtime.BoltStateMachine.run(BoltStateMachine.java:194) at org.neo4j.bolt.v1.messaging.BoltMessageRouter.lambda$onRun$3(BoltMessageRouter.java:80) at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.execute(RunnableBoltWorker.java:130) at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:91) at java.lang.Thread.run(Thread.java:748) 2018-06-13 11:06:14.219+0000 WARN Exception thrown when submitting maintenance task unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486) at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517) at java.util.concurrent.ForkJoinPool.signalWork(ForkJoinPool.java:1634) at java.util.concurrent.ForkJoinPool.externalPush(ForkJoinPool.java:2414) at java.util.concurrent.ForkJoinPool.execute(ForkJoinPool.java:2648) at com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:967) at com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:807) at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1946) at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:113) at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:54) at org.neo4j.cypher.internal.compiler.v3_2.LFUCache.getOrElseUpdate(LFUCache.scala:30) at org.neo4j.cypher.internal.compiler.v3_2.LFUCache.apply(LFUCache.scala:48) at org.neo4j.cypher.internal.compiler.v3_2.MonitoringCacheAccessor.getOrElseUpdate(CacheAccessor.scala:55) at org.neo4j.cypher.internal.compiler.v3_2.QueryCache$$anonfun$getOrElseUpdate$1.apply(CacheAccessor.scala:34) at scala.collection.Iterator$$anon$9.next(Iterator.scala:162) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon$12.next(Iterator.scala:444) at org.neo4j.cypher.internal.compiler.v3_2.QueryCache.getOrElseUpdate(CacheAccessor.scala:46) at org.neo4j.cypher.internal.ExecutionEngine.liftedTree1$1(ExecutionEngine.scala:170) at org.neo4j.cypher.internal.ExecutionEngine.planQuery(ExecutionEngine.scala:156) at org.neo4j.cypher.internal.ExecutionEngine.execute(ExecutionEngine.scala:106) at org.neo4j.cypher.internal.javacompat.ExecutionEngine.executeQuery(ExecutionEngine.java:61) at org.neo4j.bolt.v1.runtime.TransactionStateMachineSPI$1.start(TransactionStateMachineSPI.java:144) at org.neo4j.bolt.v1.runtime.TransactionStateMachine$State$1.run(TransactionStateMachine.java:202) at org.neo4j.bolt.v1.runtime.TransactionStateMachine.run(TransactionStateMachine.java:74) at org.neo4j.bolt.v1.runtime.BoltStateMachine$State$2.run(BoltStateMachine.java:395) at org.neo4j.bolt.v1.runtime.BoltStateMachine.run(BoltStateMachine.java:194) at org.neo4j.bolt.v1.messaging.BoltMessageRouter.lambda$onRun$3(BoltMessageRouter.java:80) at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.execute(RunnableBoltWorker.java:130) at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:91) at java.lang.Thread.run(Thread.java:748) 2018-06-13 11:07:06.068+0000 WARN unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:447) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:598) at java.lang.Thread.run(Thread.java:748) 2018-06-13 11:07:06.170+0000 WARN unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) Exception in thread "qtp1340311417-29834" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:447) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:620) at java.lang.Thread.run(Thread.java:748) at java.lang.Thread.start(Thread.java:717) at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:447) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:598) at java.lang.Thread.run(Thread.java:748) 2018-06-13 11:07:06.173+0000 WARN Unexpected thread death: {} in {} 2018-06-13 11:07:36.537+0000 WARN unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:447) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:598) at java.lang.Thread.run(Thread.java:748) 2018-06-13 11:07:36.547+0000 WARN Unexpected thread death: {} in {} Exception in thread "qtp1340311417-30890" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:447) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:620) at java.lang.Thread.run(Thread.java:748) Killed ```

So it looks like Neo4j ran out of memory. We've had similar issues in the past with Neo4j 3.1: see https://github.com/neo4j/neo4j/issues/9209. We have also had an issue where Neo4j runs out of memory but does not die as a process: see https://github.com/dhimmel/hetionet/issues/5. Currently neo4j.het.io is running Neo4j 3.2.3:

https://github.com/dhimmel/hetionet/blob/6d26d15e9055b33b4fd97a180fa288e4f2060b96/hetnet/neo4j/docker/Dockerfile#L1

So the two main questions for me are:

  1. why did our instance run out of memory?
  2. why didn't docker restart since we have a restart policy? https://github.com/dhimmel/hetionet/blob/6d26d15e9055b33b4fd97a180fa288e4f2060b96/hetnet/neo4j/docker/host/run-docker.sh#L6
dhimmel commented 6 years ago

Digital Ocean Graphs

Here are the 30 day graphs for our Digital Ocean droplet usage:

2018-06-18-digitalocean-graphs

You can see the crash was preceded by heavy disk IO but not a huge amount of CPU usage.

dhimmel commented 6 years ago

I rebooted the Droplet, and the docker restarted, so https://neo4j.het.io is now working. I could consider upgrading our instance to 3.2.12 (Dockerfile). See the 3.2 changelog here.

Update: Neo4j 3.2.12 has a permissions issue when mapping the logs volume. see https://github.com/neo4j/docker-neo4j/issues/131#issuecomment-398097390.

dhimmel commented 6 years ago

I'm closing this issue, since the instance is now up. Hopefully it doesn't come back... maybe the upgrade to 3.2.8 in https://github.com/dhimmel/hetionet/commit/9f214ab7cb6b59fde5fd9530e379d2d71cee4f43 will fix it.