A pipeline has been consistently getting stuck when it attempts to write to JDBC. A thread dump from one worker revealed a number of threads waiting for a connection to be allocated from the pool:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <45ac842e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:581)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:437)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:734)
at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn.executeBatch(JdbcIO.java:1449)
at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn.processElement(JdbcIO.java:1398)
at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
There were 11 such threads waiting for a connection from the pool. The other workers were idle. It looks like this single worker was holding the watermark back, so the pipeline stopped making progress and appeared stuck. The default maximum number of connections in the pool is 8 (the Commons DBCP default for maxTotal), and Beam neither overrides it nor exposes a separate config option for raising it. Scio doesn't support this either. There is a break-glass approach to configuring it, which was referenced in BEAM-9629.
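For illustration, one possible way to raise the limit today is to hand JdbcIO a custom data source provider instead of a DataSourceConfiguration. This is only a sketch and not necessarily the approach referenced in BEAM-9629. The class name BoundedPoolProvider, the driver, the JDBC URL, and the pool size of 32 are all placeholders made up for this example, and it assumes Beam's JDBC IO and Commons DBCP2 are on the classpath.

```java
import javax.sql.DataSource;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.commons.dbcp2.BasicDataSource;

// Hypothetical provider: builds one pooled DataSource per worker JVM
// with a larger cap than the DBCP default of 8.
public class BoundedPoolProvider implements SerializableFunction<Void, DataSource> {
  private static volatile BasicDataSource shared;

  @Override
  public DataSource apply(Void input) {
    if (shared == null) {
      synchronized (BoundedPoolProvider.class) {
        if (shared == null) {
          BasicDataSource ds = new BasicDataSource();
          ds.setDriverClassName("org.postgresql.Driver"); // placeholder driver
          ds.setUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder URL
          ds.setMaxTotal(32); // assumed value; size it to what the DB can handle
          ds.setMaxWaitMillis(30_000L); // fail after 30 s instead of parking forever
          shared = ds;
        }
      }
    }
    return shared;
  }
}
```

It would then be wired in with JdbcIO.<T>write().withDataSourceProviderFunction(new BoundedPoolProvider()). Setting a maximum wait is a deliberate choice here: with it, an exhausted pool surfaces as an exception in the worker logs rather than as threads parked indefinitely in borrowObject.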
This work should be done together with an investigation into why DB connections aren't being reused. Does a failed batch leak a DB connection, so that it is never returned to the pool?
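To make the leak hypothesis concrete, here is a toy model using only the JDK. It is not DBCP or JdbcIO code, and whether JdbcIO actually behaves like the leaky variant is exactly the open question. ToyPool, leakyBatch, and safeBatch are hypothetical names. Each semaphore permit stands in for one pooled connection.

```java
import java.util.concurrent.Semaphore;

public class PoolLeakDemo {
    /** Toy pool: each permit stands in for one pooled connection. */
    static final class ToyPool {
        private final Semaphore permits;
        ToyPool(int maxTotal) { permits = new Semaphore(maxTotal); }
        boolean borrow() { return permits.tryAcquire(); }
        void giveBack() { permits.release(); }
        int idle() { return permits.availablePermits(); }
    }

    /** Hypothesised bug: the exception skips the return to the pool. */
    static void leakyBatch(ToyPool pool, boolean fail) {
        if (!pool.borrow()) throw new IllegalStateException("pool exhausted");
        if (fail) throw new RuntimeException("batch failed");
        pool.giveBack();
    }

    /** Same batch, but the connection is always returned. */
    static void safeBatch(ToyPool pool, boolean fail) {
        if (!pool.borrow()) throw new IllegalStateException("pool exhausted");
        try {
            if (fail) throw new RuntimeException("batch failed");
        } finally {
            pool.giveBack();
        }
    }

    /** Runs the given number of failing batches and reports idle connections. */
    static int idleAfterFailures(boolean leaky, int maxTotal, int failures) {
        ToyPool pool = new ToyPool(maxTotal);
        for (int i = 0; i < failures; i++) {
            try {
                if (leaky) leakyBatch(pool, true); else safeBatch(pool, true);
            } catch (RuntimeException expected) {
                // A real pipeline would retry the bundle; here we just continue.
            }
        }
        return pool.idle();
    }

    public static void main(String[] args) {
        System.out.println("leaky idle: " + idleAfterFailures(true, 8, 8)); // leaky idle: 0
        System.out.println("safe idle: " + idleAfterFailures(false, 8, 8)); // safe idle: 8
    }
}
```

In this model, eight failed batches are enough to drain a pool of 8 in the leaky variant, after which every later borrower would wait forever on a blocking pool. That would match the thread dump, and it would also mean that raising the limit only delays the hang instead of fixing it.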