Closed arvindkandhare closed 7 years ago
The line
OwnershipAcquireFailedException: LockPath - /messaging/distributedlog/mynamespace/container_1//lock: Lock acquisition failed, the current owner is interactive-console
Makes me think that this is not a retryable thing. There is another owner. Am I missing something?
The scenario is where ownership change is triggered by the controller. The new owner is supposed to take over the container and hits this exception. If we do not retry, this container will basically be unserviced indefinitely.
Here are the logs and more details.
As the target host tries only once to acquire the lock because of the issue shown in the logs below, the acquirelock is tried only once. If both the instances are restarted again, the right instance acquires the lock sometimes. The restart can be avoided if the target host keeps retrying locks for the containers it knows it owns.
java.util.concurrent.TimeoutException: Timeout on waiting for ReadAhead worker to stop container_1:
This issue is no more observed with the latest code. Will reopen if this is observed again.
There is a chance the lock may not be available for the first time. Tier1 should backoff and try again.
21:29:01.680 [pool-1-thread-4] ERROR com.emc.pravega.service.server.logs.DurableLog - DurableLog[1] Recovery FAILED. {} com.emc.pravega.service.storage.DataLogWriterNotPrimaryException: Unable to acquire exclusive Writer for log 'container_1'. at com.emc.pravega.service.storage.impl.distributedlog.LogHandle.initialize(LogHandle.java:173) at com.emc.pravega.service.storage.impl.distributedlog.LogClient.getLogHandle(LogClient.java:177) at com.emc.pravega.service.storage.impl.distributedlog.DistributedLogDataLog.initialize(DistributedLogDataLog.java:80) at com.emc.pravega.service.server.logs.DurableLog.performRecovery(DurableLog.java:305) at com.emc.pravega.service.server.logs.DurableLog.lambda$doStart$0(DurableLog.java:144) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: com.twitter.distributedlog.exceptions.OwnershipAcquireFailedException: LockPath - /messaging/distributedlog/mynamespace/container_1//lock: Lock acquisition failed, the current owner is interactive-console
at com.twitter.distributedlog.lock.ZKSessionLock$8.apply(ZKSessionLock.java:587)
at com.twitter.distributedlog.lock.ZKSessionLock$8.apply(ZKSessionLock.java:584)
at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$8.apply(Future.scala:1042)
at com.twitter.util.Try$.apply(Try.scala:13)
at com.twitter.util.Future$.apply(Future.scala:132)
at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1042)
at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1041)
at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:112)
at com.twitter.util.Promise$Transformer.k(Promise.scala:112)
at com.twitter.util.Promise$Transformer.apply(Promise.scala:122)
at com.twitter.util.Promise$Transformer.apply(Promise.scala:103)
at com.twitter.util.Promise$$anon$1.run(Promise.scala:357)
at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:178)
at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:136)
at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:207)
at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:92)
at com.twitter.util.Promise.runq(Promise.scala:350)
at com.twitter.util.Promise.updateIfEmpty(Promise.scala:716)
at com.twitter.util.Promise.update(Promise.scala:694)
at com.twitter.util.Promise.setValue(Promise.scala:670)
at com.twitter.distributedlog.lock.ZKSessionLock$9.safeRun(ZKSessionLock.java:639)
at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
... 7 common frames omitted