magro / memcached-session-manager

A tomcat session manager that backups sessions in memcached and pulls them from there if asked for unknown sessions
Apache License 2.0
758 stars 348 forks source link

Could not store secondary backup of session #289

Open srikanthpulikanti opened 8 years ago

srikanthpulikanti commented 8 years ago

Hi Martin,

Following is the configuration which we used to create tomcat cluster using MSM.

<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
                memcachedNodes="n1:host1-ip,n2:host2-ip"
                sticky="false"
                sessionBackupAsync="false"
                requestUriIgnorePattern=".*\.(ico.*|png.*|gif.*|jpg.*|css.*|js|pdf.*|doc.*|docx.*|xls.*|xlsx.*|log.*|csv.*|txt.*|svg.*|bmp.*|jpeg.*|img.*|xml.*)$"
   transcoderFactoryClass="de.javakaffee.web.msm.JavaSerializationTranscoderFactory"/>

As part of testing, we brought down memcached in node2 (n2), then while testing in logs following exception occurred. So can you please let me know is it the correct behavior. Or do we need any extra configuration to avoid this issue.

Mar 16, 2016 8:49:52 AM de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask call
INFO: Could not store secondary backup of session 49EAB2214083781A97CB7096C455A55C-n1
java.lang.IllegalArgumentException: No node found for key bak:49EAB2214083781A97CB7096C455A55C-n1 (nodeId: null, known nodeIds: [n1, n2])
    at de.javakaffee.web.msm.SuffixBasedNodeLocator.getPrimary(SuffixBasedNodeLocator.java:101)
    at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:1168)
    at net.spy.memcached.MemcachedConnection.enqueueOperation(MemcachedConnection.java:1148)
    at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:328)
    at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:929)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.saveSessionBackupFromResult(LockingStrategy.java:504)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.call(LockingStrategy.java:487)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.call(LockingStrategy.java:438)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Mar 16, 2016 8:49:57 AM de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask call
INFO: Could not store secondary backup of session 49EAB2214083781A97CB7096C455A55C-n1
java.lang.IllegalArgumentException: No node found for key bak:49EAB2214083781A97CB7096C455A55C-n1 (nodeId: null, known nodeIds: [n1, n2])
    at de.javakaffee.web.msm.SuffixBasedNodeLocator.getPrimary(SuffixBasedNodeLocator.java:101)
    at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:1168)
    at net.spy.memcached.MemcachedConnection.enqueueOperation(MemcachedConnection.java:1148)
    at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:328)
    at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:929)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.saveSessionBackupFromResult(LockingStrategy.java:504)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.call(LockingStrategy.java:487)
    at de.javakaffee.web.msm.LockingStrategy$OnAfterBackupSessionTask.call(LockingStrategy.java:438)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
magro commented 8 years ago

If node 2 is not available that's the expected behaviour.

srikanthpulikanti commented 8 years ago

Hi Martin Thanks for the quick response. Today we faced a different issue with the same configuration.

  1. As per the configuration which I have given we have created a tomcat cluster.
  2. But the memcached in node n2 was down, and cluster was running on node n1 memcached server.
  3. We started memcached in node n1, now both n1 and n2 memcached servers are running, and tomcat cluster is running in non-sticky mode
  4. After some time, like 5-10 minutes after starting the memcached in n2, we stopped memcached in n1.
  5. What we have observed was, still in both tomcats the memcached in n2 was not identified, and still MSM in both tomcats was trying find other node as n1 is down. Now the thing is, both tomcats did not identify n2, and hence the application was not able to login.
  6. Once we restarted both tomcats, then n2 was identified, and it was serving the cluster

Following is the error in the logs

Mar 17, 2016 1:19:28 AM de.javakaffee.web.msm.MemcachedSessionService changeSessionIdOnMemcachedFailover
INFO: Session needs to be relocated as node n1 is not available, loading backup session for 13C4BBA8B7B74CC136BAC5C4A54CE9D3-n1
Mar 17, 2016 1:19:28 AM de.javakaffee.web.msm.MemcachedSessionService loadBackupSession
INFO: No next available node found for nodeId n1
Mar 17, 2016 1:21:13 AM de.javakaffee.web.msm.MemcachedSessionService changeSessionIdOnMemcachedFailover
INFO: Session needs to be relocated as node n1 is not available, loading backup session for 6C7ADBDF064E66468A768999EFE3F0D8-n1
Mar 17, 2016 1:21:13 AM de.javakaffee.web.msm.MemcachedSessionService loadBackupSession
INFO: No next available node found for nodeId n1
Mar 17, 2016 1:22:31 AM de.javakaffee.web.msm.MemcachedSessionService changeSessionIdOnMemcachedFailover
INFO: Session needs to be relocated as node n1 is not available, loading backup session for 47C659C9A75A1A453D478558DAE10AA9-n1
Mar 17, 2016 1:22:31 AM de.javakaffee.web.msm.MemcachedSessionService loadBackupSession
INFO: No next available node found for nodeId n1
Mar 17, 2016 1:22:37 AM de.javakaffee.web.msm.MemcachedSessionService changeSessionIdOnMemcachedFailover
INFO: Session needs to be relocated as node n1 is not available, loading backup session for 142599E24BFDAB583D1FF1A178CAAA12-n1
Mar 17, 2016 1:22:37 AM de.javakaffee.web.msm.MemcachedSessionService loadBackupSession
INFO: No next available node found for nodeId n1
Mar 17, 2016 1:23:49 AM de.javakaffee.web.msm.MemcachedSessionService changeSessionIdOnMemcachedFailover
INFO: Session needs to be relocated as node n1 is not available, loading backup session for 188D36360A5AFBA771AEB4296A91C7EC-n1
Mar 17, 2016 1:23:49 AM de.javakaffee.web.msm.MemcachedSessionService loadBackupSession
INFO: No next available node found for nodeId n1
Mar 17, 2016 1:24:13 AM de.javakaffee.web.msm.BackupSessionTask handleException
WARNING: Could not store session D4DCDCE628D1FC35600FC11BC8B2C2AC-n1 in memcached.
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
        at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
        at de.javakaffee.web.msm.BackupSessionTask.storeSessionInMemcached(BackupSessionTask.java:236)
        at de.javakaffee.web.msm.BackupSessionTask.doBackupSession(BackupSessionTask.java:199)
        at de.javakaffee.web.msm.BackupSessionTask.call(BackupSessionTask.java:120)
        at de.javakaffee.web.msm.BackupSessionTask.call(BackupSessionTask.java:51)
        at de.javakaffee.web.msm.BackupSessionService$SynchronousExecutorService.submit(BackupSessionService.java:346)
        at de.javakaffee.web.msm.BackupSessionService.backupSession(BackupSessionService.java:205)
        at de.javakaffee.web.msm.MemcachedSessionService.backupSession(MemcachedSessionService.java:1084)
        at de.javakaffee.web.msm.RequestTrackingHostValve.backupSession(RequestTrackingHostValve.java:230)
        at de.javakaffee.web.msm.RequestTrackingHostValve.invoke(RequestTrackingHostValve.java:159)
        at de.javakaffee.web.msm.RequestTrackingHostValve.invoke(RequestTrackingHostValve.java:124)

I am trying to drill down and see what can be the issue from our side, Can you also please suggest what might went wrong. There is any thing we need to do?

magro commented 8 years ago
srikanthpulikanti commented 8 years ago

Hi Martin,

Sorry for delayed response. When we tried to do again the same kind of testing, the memcached server which is up memcached in node 2 (n2) (after bringing it down for some time) was identified by MSM. But there was a lag in time to identify the n2 memcached server. This was approximately 7 mins. This I got by observing the logs. I am not sure what happened in previous testing, the MSM was unable to identify the the memcached instance, but the same worked properly in the next testing.

Point 2 : In our environment we have 2 hosts/nodes (host1,host2), each having a tomcat and memcached installed (t1,t2 and m1,m2). These 2 tomcats have been configured with MSM configuration as mentioned in my first comment. So we have a load balancer which uses non-sticky mode to direct requests to both nodes tomcat in round-robin approach. This is what I have referred as Cluster.

Point 3 : Yes it does make sense. Because here we are trying to test different test scenarios where in scenario1) t1,t2 and m1 will be active, scenario2) t1,t2 and m2 will be active, scenario3) t1,m1 and scenario4) t2,m2 will be active (Here active means up and running where as other will be down.) In all the scenarios configuration of both load balancer and MSM will be non-sticky.

magro commented 8 years ago

Ah, I interpreted the numbered items as a list of steps to follow. If these are different scenarios then ok.

So basically the issue is spymemcached reconnection, right? In the logs you shared I didn't see spymemcached errors/reconnects - you should enable debug logs for spymemcached to see more details about this. At least this seems to be the way for further investigation.