sean- opened this issue 9 years ago
What it exactly means is that if the lock is lost, the child process will be terminated. The code takes the following steps: it creates a session, then uses KV.Acquire to apply that session ID to a key. By default a lock session is just a session with a TTL that is then renewed with Session.RenewPeriodic, which does exactly what the name implies. A semaphore has a couple more steps, but they're mostly bookkeeping and don't really change how the lock can be lost.

The lock channel can be closed in 3 cases, all of which apply to a Semaphore as well; one of them is calling Session.Destroy on the session that is used to hold the key.

So, to answer each of your questions one by one:
If there is a partition between the agent holding the lock (AgentA) and the server leader, but the AgentA is still on the network and able to be contacted by Serf from other Agents in the data center, what happens? Said another way, if AgentA can't talk to the Server, but AgentB can reach both AgentA and the Server, how does the system handle this degraded state?
The system doesn't handle this: the agent must be able to talk to a server to update the state (renew the session TTL), but it shouldn't need to talk to the leader explicitly, just a server.
Thank you for the above clarification. If AgentA is able to reach ServerA and ServerB, but ServerC is the leader and partitioned off only from AgentA (i.e. ServerC is still on the network, but there is some transient connectivity issue), will AgentA automatically attempt to refresh its lock by connecting to either ServerA or ServerB?
My understanding is that if all Servers are online and able to communicate, a comm failure between AgentA and ServerC won't start a new Raft election or a new term. What is less clear to me is whether newClient() is called again to reset client.config.Address; I don't see any wrapper around the client that will assign a new Address in the event of a partition (see https://github.com/hashicorp/consul/blob/master/command/lock.go#L109). Does that imply that a partition between AgentA and ServerC will result in a loss of the lock if the partition lasts longer than 15s? I'm not suggesting or requesting that the client make a thundering-herd attempt to talk to all servers, so much as looking for clarification in terms of what failure modes exist.
If consul lock is run against the local agent on AgentA (the usual scenario), then client.config.Address is 127.0.0.1:8500, which means that the local agent will handle reconnection using its internal logic - consul lock is contacting the agent over the HTTP API, which is then translated into Consul RPC and sent off to the servers.

I'm going to try this out and report back.
Okay, so this is very interesting and behaves super differently in a variety of scenarios:
consul lock test "sleep 60"
Now, this is where it varies (TL;DR scroll to the end to watch the lock be violated):
Scenario 1: Cleanly exiting the leader, with leave_on_terminate at its default value of true, which means it shuts itself down and sends a message to remove itself from Raft. This isn't the same as a partition, but it can easily happen.
I got the following result:
vagrant@agentA:~$ ./consul lock test "sleep 60"
Error running handler: signal: terminated
signal: terminated
Lock release failed: failed to release lock: Unexpected response code: 500 (rpc error: No cluster leader)
I immediately tried running the same command and got the following error until the election finished:
vagrant@agentA:~$ ./consul lock test "sleep 60"
Lock acquisition failed: failed to create session: Unexpected response code: 500 (rpc error: connection is shut down)
vagrant@agentA:~$
I can see not being able to create a lock during an election, so I'm actually fine with this, except that the error is a bit unclear.
Scenario 2:
I ran pkill -9 consul on the leader box after starting a new round of "sleep 60".
Results:
Serf immediately detected the failure in the server, but there were 0 errors from the agent and the 60 seconds finished cleanly. However, because of what you'll see in scenario 3, the agent was clearly not connected with the leader, but some other server. The output was as follows:
2015/06/25 03:02:36 [DEBUG] http: Request /v1/session/create (6.70589ms)
2015/06/25 03:02:36 [DEBUG] http: Request /v1/kv/test/.lock?wait=15000ms (1.578075ms)
2015/06/25 03:02:36 [DEBUG] http: Request /v1/kv/test/.lock?acquire=9ea91012-04a2-973e-2199-61530220adbd&flags=3304740253564472344 (3.492827ms)
2015/06/25 03:02:36 [DEBUG] http: Request /v1/kv/test/.lock?consistent= (2.779486ms)
2015/06/25 03:02:43 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (1.77661ms)
2015/06/25 03:02:45 [DEBUG] memberlist: TCP connection from: 192.168.33.12:54850
2015/06/25 03:02:51 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (1.587599ms)
2015/06/25 03:02:53 [INFO] memberlist: Suspect serverA has failed, no acks received
2015/06/25 03:02:56 [INFO] memberlist: Suspect serverA has failed, no acks received
2015/06/25 03:02:58 [INFO] memberlist: Suspect serverA has failed, no acks received
2015/06/25 03:02:58 [INFO] memberlist: Marking serverA as failed, suspect timeout reached
2015/06/25 03:02:58 [INFO] serf: EventMemberFailed: serverA 192.168.33.11
2015/06/25 03:02:58 [INFO] consul: removing server serverA (Addr: 192.168.33.11:8300) (DC: dc1)
2015/06/25 03:02:58 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (1.727909ms)
2015/06/25 03:02:59 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.33.12:8301
2015/06/25 03:03:01 [DEBUG] serf: forgoing reconnect for random throttling
2015/06/25 03:03:06 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (1.68642ms)
2015/06/25 03:03:13 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (1.436247ms)
2015/06/25 03:03:15 [DEBUG] memberlist: TCP connection from: 192.168.33.12:54876
2015/06/25 03:03:21 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (2.342892ms)
2015/06/25 03:03:28 [DEBUG] http: Request /v1/session/renew/9ea91012-04a2-973e-2199-61530220adbd (2.402064ms)
2015/06/25 03:03:29 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.33.13:8301
2015/06/25 03:03:31 [DEBUG] serf: forgoing reconnect for random throttling
2015/06/25 03:03:36 [DEBUG] http: Request /v1/kv/test/.lock?flags=3304740253564472344&release=9ea91012-04a2-973e-2199-61530220adbd (4.957429ms)
2015/06/25 03:03:36 [DEBUG] http: Request /v1/kv/test/.lock?consistent=&index=104 (1m0.005539804s)
2015/06/25 03:03:36 [DEBUG] http: Request /v1/kv/test/.lock (2.310693ms)
2015/06/25 03:03:36 [DEBUG] http: Request /v1/session/destroy/9ea91012-04a2-973e-2199-61530220adbd (6.032076ms)
2015/06/25 03:03:36 [DEBUG] http: Request /v1/kv/test/.lock?cas=108 (4.490823ms)
Scenario 3 (The one you're interested in):
I started a new consul lock sleep test and ran sudo iptables -I INPUT -s 192.168.33.10 -j DROP on boxes until I figured out which one the agent was talking to.
This is some strange behavior that I think @armon or someone else that wrote this needs to comment on.
Here's what happened, with consul communicating with serverB:
2015/06/25 03:15:05 [DEBUG] http: Request /v1/session/create (5.305826ms)
2015/06/25 03:15:05 [DEBUG] http: Request /v1/kv/test/.lock?wait=15000ms (1.212142ms)
2015/06/25 03:15:05 [DEBUG] http: Request /v1/kv/test/.lock?acquire=46e20bb7-f59d-8e4a-8484-a12d9b76d843&flags=3304740253564472344 (3.735569ms)
2015/06/25 03:15:05 [DEBUG] http: Request /v1/kv/test/.lock?consistent= (1.380977ms)
2015/06/25 03:15:09 [ERR] memberlist: Push/Pull with serverC failed: dial tcp 192.168.33.13:8301: i/o timeout
2015/06/25 03:15:13 [DEBUG] http: Request /v1/session/renew/46e20bb7-f59d-8e4a-8484-a12d9b76d843 (1.408075ms)
2015/06/25 03:15:18 [INFO] memberlist: Suspect serverB has failed, no acks received
2015/06/25 03:15:20 [DEBUG] http: Request /v1/session/renew/46e20bb7-f59d-8e4a-8484-a12d9b76d843 (966.864µs)
2015/06/25 03:15:23 [INFO] memberlist: Marking serverB as failed, suspect timeout reached
2015/06/25 03:15:23 [INFO] serf: EventMemberFailed: serverB 192.168.33.12
2015/06/25 03:15:23 [INFO] consul: removing server serverB (Addr: 192.168.33.12:8300) (DC: dc1)
2015/06/25 03:15:35 [INFO] memberlist: Suspect serverA has failed, no acks received
2015/06/25 03:15:38 [WARN] memberlist: Refuting a suspect message (from: serverA)
2015/06/25 03:15:39 [DEBUG] memberlist: Initiating push/pull sync with: 192.168.33.13:8301
2015/06/25 03:15:41 [INFO] serf: EventMemberJoin: serverB 192.168.33.12
2015/06/25 03:15:41 [INFO] consul: adding server serverB (Addr: 192.168.33.12:8300) (DC: dc1)
2015/06/25 03:15:55 [INFO] memberlist: Suspect serverB has failed, no acks received
2015/06/25 03:16:00 [INFO] memberlist: Marking serverB as failed, suspect timeout reached
2015/06/25 03:16:00 [INFO] serf: EventMemberFailed: serverB 192.168.33.12
2015/06/25 03:16:00 [INFO] consul: removing server serverB (Addr: 192.168.33.12:8300) (DC: dc1)
There are 2 problems here: the lock was violated, and the consul lock process never exits on its own - even Ctrl-C is ignored, so I had to resort to kill -9. I'm suspecting a deadlock waiting on a channel somewhere.

vagrant@agentA:~$ ./consul lock test "sleep 60"
Error running handler: signal: terminated
signal: terminated
^C^C^C^C
I also replicated this with a single server and a single agent - same problem, though this time it seemed to detect the failure a bit better due to the complete lack of Consul servers. It still left the child process running, though!
@highlyunavailable Thanks for finding this! My guess is there is a channel blocking somewhere in the teardown path as well. I've tagged this as a bug, as it should just kill the child process in scenario 3.
@sean- If the client is partitioned off from one of the servers, which happens to be the leader, then things should still work. Clients pick a random server to talk to for RPCs for load balancing, and the servers do internal request forwarding if they are not the leader. Given some time, the client should detect that server as partitioned via Serf and remove it from the list of eligible servers. So when the client makes an RPC call, it will be to one of the non-partitioned servers, and that server should be able to forward to the leader. At least, that's how it should work :)
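That server-selection behaviour can be sketched like this (hypothetical types, not Consul's actual client manager code): pick a random server per RPC, and drop a server from the pool when Serf reports it failed:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

// serverPool loosely mimics how a Consul client tracks eligible
// servers: random selection per RPC for load balancing, and removal
// when Serf reports a member as failed. Hypothetical illustration.
type serverPool struct {
	mu      sync.Mutex
	servers []string
}

// pick returns a random eligible server for the next RPC.
func (p *serverPool) pick() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.servers) == 0 {
		return "", errors.New("no servers available")
	}
	return p.servers[rand.Intn(len(p.servers))], nil
}

// onMemberFailed is what a Serf EventMemberFailed handler would call
// to remove a partitioned or dead server from the pool.
func (p *serverPool) onMemberFailed(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i, s := range p.servers {
		if s == name {
			p.servers = append(p.servers[:i], p.servers[i+1:]...)
			return
		}
	}
}

func main() {
	pool := &serverPool{servers: []string{"serverA", "serverB", "serverC"}}
	pool.onMemberFailed("serverC") // Serf marks the partitioned leader as failed
	s, _ := pool.pick()            // subsequent RPCs only go to A or B
	fmt.Println(s)
}
```

Whichever non-partitioned server is picked then forwards the request to the leader internally, which is why the client never needs leader-awareness of its own.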
@armon So is the behaviour that @highlyunavailable described in scenario 1 the intended one? Meaning loss of the leader and election of a new one implies loss of all locks?
This is related to https://github.com/hashicorp/consul/issues/1843 where we are talking about using the ability to update the session as a possible signal to give up the lock.
From the docs (https://consul.io/docs/commands/lock.html):

What exactly does this mean? Does it mean the TCP connection between the agent holding the lock and the server with leader status is interrupted? How does "communication is disrupted" interact with Serf for liveness?