multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

selecting wrong socket to send ack for remove address #457

Closed VenkateswaranJ closed 2 years ago

VenkateswaranJ commented 2 years ago


I'm using the Netlink path manager and have wired the RTM_DELLINK event to the MPTCP_CMD_REMOVE command. The problem is that when I bring down interface eth1 on the client side, the remove-address command is triggered, but the kernel chooses the socket of subflow two (10.100.130.2 <-> 10.100.130.3) to send the remove-address DUP ACK instead of the subflow-one socket. Since subflow two no longer has a connection to the server (eth1 is down), the remove-address message is never delivered.

https://github.com/multipath-tcp/mptcp/blob/4857473fd542ecd05c63af001c63f87571196c25/net/mptcp/mptcp_ctrl.c#L577

I can't clearly understand the socket-selection mechanism implemented in this function. The problem only occurs in a virtual environment, e.g. between two virtual machines or network namespaces; when I try the same setup between two physical machines, the correct active socket is chosen to send the remove-address. Why isn't the correct socket chosen in virtual environments?

Here is my mptcp configuration on both sides

kernel.osrelease = 4.19.126.mptcp
net.mptcp.mptcp_checksum = 1
net.mptcp.mptcp_debug = 1
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = netlink
net.mptcp.mptcp_scheduler = redundant
net.mptcp.mptcp_syn_retries = 3
net.mptcp.mptcp_version = 0
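For reference, the tunable values above can be applied with `sysctl` (a sketch, assuming the out-of-tree MPTCP kernel is booted; `kernel.osrelease` is read-only and is listed only to identify the kernel build):

```shell
# Apply the MPTCP settings used on both hosts (requires root and the
# out-of-tree MPTCP kernel, here 4.19.126.mptcp).
sysctl -w net.mptcp.mptcp_enabled=1
sysctl -w net.mptcp.mptcp_checksum=1
sysctl -w net.mptcp.mptcp_debug=1
sysctl -w net.mptcp.mptcp_path_manager=netlink
sysctl -w net.mptcp.mptcp_scheduler=redundant
sysctl -w net.mptcp.mptcp_syn_retries=3
```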
matttbe commented 2 years ago

Hi,

Sorry, I was busy with many other things recently.

It looks like a bug. Do you also have this issue with the "default" scheduler? And with the fullmesh PM?

matttbe commented 2 years ago

I was thinking a bit about this, and technically it looks more logical to first remove the subflow, then send the RM_ADDR. Is that what you are doing?

Because the userspace is just saying 'send a RM_ADDR with ID x'. The kernel will do that and select an ACK like it would to send "the next ACK". I don't think it is the kernel's job to iterate over all subflows using ID x (maybe the userspace didn't manage the IDs well) and mark all linked subflows as pf (potentially failed).
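This is not the kernel's actual code, but the failure mode can be illustrated with a toy model: if the RM_ADDR option simply rides on whichever subflow would carry "the next ACK", and the dead subflow is still in the list, the option can be lost; destroying the subflow first leaves only live carriers. (All names below are hypothetical.)

```python
# Toy model (NOT the real kernel code) of why sending RM_ADDR before
# destroying the dead subflow can lose the option: the sender just picks
# the subflow that would carry "the next ACK", dead or not.

class Subflow:
    def __init__(self, name, addr_id, alive):
        self.name, self.addr_id, self.alive = name, addr_id, alive

def next_ack_subflow(subflows):
    # Stand-in for the kernel's selection: it does not check whether the
    # path still has connectivity; it just takes the first one it finds.
    return subflows[0]

def remove_addr_first(subflows, addr_id):
    """Userspace sends RM_ADDR, then destroys the subflows for addr_id."""
    carrier = next_ack_subflow(subflows)           # may be the dead one
    delivered = carrier.alive
    subflows[:] = [s for s in subflows if s.addr_id != addr_id]
    return delivered

def destroy_subflow_first(subflows, addr_id):
    """Userspace destroys the subflows for addr_id, then sends RM_ADDR."""
    subflows[:] = [s for s in subflows if s.addr_id != addr_id]
    carrier = next_ack_subflow(subflows)           # only live paths remain
    return carrier.alive

def make_flows():
    # Subflow two (eth1, 10.100.130.2 <-> 10.100.130.3) just went down.
    return [Subflow("sf2-eth1", addr_id=2, alive=False),
            Subflow("sf1-eth0", addr_id=1, alive=True)]

print(remove_addr_first(make_flows(), addr_id=2))     # False: RM_ADDR lost
print(destroy_subflow_first(make_flows(), addr_id=2))  # True: delivered
```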

VenkateswaranJ commented 2 years ago

Do you also have this issue with the "default" scheduler?

I haven't tried that.

And with the fullmesh PM?

Yes, I have tried the same virtual setup with the fullmesh PM. When I pull down one interface, for some reason mptcp_address_worker in the fullmesh PM emits two address-remove signals, so the remove-address command is called twice. The first time it chooses the wrong socket to send the remove ID to the remote host, but the second time it chooses the right active socket.

it looks more logical to first remove the subflow, then send the RM_ADDR. Is that what you are doing?

No, maybe I should try that first. I'm doing it the opposite way: first I send the remove ID, then I manually destroy the subflows associated with that ID. I do it this way because MPTCP itself sends a TCP RST on all the associated subflows when I send the remove command.

But your point is correct: I should delete the subflows first, before sending the remove ID. In that case it can't choose a dead socket, can it? Because only the active sockets exist at that point.

(maybe the userspace didn't manage the IDs well) and mark all linked subflows as pf (potentially failed).

Hmmm. Sorry, I don't get this point 😟

matttbe commented 2 years ago

And with the fullmesh PM?

Yes, I have tried the same virtual setup with the fullmesh PM. When I pull down one interface, for some reason mptcp_address_worker in the fullmesh PM emits two address-remove signals, so the remove-address command is called twice. The first time it chooses the wrong socket to send the remove ID to the remote host, but the second time it chooses the right active socket.

Interesting. That doesn't look ideal, but the userspace PM could also do the same: send a RM_ADDR, remove the subflow, then resend a RM_ADDR. But it might make more sense to skip the first step: remove the subflow, then send a RM_ADDR.

it looks more logical to first remove the subflow, then send the RM_ADDR. Is that what you are doing?

No, maybe I should try that first. I'm doing it the opposite way: first I send the remove ID, then I manually destroy the subflows associated with that ID. I do it this way because MPTCP itself sends a TCP RST on all the associated subflows when I send the remove command.

But your point is correct: I should delete the subflows first, before sending the remove ID. In that case it can't choose a dead socket, can it? Because only the active sockets exist at that point.

That's correct. If it does, there is a bug :) (or a race condition, but that should be addressable too, I guess)

(maybe the userspace didn't manage the IDs well) and mark all linked subflows as pf (potentially failed).

Hmmm. Sorry, I don't get this point 😟

Sorry, my point is that if a userspace daemon is in charge of the PM, the kernel should not try to do more than what the userspace PM asks. In this case, I don't think the kernel should try to select a path based on the ID provided by the userspace. In other words, it is best to first ask to destroy the subflow, then to send the RM_ADDR.

VenkateswaranJ commented 2 years ago

@matttbe Perfect! It works 😃 I removed all the subflows associated with the removed IP address and then sent the remove-ID command. Now it always chooses an active socket.
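For the record, the ordering that ended up working can be written as a short sketch. `send_genl_cmd()` is a hypothetical helper standing in for a real generic-netlink request; only the `MPTCP_CMD_*` command names come from the out-of-tree netlink PM API, everything else is illustrative:

```python
# Sketch of the working order for a userspace Netlink path manager.
# send_genl_cmd() is a hypothetical stand-in for a real generic-netlink
# send; the MPTCP_CMD_* names are from the out-of-tree netlink PM API.

def handle_link_down(conn_token, removed_addr_id, subflows, send_genl_cmd):
    # 1. Destroy every subflow that uses the address that disappeared,
    #    so only live sockets are left to carry the next ACK.
    for sf in [s for s in subflows if s.addr_id == removed_addr_id]:
        send_genl_cmd("MPTCP_CMD_SUB_DESTROY", token=conn_token,
                      loc_id=sf.addr_id)
        subflows.remove(sf)
    # 2. Only then tell the peer the address is gone; the RM_ADDR option
    #    rides on an ACK of a surviving (active) subflow.
    send_genl_cmd("MPTCP_CMD_REMOVE", token=conn_token,
                  loc_id=removed_addr_id)
```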

Interesting. That doesn't look ideal, but the userspace PM could also do the same: send a RM_ADDR, remove the subflow, then resend a RM_ADDR.

Sorry! I misstated that in my last comment. Actually, mptcp_address_worker gets a single remove event from netdev, but the fullmesh path manager loops through the sockets and calls the remove-ID command for each, as you can see below. That's what I meant by it being called twice.

https://github.com/multipath-tcp/mptcp/blob/4857473fd542ecd05c63af001c63f87571196c25/net/mptcp/mptcp_fullmesh.c#L922-L955

Also, as you said, it sets pf = 1 on every iteration. But a userspace path manager doesn't have that level of access to MPTCP resources.

Anyway thanks once again. You can close this issue if you want.

matttbe commented 2 years ago

Great!

I think it is enough to close the subflow (data will be reinjected directly, if I'm not mistaken) and announce the removed address just after.

If you see weird behaviour, we might improve that, but ideally, sending the remove address should be limited to just sending the remove address. I don't think the kernel should interpret it as "I also need to close subflows, mark them as pf, etc."

Thank you for having tested!