IPv6 clients with privacy addrs cannot talk after a while

bmwiedemann commented 10 years ago

After switching to a new Mumble server that has a DNS entry with IPv6 (and IPv4), we encountered an interesting new bug. After some hours, Mumble clients on openSUSE Linux with IPv6 and privacy addresses enabled would stay connected and could still listen to channels, but when they tried to talk, others could not hear them, so they had to disconnect and reconnect manually.

This is different from issue 1289, where both directions were broken.

My guess is that IPv6 privacy addresses are temporary and expire after some hours, as "ip a" also shows:

    inet6 2620:113:80c0:8080:a56a:47fb:f623:6949/64 scope global temporary dynamic 
       valid_lft 3238sec preferred_lft 1438sec

It could also be an issue with the Linux kernel, which should not drop addresses while they are still in use (though I am not sure whether connectionless UDP counts as "in use").

hacst commented 10 years ago

Interesting. That might actually make sense. IIRC the kernel keeps listening on discarded privacy addresses for a while. And as you mentioned, UDP is connectionless, so we might be sending from an address the Mumble server doesn't recognize and discards. This should still be picked up by our UDP channel ping and trigger a fallback to TCP, which ought to work just fine as it isn't affected by the privacy address shuffling. Quite likely some bad interaction.

In any case, this information should be enough to reproduce the issue. We will have to read up on this privacy feature and take a good look at the UDP channel code so we can figure out how they interact to cause this behavior. Thanks for taking the time to track this down.

bmwiedemann commented 10 years ago

Note: one workaround we found is

    echo 0 > /proc/sys/net/ipv6/conf/$INTERFACE/use_tempaddr

For some reason, using "all" instead of the interface name did not get rid of temporary address usage. The problematic default value there was 2.
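To check which value each interface currently has, a small helper can read the per-interface use_tempaddr files. This is a sketch, not part of Mumble; the helper names are hypothetical, and the value meanings follow the Linux ip-sysctl documentation (0 or less disables privacy extensions, 1 enables them but prefers public addresses, 2 or more enables them and prefers temporary addresses).

```python
from pathlib import Path


def describe_use_tempaddr(value: int) -> str:
    """Translate a use_tempaddr value into its meaning (per Linux ip-sysctl docs)."""
    if value <= 0:
        return "disabled"
    if value == 1:
        return "enabled, prefer public"
    return "enabled, prefer temporary"


def read_use_tempaddr(root: Path = Path("/proc/sys/net/ipv6/conf")) -> dict:
    """Map each entry under `root` (all, default, each interface) to its meaning."""
    result = {}
    for entry in sorted(root.iterdir()):
        setting = entry / "use_tempaddr"
        if setting.is_file():  # skip anything without the setting
            result[entry.name] = describe_use_tempaddr(int(setting.read_text().strip()))
    return result
```

Calling read_use_tempaddr() on a Linux host with IPv6 enabled makes it easy to spot an interface that is still set to 2 even after "all" has been changed.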

aspiers commented 8 years ago

Any update on this? It's still affecting many of us.

aspiers commented 8 years ago

Also, please could the audio label be applied to this issue? Thanks!

mkrautz commented 8 years ago

No update. Unfortunately, I don't think anyone has looked into this yet.

I've assigned a 1.3.0 milestone and hope to look into it soon.

Would an option to use @bmwiedemann's patches https://github.com/bmwiedemann/mumble/commit/65bdc3e212553ca93508a0b35ec810da28e35e33 https://github.com/bmwiedemann/mumble/commit/742eba9e68d29adc74cd360c2e2f933305a9223e be helpful?

mkrautz commented 8 years ago

And, @bmwiedemann, do you think so?

aspiers commented 8 years ago

@bmwiedemann is currently on vacation, but I guess he'll reply when he's back. Thanks a lot for the reply!

bmwiedemann commented 8 years ago

In a quick test, those patches did not seem to fix it (packets were still coming from the temporary address instead of the public IP); otherwise I would have submitted a PR. At least for openSUSE Tumbleweed, a different fix is coming that changes the default value of use_tempaddr from 2 to 1, which should also help with this issue (and with NFS connectivity problems).

mkrautz commented 8 years ago

Sorry for taking so long to respond to this. I've given this some thought...

In a scenario where privacy addresses are used, I'd expect the following to happen at address expiry:

The TCP connection is kept alive: there's an active connection, so the kernel can't just tear it down. (Or even if it did, the Mumble client would detect a disconnect and reconnect... That'd "fix" the problem as well.)
Since the address has expired, Mumble will no longer send new UDP datagrams from that address. Mumble's UDP socket uses the Any address (https://github.com/mumble-voip/mumble/blob/master/src/mumble/ServerHandler.cpp#L589-L593), so the kernel picks the source address of each outgoing datagram.
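The effect of binding to the Any address can be observed directly. The sketch below (a hypothetical helper, not Mumble code) binds a UDP socket to the wildcard address and calls connect(); on a UDP socket this sends nothing, but it makes the kernel run its source-address selection, and getsockname() then reports the address it picked.

```python
import socket


def kernel_chosen_source(dest_host: str, dest_port: int = 9) -> str:
    """Return the source IP the kernel picks for UDP from a wildcard-bound socket."""
    family = socket.AF_INET6 if ":" in dest_host else socket.AF_INET
    wildcard = "::" if family == socket.AF_INET6 else "0.0.0.0"
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.bind((wildcard, 0))             # "Any" address, ephemeral port
        sock.connect((dest_host, dest_port))  # no packet is sent for UDP
        return sock.getsockname()[0]          # source chosen by the kernel
```

On a host with privacy extensions, calling this with the server's IPv6 address before and after a temporary address is replaced would be expected to return different source addresses, which is the switch described in this thread.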

One plausible cause is that Murmur expects a one-to-one mapping between the TCP and UDP connection addresses. If a UDP packet comes in from an unknown host:port combo, Murmur will try to look up a sender by using a map of TCP address <-> User. (See https://github.com/mumble-voip/mumble/blob/master/src/murmur/Server.cpp#L822). Most importantly, this means that if the Mumble client keeps using the temporary address for the TCP connection, but switches to another address for UDP, Murmur will never be able to map the incoming UDP packets back to the correct user -- since it only looks up users by their TCP address.
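A deliberately simplified, hypothetical model of that lookup shows why a new source address is fatal: only users whose TCP connection comes from the packet's host are ever considered. The decrypts_for callback stands in for the check that a packet decrypts with a candidate user's key; the class and its names are illustrative, not Murmur's.

```python
from dataclasses import dataclass, field


@dataclass
class LookupModel:
    """Toy model of mapping incoming UDP packets to users (not Murmur's code)."""
    udp_peers: dict = field(default_factory=dict)  # (host, port) -> user
    tcp_hosts: dict = field(default_factory=dict)  # TCP host address -> set of users

    def add_user(self, user: str, tcp_host: str) -> None:
        self.tcp_hosts.setdefault(tcp_host, set()).add(user)

    def sender_of(self, host: str, port: int, decrypts_for):
        # Fast path: this host:port was already tied to a user.
        if (host, port) in self.udp_peers:
            return self.udp_peers[(host, port)]
        # Slow path: only users whose TCP connection uses the same host.
        for user in sorted(self.tcp_hosts.get(host, ())):
            if decrypts_for(user):
                self.udp_peers[(host, port)] = user
                return user
        return None  # no candidate: the packet is dropped
```

A user connected over TCP from 2001:db8::a is found for UDP from that host, but the same user's UDP from a newer temporary address finds no candidates at all, even though the packet would decrypt fine.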

As I see it, there are two possible fixes.

It's possible that the Mumble client should always try to send UDP datagrams from the same source address that it is using for TCP. It might be possible to transmit UDP packets from an expired privacy address, as long as the TCP connection is alive on the same address.

Or, Murmur should be taught not to rely only on the TCP addr <-> User mapping (from https://github.com/mumble-voip/mumble/blob/master/src/murmur/Server.cpp#L822) when mapping incoming UDP packets to a user. The problem with extending the check is that it can be expensive on larger servers with many connected users.
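To make the cost concrete: without the address mapping, the server would have to try each connected user's key until one fits. The sketch below uses an HMAC tag from Python's standard library as a stand-in for Mumble's real packet encryption (an assumption made only to keep the example self-contained) and counts the attempts.

```python
import hashlib
import hmac

TAG_LEN = 16  # bytes of truncated HMAC appended to each packet (stand-in)


def make_packet(key: bytes, payload: bytes) -> bytes:
    """Append a truncated HMAC tag; a stand-in for real packet encryption."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]


def find_sender_by_trial(packet: bytes, keys: dict):
    """Try every user's key until one verifies; return (user, attempts).

    The number of attempts grows linearly with the number of connected users.
    """
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    attempts = 0
    for user, key in keys.items():
        attempts += 1
        expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
        if hmac.compare_digest(tag, expected):
            return user, attempts
    return None, attempts
```

With 1000 connected users, a packet from the last user checked (or a packet nobody can claim) costs 1000 verification attempts, for every such packet.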

(One possible optimization of that would be to allow the Mumble client to -- as a hint -- specify a list of IP:port combos to expect UDP packets from, and allow the client to update the list dynamically via the TCP connection. But that requires some thought, since it would allow clients to cause Murmur to do extra work, potentially DoSing the server...).
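The hint idea could look roughly like the sketch below. Everything in it is hypothetical (the class, the cap of four addresses per user); the cap is one way to bound how much extra lookup work a single client can cause, which is the DoS concern raised above.

```python
from collections import defaultdict

MAX_HINTS_PER_USER = 4  # hypothetical cap on client-supplied hints


class HintTable:
    """Hypothetical table of client-announced UDP (ip, port) hints."""

    def __init__(self, max_hints: int = MAX_HINTS_PER_USER):
        self.max_hints = max_hints
        self.by_user = {}                 # user -> list of (ip, port)
        self.by_addr = defaultdict(set)   # (ip, port) -> users

    def update(self, user, addrs) -> None:
        # Forget the user's previous hints, then keep at most max_hints new ones.
        for addr in self.by_user.get(user, ()):
            self.by_addr[addr].discard(user)
        kept = list(dict.fromkeys(addrs))[: self.max_hints]
        self.by_user[user] = kept
        for addr in kept:
            self.by_addr[addr].add(user)

    def candidates(self, addr) -> set:
        # Users whose keys are worth trying for a packet from addr.
        return set(self.by_addr.get(addr, ()))
```

The client would send update() calls over the TCP connection whenever its addresses change; the server only tries keys for the returned candidates instead of for every user.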

I haven't tested these hypotheses yet, but I hope to be able to soon.

mkrautz commented 8 years ago

OK, I've now had a chance to test this a bit more (albeit with very short temporary address lifetimes).

I set temp_prefered_lft=180, temp_valid_lft=360 and use_tempaddr=2 for my interface (under sysctl net.ipv6.conf.$INTERFACE...).
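These settings can be applied by writing the per-interface files under /proc/sys/net/ipv6/conf, which is what sysctl -w net.ipv6.conf.$INTERFACE.<name>=<value> does. A minimal sketch, assuming root privileges on a real system; the function name is hypothetical and the root parameter only exists so it can be pointed at a scratch directory. Note the kernel's own spelling temp_prefered_lft.

```python
from pathlib import Path

# Values from the experiment above (lifetimes in seconds).
TEST_SETTINGS = {
    "temp_prefered_lft": 180,  # kernel spelling, not a typo here
    "temp_valid_lft": 360,
    "use_tempaddr": 2,
}


def apply_settings(iface: str, settings=TEST_SETTINGS,
                   root: Path = Path("/proc/sys/net/ipv6/conf")) -> list:
    """Write each setting to <root>/<iface>/<name>; needs root on a live system."""
    written = []
    for name, value in settings.items():
        path = root / iface / name
        path.write_text(f"{value}\n")
        written.append(path)
    return written
```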

My observations:

I can connect to my IPv6 Murmur instance with the privacy address just fine.

I can transmit audio and everything seems to work.

When my first temporary address's preferred lifetime expires (after 3 minutes), a new temporary address is created. The original is marked "deprecated" in ip addr list.

At this point, the TCP connection is still alive. But my UDP packets are sent using the new temporary address. The server does not map them to my user, so other connected users can't hear me.

Once the temporary address fully expires after 6 minutes, the server eventually notices that there is no longer any connection and disconnects me. But the client never notices.

Another observation is that after my second temporary address, I am never assigned new ones. That's odd.

Anyway, per my observations:

As long as a privacy address is within its preferred lifetime, both TCP and UDP are transported via that address. Once a privacy address is no longer preferred, UDP packets are no longer sent with it as the source address. If another privacy address is available, that one is used, or perhaps the system even falls back to a non-temporary address, if available. The TCP connection stays alive until the privacy address is removed from the system (the "valid" lifetime).
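These observations can be summarised as a small model: an address is preferred, then deprecated, then removed; new UDP datagrams from an Any-bound socket use a preferred address, while an established TCP connection lasts until its address is removed. This is a hypothetical sketch of the observed behaviour, not kernel code; it ignores the possible fallback to a public address and the other source-selection rules, although skipping deprecated addresses is consistent with RFC 6724's "avoid deprecated addresses" rule.

```python
from dataclasses import dataclass


@dataclass
class TempAddr:
    address: str
    created: float        # seconds
    preferred_lft: float  # seconds after creation
    valid_lft: float      # seconds after creation

    def state(self, now: float) -> str:
        age = now - self.created
        if age < self.preferred_lft:
            return "preferred"
        if age < self.valid_lft:
            return "deprecated"
        return "removed"


def source_for_new_udp(addrs, now: float):
    """Newest preferred temporary address, or None (real kernels may fall back)."""
    preferred = [a for a in addrs if a.state(now) == "preferred"]
    return max(preferred, key=lambda a: a.created).address if preferred else None


def tcp_alive(tcp_addr: TempAddr, now: float) -> bool:
    """An established TCP connection survives until its address is removed."""
    return tcp_addr.state(now) != "removed"
```

With the 180/360 second lifetimes above, the model reproduces the observed sequence: at 200 seconds UDP already uses the second address while TCP still runs on the first, and at 400 seconds the TCP connection is gone.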

Possible solutions:

Teach Murmur not to expect a 1-1 mapping between TCP and UDP addresses. (From a code perspective, this is easily doable, but it has a price: many extra packet-decryption attempts when trying to determine which user sent a given packet.)
Make the Mumble client bind the UDP socket to the same address that the TCP connection is using. This would solve the problem of Mumble suddenly switching the UDP source address mid-connection because the privacy address's preferred lifetime expired. Also, since the TCP connection eventually dies, perhaps this is the best solution for now.

Furthermore, we probably need to check up on Mumble's ping/pong code to make sure the client can detect that the TCP connection is no longer working faster than it currently does.

mkrautz commented 8 years ago

OK, I've created a PR that makes Mumble always bind to the same IP address for UDP that it uses for TCP: https://github.com/mumble-voip/mumble/pull/2623

This should fix the problem where voice packets stop after a few hours. The problem was that when a new "preferred" temporary address was created, Mumble would use it for all outgoing UDP packets. As a result, the server couldn't figure out which user they belonged to, and just dropped them.
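The idea behind the PR can be sketched with plain sockets: read the local address of the connected TCP socket and bind the UDP socket to that same IP, with port 0 so the kernel picks a free UDP port. This is an illustration in Python, not the code from the PR (Mumble itself is C++/Qt), and the function name is hypothetical.

```python
import socket


def udp_socket_matching_tcp(tcp_sock: socket.socket) -> socket.socket:
    """Create a UDP socket bound to the same local IP as a connected TCP socket."""
    local = tcp_sock.getsockname()
    if tcp_sock.family == socket.AF_INET6:
        # (host, port, flowinfo, scope_id): keep the scope id for link-local addresses.
        bind_addr = (local[0], 0, 0, local[3])
    else:
        bind_addr = (local[0], 0)
    udp = socket.socket(tcp_sock.family, socket.SOCK_DGRAM)
    udp.bind(bind_addr)  # the kernel can no longer choose a different source IP
    return udp
```

Because the source address is now fixed explicitly, a newly preferred temporary address no longer changes where voice packets come from; they keep matching the TCP address the server knows.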

With the PR in place, as long as the privacy address still exists, Mumble will be able to use it for both TCP and UDP. Once the privacy address is deleted by the kernel, Mumble will have to reconnect. However, since privacy address lifetimes are ~1 day by default (in the distros I've tried, at least), this doesn't seem to be too severe a problem...

To better work around the problem where the underlying connection is suddenly gone, I've also created this PR: https://github.com/mumble-voip/mumble/pull/2622

It implements a more aggressive reconnect logic for when the TCP connection seems to have disappeared under us. With this PR, Mumble will now try to reconnect if it doesn't get a reply to two consecutive ping packets (~10 seconds).
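The reconnect rule can be sketched as a counter of unanswered pings. The class is hypothetical, not the PR's code, and it assumes a ping interval of about 5 seconds, which is what the ~10 second figure implies.

```python
class PingWatchdog:
    """Signal a reconnect after max_missed consecutive unanswered pings."""

    def __init__(self, max_missed: int = 2):
        self.max_missed = max_missed
        self.outstanding = 0  # pings sent without a reply

    def on_ping_due(self) -> bool:
        """Call at each ping interval; returns True if the client should reconnect."""
        if self.outstanding >= self.max_missed:
            return True
        self.outstanding += 1  # send the next ping
        return False

    def on_pong(self) -> None:
        """A reply arrived, so the connection is considered healthy again."""
        self.outstanding = 0
```

With a 5 second interval, the first two calls send pings at 0 s and 5 s, and the third call at 10 s reports that both went unanswered, matching the ~10 seconds mentioned above.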

That makes the behavior of the Mumble client much better in the case when a privacy address expires.

During this work, I also stumbled upon a Fedora Bugzilla entry about similar work for OpenSSH: https://bugzilla.redhat.com/show_bug.cgi?id=512032 ...which didn't ever seem to get implemented properly, or upstreamed...

Anyway, maybe it would be worth considering an option in Mumble to not prefer IPv6 privacy addresses? Or do you think my two PRs improve the situation enough not to warrant that?

EmperorArthur commented 8 years ago

> Anyway, maybe it would be worth considering whether an option in Mumble to not prefer IPv6 privacy addresses? Or, do you think my two PRs improve the situation enough to not warrant that?

I believe the trade-offs are worth it. We should keep using the privacy addresses. After all, this is a system-wide problem for any IPv6 application. If a user wants long-lasting connections, they can turn off privacy addresses for the entire system.

mkrautz commented 7 years ago

OK, with

https://github.com/mumble-voip/mumble/issues/2622 and https://github.com/mumble-voip/mumble/issues/2623

landed, I believe we've fixed this as well as we can?

I'll close this for now. If you feel this is in error, or there is something we can do better, please speak up!

Thanks!

mumble-voip / mumble #1377