jbagg / QtZeroConf

Qt wrapper class for ZeroConf libraries across various platforms.

Power-Cycling Bonjour Devices Can Cause Permanent Loss #27

Open wumpus7 opened 5 years ago

wumpus7 commented 5 years ago

Repro (on Mackie DL hardware):

1) Power up a DL32S mixer and set it to use its internal router to generate a WiFi network.
2) Connect an instance of Master Fader (5) to the mixer via the WiFi network. Forget all other networks.
3) Power cycle the mixer. (Note that the mixer doesn't have the ability to send a power-down 'lost' message.)

The usual result is that once the WiFi network comes back up, the mixer will reconnect to it automatically. (You may need to reselect the WiFi network, as the wireless auto connect doesn't always seem to work. In this case, once the WiFi is manually reconnected, the mixer will then automatically reconnect.)

Sometimes, however, while the WiFi network is reconnected (automatically or manually), the mixer doesn't reconnect. There are at least two different issues involved:

A) Sometimes this appears to be the fault of the OS - either iOS or macOS. In this case, the mixer cannot be discovered using Flame or Discovery (iOS) or dns-sd (macOS). If the OS can't find the mixer, then Q(t)ZeroConf clearly can't be expected to, but

B) Sometimes the mixer can be discovered, but it is still not reported out of the Q(t)ZeroConf browser. In this case it seems likely that the issue is with Q(t)ZeroConf, as we've only seen this issue with iOS and macOS, not with Android or Windows, and our code is very generic in its handling of different OSs, whereas Q(t)ZeroConf has completely different sources for iOS vs. Android (?).

In both cases, though, the fix is to turn off the WiFi on the desktop or iPad and then turn it back on; this almost always restores the connection. (Note also that this result is per Mac/iPad; one can be stuck while others have successfully reconnected, so it's clearly not the mixer. We did have trouble with the mixer advertising itself at one point, but we haven't seen that in a while.)

Anyway, does this ring any bells? Is the lack of a 'lost' message on device power cycle a problem? Or maybe something with the timing of the device coming up and restarting its WiFi and its Bonjour services?

jbagg commented 5 years ago

On 2/18/19 7:56 PM, wumpus7 wrote:

> B) Sometimes the mixer can be discovered, but it is still not reported out of the Q(t)ZeroConf browser. In this case it seems likely that the issue is with Q(t)ZeroConf, as we've only seen this issue with iOS and macOS, not with Android or Windows, and our code is very generic in its handling of different OSs, whereas Q(t)ZeroConf has completely different sources for iOS vs. Android (?).

QtZeroConf uses Apple's dns-sd library on iOS, macOS and Windows. It uses the Avahi client library on Linux and the Avahi core library on Android.

When you say the mixer cannot be discovered on iOS or Mac, do you see the mixer on Linux in a program called avahi-discover (it may also be called Avahi Browser)?

wumpus7 commented 5 years ago

I'm not sure exactly what you're asking. The issue is a per client problem, inasmuch as the mixer can be discoverable on several iPads at the same time that it is not on another. We haven't had the problem at all with Android devices (I'm not sure we've done any testing on Windows).

If the Avahi Browser program can be run on a Mac, then it would be interesting to see if the alternate Bonjour toolkit could succeed when dns-sd was failing. Is that what you're interested in?

jbagg commented 5 years ago
  1. On iOS, when you call startBrowser("some_type._tcp"), are you specifying the protocol (IPv4, IPv6 or any)? If you are not, please call startBrowser() with the protocol specified, like this: startBrowser("some_type._tcp", QAbstractSocket::IPv4Protocol);

  2. Are you connecting to QZeroConf's signal serviceUpdated()? If the IP address of the device changes (e.g. switching from wired to wireless), serviceUpdated() will fire and the QZeroConfService will have the new IP address. Note that if the IP address does not change when the mixer reboots, there may be no notification from zeroconf / mDNS (this is normal). This will happen if the device (mixer) comes back online before the approx. 2 zeroconf / mDNS timeout.

  3. When the mixer cannot be found on the iOS device, has the IP address changed? What do the working iOS devices or other browsers say the IP address of the mixer is?

  4. Are you stopping the QZeroConf browser when the iOS device is put to sleep and re-starting the browser when it wakes back up again? See "iOS device sleep" in the readme and example in source code.
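To illustrate point 2, here is a minimal sketch of why a quick reboot can be invisible to a browser. It is a standard-library model, not QtZeroConf or dns-sd code: `RecordCache`, `Notification`, `announce()` and `expire()` are hypothetical names, and the TTL is whatever the caller passes in. A record that is re-announced with the same address before it expires produces no event, a changed address produces an update, and only an expired record produces a removal.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a browser-side record cache (not library code).
enum class Event { Added, Updated, Removed };

struct Notification {
    Event event;
    std::string name;
};

class RecordCache {
public:
    // Process an announcement heard at time `now` (seconds) with the given TTL.
    // Returns the notifications a browser built on this cache would emit.
    std::vector<Notification> announce(const std::string& name,
                                       const std::string& address,
                                       long now, long ttl) {
        assert(ttl > 0);
        std::vector<Notification> out = expire(now);
        auto it = records_.find(name);
        if (it == records_.end()) {
            out.push_back({Event::Added, name});
        } else if (it->second.address != address) {
            out.push_back({Event::Updated, name});
        }
        // Same name and same address: the expiry is refreshed silently.
        records_[name] = Record{address, now + ttl};
        return out;
    }

    // Drop records whose TTL has run out and report them as removed.
    std::vector<Notification> expire(long now) {
        std::vector<Notification> out;
        for (auto it = records_.begin(); it != records_.end();) {
            if (it->second.expiresAt <= now) {
                out.push_back({Event::Removed, it->first});
                it = records_.erase(it);
            } else {
                ++it;
            }
        }
        return out;
    }

private:
    struct Record {
        std::string address;
        long expiresAt;
    };
    std::map<std::string, Record> records_;
};
```

In this model, a mixer that power cycles and re-announces the same address before its record expires never generates a removal or an update, so application code that waits for one of those events to trigger a reconnect would wait indefinitely.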

wumpus7 commented 5 years ago

1) No, I'm not specifying a protocol currently; I'll update my calls.
2) Yes, I have hooked serviceUpdated, and have verified that it actually works when mixer IP addresses change. (Wired uses MFi/EAAccessory, not TCP/IP, but it is possible for the wireless address to change.) The usual case is for the IP address not to change on power cycle, so that's good information to have.
3) I'm working on getting the bug to reoccur, but the case where it isn't the OS causing the problem is extremely rare, so I don't have a lot of data about what's going on.
4) Yes, I stop the browser (suppressing the automatically generated loss messages for all discovered devices) at sleep and start it again after wake. In this case the app is not changing state, just the mixer, though if you have to manually toggle the WiFi network, that will cause a sleep/wake.

jbagg commented 5 years ago
  1. I was wrong with this point. When no protocol is specified, it defaults to IPv4.

I did a bunch of testing over the last couple of days. The serviceUpdated() signal does not work at all on Mac and Windows (and probably iOS as well). It looks like I need to keep the DNSServiceRef for the resolver around until the service is removed in order to get (address) updates. Each resolver also needs a socket and an address socket as well. It kinda sucks having all these objects around for each service. Avahi is cleaner / simpler in this regard. Hopefully your problem is related to this. I should have some new code in about a week.
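The lifetime change described above can be sketched roughly as follows. This is a standard-library illustration only, not the code in bonjour.cpp: `Resolver` and `ServiceTable` are hypothetical names, and the `Resolver` struct merely stands in for the DNSServiceRef and its sockets. The point is that the per-service resolver is created when the service is found and released only when the service is removed, so later address results still have somewhere to land and can be turned into an update.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a long-lived resolver. A real backend would own
// its native handle and sockets here and release them in the destructor.
struct Resolver {
    std::string lastAddress;  // empty until the first address arrives
};

class ServiceTable {
public:
    using UpdateCallback =
        std::function<void(const std::string& name, const std::string& address)>;

    explicit ServiceTable(UpdateCallback onUpdate) : onUpdate_(std::move(onUpdate)) {
        assert(onUpdate_);
    }

    // Create the resolver once and keep it for the lifetime of the service.
    void serviceFound(const std::string& name) {
        if (resolvers_.count(name) == 0) {
            resolvers_[name] = std::make_unique<Resolver>();
        }
    }

    // An address result arrived. The first one is the initial resolve; any
    // later, different address is reported through the update callback.
    void addressResolved(const std::string& name, const std::string& address) {
        auto it = resolvers_.find(name);
        if (it == resolvers_.end()) return;  // resolver already released
        Resolver& r = *it->second;
        const bool changed = !r.lastAddress.empty() && r.lastAddress != address;
        r.lastAddress = address;
        if (changed) onUpdate_(name, address);
    }

    // Only removal of the service releases the resolver.
    void serviceRemoved(const std::string& name) { resolvers_.erase(name); }

    std::size_t liveResolvers() const { return resolvers_.size(); }

private:
    UpdateCallback onUpdate_;
    std::unordered_map<std::string, std::unique_ptr<Resolver>> resolvers_;
};
```

The cost noted in the comment is visible here too: the table holds one live object per discovered service for as long as that service exists, rather than tearing the resolver down after the first result.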

jbagg commented 5 years ago

Please replace bonjour.cpp and bonjour_p.h with the attached and see if it fixes your problem. serviceUpdated() is now working with the attached. Please note this is not finished code. bonjour_update_fix.zip

wumpus7 commented 5 years ago

I'm pretty sure that I have seen the serviceUpdated() signal fire in multiple circumstances on the Mac/iOS. I originally had it hooked to something that threw an assert, and it was going off when a mixer's IP address changed (which I believe we encountered in testing our mixer with a configurable internal router). (Early on, I think I'd assumed it was for IPv6 information, though I don't know how I arrived at that conclusion.) Anyway, I'll give the updated code a try when I get a chance, but, as I mentioned before, it's very rare to reproduce this issue, so it's also very difficult to verify fixes.

jbagg commented 4 years ago

I was finally able to restore my Mac. It turns out you can no longer download an OS X installer unless you own another Mac. I was able to reproduce the issue right away. I pushed a fix in commit d346cd7. Please verify that the issue is fixed with the latest commit.