oogje / homebridge-i6-bigAssFans

A Homebridge plugin which allows you to control a Big Ass Fan.
MIT License
18 stars 3 forks source link

mDNS hostnames stopped working #29

Open jmissig opened 12 months ago

jmissig commented 12 months ago

Describe The Bug:

I’ve been successfully using the Entryway-XX:XX:XX.local type format since, well, April 23 2022 presumably (#6).

Today some combination of updates seems to have made that stop working. I repeatedly get:

Logs:

[7/22/2023, 7:15:42 AM] [homebridge-i6-bigassfans] Cocktail Room Overhead Fan (CocktailRoom-XX:XX:XX.local) : Unhandled network error: ENOTFOUND.  Attempting reconnect in 512 seconds.

Plugin Config:

{
    "fans": [
        {
            "name": "Master Bedroom Overhead Fan",
            "ip": “MasterBedroom-XX:XX:XX.local",
            "mac": “XX:XX:XX:XX:XX:XX",
            "fanModel": "Haiku H/I Series",
            "showWhooshSwitch": false,
            "showFanAutoSwitch": false,
            "showLightAutoSwitch": false,
            "showDimToWarmSwitch": false,
            "showEcoModeSwitch": false,
            "disableDirectionControl": false
        },
        {
            "name": "Cocktail Room Overhead Fan",
            "ip": “CocktailRoom-XX:XX:XX.local",
            "mac": "XX:XX:XX:XX:XX:XX",
            "fanModel": "Haiku H/I Series",
            "showWhooshSwitch": true,
            "showFanAutoSwitch": false,
            "showLightAutoSwitch": false,
            "showDimToWarmSwitch": false,
            "showEcoModeSwitch": false,
            "disableDirectionControl": false
        },
        {
            "name": "Study Overhead Fan",
            "ip": “Study-XX:XX:XX.local",
            "mac": "XX:XX:XX:XX:XX:XX",
            "fanModel": "Haiku H/I Series",
            "showWhooshSwitch": true,
            "showFanAutoSwitch": false,
            "showLightAutoSwitch": false,
            "showDimToWarmSwitch": false,
            "showEcoModeSwitch": false,
            "disableDirectionControl": false
        }
    ],
    "_bridge": {
        "username": "XX:XX:XX:XX:XX:XX"
    },
    "platform": "BigAssFans-i6"
}

Environment:

Things I tried:

For now I’ve switched to resolved IP address, which works.

jmissig commented 12 months ago

Obviously this may not be the fault of the plugin since other intermediary things updated too. Just figured I’d ask in case y’all can help!

My homebridge has been running with the Avahi advertiser for some time now. I did previously use Ciao.

oogje commented 12 months ago

I don't have any good ideas. What happens if you revert to the version of the plugin you were running before you updated to v0.5.4? You're running more recent versions of homebridge and node than I am so I'll update them here and see what happens.

For what it's worth I'm running homebridge on macOS 10.14.6 with Bonjour HAP and use the .local in the config file (Big Fan is the fan name, BigFan.local is its mDNS(?) name.

oogje commented 12 months ago

I've installed homebridge on macOS Ventura with the same versions you showed. In one of the six restarts it threw the ENOTFOUND error but after 3 retries (the backoff timing is very primitive) it reconnected.

[7/22/2023, 2:34:33 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local [7/22/2023, 2:34:33 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 2 seconds. [7/22/2023, 2:34:40 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local [7/22/2023, 2:34:40 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 4 seconds. [7/22/2023, 2:34:49 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local [7/22/2023, 2:34:49 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 8 seconds. [7/22/2023, 2:34:59 PM] [BigAssFans-i6] Big Fan reconnected!

I don't know why it got ENOTFOUND, why it didn't get ENOTFOUND every time, nor why the 3rd retry was successful. I'll keep playing with it.

jmissig commented 12 months ago

Interesting! Mine seems to always ENOTFOUND. I will try older plugin version later today.

jmissig commented 12 months ago

Well, went to downgrade to v0.5.3 to try it out, but now the issue is not recurring, even with exactly the setup (and versions) cited above. Odd.

oogje commented 12 months ago

I did an experiment to see if it's a purely a node.js issue. On machine A I ran a listener and on machine B running node v18.17.0 connected to that listener with a node client script. When the script tried to connect to machine A using its mDNS name it failed with EHOSTUNREACH. When I used its IP address it succeeded.

If I flipped roles, running the listener on machine B and Machine A running node V16.16.0 connecting to that listener, it successfully connected using the mDNS name of machine B.

When I was looking into ENOTFOUND problems with node.js, I came across discussions touching on addressing IPV4 vs IPV6. And my experiment seems to indicate an IPV6 angle: connect EHOSTUNREACH fe80::45f:b6ce:bafd:254b:31415 - Local (fe80::1ce2:18d:7abc:75cb%en0:57451. Also the word "random" came up in discussion, which might explain the inconsistencies we're seeing. Clearly I'm not very knowledgable about these things, so there's that too.

I don't why the error was EHOSTUNREACH rather than ENOTFOUND.

My next experiment (for another day) will be to try to specify IPV4 in the node connect() call.

[edited to add] - downgrading node to v16.16.0 on machine B allows the client script to successfully connect to the mDNS name of machine .

oogje commented 12 months ago

Well, went to downgrade to v0.5.3 to try it out, but now the issue is not recurring, even with exactly the setup (and versions) cited above. Odd.

That's seems to be the nature of the problem.

jmissig commented 12 months ago

Ok long story short, I made a mistake and caused cached accessories to duplicate, so I wasn't testing what I thought I was testing above. I'm back to ENOTFOUND with v0.5.4, and v0.5.3.

I think it'll be less easy for me to revert nodejs version but I'd totally believe that could be an issue.

jmissig commented 12 months ago

I see now that there are more-official homebridge packages for Ubuntu. Maybe I'll look into switching to those, I wonder which nodejs they're bundling.

oogje commented 12 months ago

I added an option to the node.js client script to specify IPV4 and the test succeeded with node v18.17.0 where it previously failed with that version. There's still the question why you (and I at least once) are seeing ENOTFOUND while my script was generating EHOSTUNREACH but I'm not going to worry about that for the time being.

It seems that the option to specify family (IPV4 or IPV6) in node is different for v16.x and v18.x although the documentation indicates it's the same. But when I run it on v16.x with the v18.x format (the string 'ipv4' vs the number 4) it blows up and tells me to use a number type. So it looks like I need to a) verify the documentation is incorrect and then b) detect the version of node.js in the plugin, followed by c) putting out a new beta since only you so far can reliably reproduce this issue.

[edited to add] the documentation is not incorrect. I was misreading it.

jmissig commented 12 months ago

Ok, downgrading nodejs fixed this. Pretty sure that was the issue. Thanks for the tip @oogje !

dpkg: warning: downgrading nodejs from 18.17.0-deb-1nodesource1 to 18.16.1-deb-1nodesource1
jmissig commented 12 months ago

Also whenever you have a new beta I'll know I can try on v18.17.0 specifically. Seems very consistent that it functions ok with v18.16.1 and always fails with v18.17.0.

oogje commented 12 months ago

If and when you're prepared for the real possibility you might have to back out of it, the new beta is: homebridge-i6-bigassfans@0.6.0-beta2.

jmissig commented 11 months ago

I tried homebridge-i6-bigassfans@v0.6.0-beta3 with my Node v18.16.1 and my existing setup ceases to work. I didn’t try moving to v18.17.0 yet for reasons you’ll understand in a second.

Trying to debug this is wild. It only gives ENOTFOUND for 1 of 3 fans, but all 3 fans are unresponsive with no errors in homebridge logs when I try to control them.

I briefly tried turning on Homebridge debug mode, but other plugins are spouting so much stuff that I can’t even catch the bigassfans plugin startup messages before the whole scrollback is filled.

Once I install and run homebridge with v0.6.0-beta3, even rolling back to v0.5.4 doesn’t work until I reboot the whole machine. I’m guessing it’s something weird in Avahi? Or something lower level than that, if even Ciao doesn’t make it work.

[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Cocktail Room Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Master Bedroom Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Study Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Init - initializing devices
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Master Bedroom Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Cocktail Room Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Study Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Cocktail Room Overhead Fan (CocktailRoom-XX:XX:XX.local) network connection broke [ENOTFOUND].  Attempting reconnect in 2 seconds.
oogje commented 11 months ago

I'm sorry you had to go through that. If I understand correctly you're now either using v0.5.4 with node v18.16.1 resolving fan names or you're using v0.5.4 with node v18.17.0 using IP addresses; and plugin v0.6.0-beta3 just doesn't work at all.

I'm not surprised v0.6.0-beta3 didn't fix the name resolution issue but I am surprised it hosed your machine. It sounds like debugging v0.6.0-beta3would be somewhat complicated on your system. We can do it by logging into the linux machine and examining the log file in a shell session, but it might not be how you want to spend your time. If it's seriously broken someone else will eventually open an issue.

So if you're good where you are now, I'll see if I can get answers about the ENOTFOUND business from somewhere. By the way, I'm using Bonjour HAP and still get the rareENOTFOUND with node v18.17.0 but it simply reconnects and moves on.

jmissig commented 11 months ago

It’s not hosed-hosed! It’s just odd that it requires a full reboot to get back to working. In my case it seems quite consistent — v0.6.0-beta3 or node v18.17.0 will make mDNS hostnames not work. And in my case, leaving it be for ~1 hour isn’t sufficient for it to begin working.

I can still use mDNS hostnames with v0.5.4 and node v18.16.1 as long as I reboot after experimenting.

Yes after all this it seems like a better use of my time would be to configure consistent IPs for my fans in my DHCP and just use IPs. But I do enjoy fiddling. 😆

jmissig commented 11 months ago

I think v0.6.0-beta3 simply is not working with my setup, even when fans are set via IP.

v0.5.4 is working fine.

oogje commented 11 months ago

If you want to fiddle I'm all for it. I understand you're running Homebridge on a linux machine, but are you by any chance using a Mac otherwise?

jmissig commented 11 months ago

Yes we have several MacBooks Air and Pro. I’m ssh’ing from one over to the Ubuntu Intel machine when I’m upgrading / downgrading its Node and Homebridge plugins.

In case this is a weird me-thing with my network / mDNS:

oogje commented 11 months ago

Frankly, the network beast is magical to me. I just try to appease it.

My hope is to get v0.6.0-beta3 working as well as v0.5.4 and move forward from there. Do you know where your homebridge log is located in the linux filesystem? On my Mac where I run homebridge, it's .homebridge/homebridge.log in my home folder.

If you can locate the log file then you can pick out only the fan plugin's messages with:

grep BigAssFans-i6 "<your-log-file-path-here>"

I'm my case:

grep BigAssFans-i6 ~/.homebridge/homebridge.log

The next step would be to add some debugging directives to your config.json for each of your fans to dump all the messages your fans send to the plugin. I should be able to use those logs to simulate your fans on my machine.

The debugging addition to each fan's config is: "debugLevels": [ [ "capabilities", 1 ], [ "network", 11 ] ]

don't forget to insert a comma before this line if appending it after "disableDirectionControl": false.

You can do this with v0.5.4. No need to run the beta.

I believe you already know how to start homebridge with "-D".

Having done that, restart, wait a few minutes, grep the fan messages out of the log file and either mail them to me at turbo_titled_0w@icloud.com or post them here. The logs will contain your MAC address, local LAN IP, and WiFi network name encoded in the a stream. Then remove the debugging directives or just change the numbers to zeros: "debugLevels": [ [ "capabilities", 0 ], [ "network", 0 ] ] and restart.

Then we'll see what's what.

jmissig commented 11 months ago

Sent v0.5.4 logs via email

oogje commented 11 months ago

I got your logs, thanks. Here's hoping there's a smoking gun in there. In any event, having these is very helpful in general so I can simulate different setups.

oogje commented 11 months ago

From the logs it seems the plugin thinks the Master Bedroom Overhead Fan has a downlight and the other two fans have no lights at all. Is that how your fans are equipped?

jmissig commented 11 months ago

That is correct.

oogje commented 11 months ago

That's a relief. In the meantime I've identified the problem. It has to do with (presumably) my misinterpreting your bedroom fan's schedule. Maybe the others also but this is the first one I tripped over. I'm still looking into it to see exactly where it goes south.

oogje commented 11 months ago

I'm hopeful homebridge-i6-bigassfans@0.6.0-beta4 will work for you at least as well as v0.5.4. The beta3 plugin would go into an infinite loop. On macOS Homebridge would crash fairly quickly with a message in the log file:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

and hb-service would restart it. But my simulated version of your fan wouldn't feed Homebridge without my intervention so my Homebridge instance wasn't itself in an infinite loop and I didn't have to reboot my machine.

jmissig commented 11 months ago

So far so good with v0.6.0-beta4 and using direct IP address for the fans.

Edit: No luck with mDNS hostnames with v0.6.0-beta4 plus node v18.17.0 on Avahi. But as you said, we’re at least where we were with v0.5.4! Seems likely a node / linux / something issue.

I'm hopeful homebridge-i6-bigassfans@0.6.0-beta4 will work for you at least as well as v0.5.4. The beta3 plugin would go into an infinite loop. On macOS Homebridge would crash fairly quickly with a message in the log file:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

and hb-service would restart it. But my simulated version of your fan wouldn't feed Homebridge without my intervention so my Homebridge instance wasn't itself in an infinite loop and I didn't have to reboot my machine.

This seems very likely why I had to reboot. I’m using child bridges, but sounds like hb-service wasn’t correctly detecting infinite loop. I probably had a stuck looping child bridge going the whole time until rebooting, which may somehow have interfered with homebridge.

oogje commented 11 months ago

Am glad we resolved the issue with beta3. Thanks. Can you try to run this test script under v18.17.0 and see what it shows? Save it to a file and make it executable. Here's what I see on my machine, first with an invalid port number, then the actual one.:

ber@rolo misc % ./connectTest.mjs  BigFan.local 1234 
node version: 18.17.0
connecting to BigFan.local 1234
connect ECONNREFUSED 192.168.7.214:1234
ber@rolo misc % ./connectTest.mjs BigFan.local 31415
node version: 18.17.0
connecting to BigFan.local 31415
connected okay, will now disconnect
ber@rolo misc % 
#!/usr/bin/env node

import { argv } from 'node:process';
import net from 'node:net';

if (argv.length != 4) {
  console.log(`Usage: ${argv[0]} <host> <port>`);
  process.exit(1);
}

console.log(`node version: ${process.versions.node}`);

networkSetup(argv[3], argv[2]);

function networkSetup(port, host) {
  console.log("connecting to " + host + " " + port);
    const client = net.connect({port: port, host: host, family: 4}, () => {
        console.log("connected okay, will disconnect");
        client.end();
    })

    let errHandler;
    client.on('error', errHandler = (err) => {
        console.log(err.message);
    })
    client.on('data', (data) => {
      console.log("got data");
    })
}
jmissig commented 11 months ago
(nodejs@saga)=(~)=($ ./connectTest.mjs MasterBedroom-XX:XX:XX.local 31415
node version: 18.17.0
connecting to MasterBedroom-XX:XX:XX.local 31415
getaddrinfo ENOTFOUND MasterBedroom-XX:XX:XX.local

(nodejs@saga)=(~)=($ ping MasterBedroom-XX:XX:XX.local
PING MasterBedroom-XX:XX:XX.local (172.16.1.237) 56(84) bytes of data.
64 bytes from MasterBedroom (172.16.1.237): icmp_seq=1 ttl=128 time=8.45 ms

Hmm, just testing a hunch here...

(nodejs@saga)=(~)=($ ./connectTest.mjs 172.16.1.237 31415
node version: 18.17.0
connecting to 172.16.1.237 31415
connected okay, will disconnect

(nodejs@saga)=(~)=($ ./connectTest.mjs Vali.local 22
node version: 18.17.0
connecting to Vali.local 22
connected okay, will disconnect
got data

What if NodeJS 18.17.0, or something between that and my Linux machine's Avahi/mDNS/whatever, no longer likes : in a hostname?

oogje commented 11 months ago

That sounds like a solid hunch.

jmissig commented 11 months ago

Doing some googling. Using : in hostnames — even mDNS hostnames — seems controversial at best, and completely against spec at worst. We might just be up against Big Ass Fans not correctly setting mDNS hostnames.

Even Avahi on this Ubuntu can't be consistent with itself wrt the Big Ass Fans hostnames:

(nodejs@saga)=(~)=(% avahi-resolve-address 172.16.1.237
172.16.1.237    MasterBedroom

(nodejs@saga)=(~)=(% avahi-resolve-host-name MasterBedroom               
Failed to create host name resolver: Invalid host name

(nodejs@saga)=(~)=(% ping MasterBedroom 
ping: MasterBedroom: Temporary failure in name resolution

(nodejs@saga)=(~)=(% ping MasterBedroom-XX:XX:XX.local
PING MasterBedroom-XX:XX:XX.local (172.16.1.237) 56(84) bytes of data.
64 bytes from MasterBedroom (172.16.1.237): icmp_seq=1 ttl=128 time=5.87 ms

(nodejs@saga)=(~)=(% avahi-resolve-host-name MasterBedroom-XX:XX:XX.local
MasterBedroom-XX\058XX\058XX.local  172.16.1.237

At this point I've set fixed IPs for my fans anyway and am perfectly happy just letting this all go. It was a fun curiosity.

oogje commented 11 months ago

I don't know where that -xx:xx:xx.local name comes from. I just have <fan name[without white space, punctuation, etc].local on macOS. Using the last three octets of the MAC address format isn't pingable here.

Yes, it was fun. I learned a bit. Not enough, but a bit.

jmissig commented 11 months ago

Yeah weirdly FanName.local does not resolve for me anywhere. Not on my Macs or my Linux box. It presumably comes from the fans setting the address for themselves, but I'm not sure why yours and mine have differences there.

oogje commented 11 months ago

Maybe it's tied to the model. I have an i6.

jmissig commented 11 months ago

Seems very likely. Newer firmware, no longer setting out-of-spec mDNS names. Just funny they didn't bump that part of the software with the big update to move over to the new protocol and app.

jmissig commented 11 months ago

As always, thanks for all you do for this plugin. Thanks for coming along on my probably-unnecessary adventures.

And I think(?) we accidentally caught what would’ve been a real bug with v0.6.0 before it made it out.

oogje commented 11 months ago

As always, my pleasure. And yes, tripping over the schedule bug was fortunate indeed.

jmissig commented 5 months ago

Just to document in case people come across this thread: I've moved to a Mac mini as my Homebridge server. It seems to be picking up both FanName and fanname.haiku.home. But FanName.local still does not work. Wild!

(julian@Saga)=(~)=(% host 172.16.1.237
237.1.16.172.in-addr.arpa domain name pointer masterbedroom.haiku.home.

(julian@Saga)=(~)=(% ping masterbedroom
PING masterbedroom (172.16.1.237): 56 data bytes
64 bytes from 172.16.1.237: icmp_seq=0 ttl=128 time=3.127 ms
64 bytes from 172.16.1.237: icmp_seq=1 ttl=128 time=8.755 ms

(julian@Saga)=(~)=(% ping masterbedroom.haiku.home 
PING masterbedroom.haiku.home (172.16.1.237): 56 data bytes
64 bytes from 172.16.1.237: icmp_seq=0 ttl=128 time=13.490 ms
64 bytes from 172.16.1.237: icmp_seq=1 ttl=128 time=3.150 ms

(julian@Saga)=(~)=(% ping MasterBedroom.local
ping: cannot resolve MasterBedroom.local: Unknown host

(julian@Saga)=(~)=(% sw_vers
ProductName:        macOS
ProductVersion:     14.3
BuildVersion:       23D56