Open jmissig opened 1 year ago
Obviously this may not be the fault of the plugin since other intermediary things updated too. Just figured I’d ask in case y’all can help!
My homebridge has been running with the Avahi advertiser for some time now. I did previously use Ciao.
I don't have any good ideas. What happens if you revert to the version of the plugin you were running before you updated to v0.5.4? You're running more recent versions of homebridge and node than I am so I'll update them here and see what happens.
For what it's worth I'm running homebridge on macOS 10.14.6 with Bonjour HAP and use the
I've installed homebridge on macOS Ventura with the same versions you showed. In one of the six restarts it threw the ENOTFOUND error but after 3 retries (the backoff timing is very primitive) it reconnected.
[7/22/2023, 2:34:33 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local
[7/22/2023, 2:34:33 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 2 seconds.
[7/22/2023, 2:34:40 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local
[7/22/2023, 2:34:40 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 4 seconds.
[7/22/2023, 2:34:49 PM] [BigAssFans-i6] dblog newcode(1/1) Big Fan - getaddrinfo ENOTFOUND BigFan.local
[7/22/2023, 2:34:49 PM] [BigAssFans-i6] Big Fan (BigFan.local) : Unhandled network error: ENOTFOUND. Attempting reconnect in 8 seconds.
[7/22/2023, 2:34:59 PM] [BigAssFans-i6] Big Fan reconnected!
I don't know why it got ENOTFOUND, why it didn't get ENOTFOUND every time, nor why the 3rd retry was successful. I'll keep playing with it.
Interesting! Mine seems to always ENOTFOUND. I will try older plugin version later today.
Well, went to downgrade to v0.5.3 to try it out, but now the issue is not recurring, even with exactly the setup (and versions) cited above. Odd.
I did an experiment to see if it's a purely a node.js issue. On machine A I ran a listener and on machine B running node v18.17.0 connected to that listener with a node client script. When the script tried to connect to machine A using its mDNS name it failed with EHOSTUNREACH. When I used its IP address it succeeded.
If I flipped roles, running the listener on machine B and Machine A running node V16.16.0 connecting to that listener, it successfully connected using the mDNS name of machine B.
When I was looking into ENOTFOUND problems with node.js, I came across discussions touching on addressing IPV4 vs IPV6. And my experiment seems to indicate an IPV6 angle:
connect EHOSTUNREACH fe80::45f:b6ce:bafd:254b:31415 - Local (fe80::1ce2:18d:7abc:75cb%en0:57451
.
Also the word "random" came up in discussion, which might explain the inconsistencies we're seeing. Clearly I'm not very knowledgable about these things, so there's that too.
I don't why the error was EHOSTUNREACH rather than ENOTFOUND.
My next experiment (for another day) will be to try to specify IPV4 in the node connect() call.
[edited to add] - downgrading node to v16.16.0 on machine B allows the client script to successfully connect to the mDNS name of machine .
Well, went to downgrade to v0.5.3 to try it out, but now the issue is not recurring, even with exactly the setup (and versions) cited above. Odd.
That's seems to be the nature of the problem.
Ok long story short, I made a mistake and caused cached accessories to duplicate, so I wasn't testing what I thought I was testing above. I'm back to ENOTFOUND with v0.5.4, and v0.5.3.
I think it'll be less easy for me to revert nodejs version but I'd totally believe that could be an issue.
I see now that there are more-official homebridge packages for Ubuntu. Maybe I'll look into switching to those, I wonder which nodejs they're bundling.
I added an option to the node.js client script to specify IPV4 and the test succeeded with node v18.17.0 where it previously failed with that version. There's still the question why you (and I at least once) are seeing ENOTFOUND while my script was generating EHOSTUNREACH but I'm not going to worry about that for the time being.
It seems that the option to specify family (IPV4 or IPV6) in node is different for v16.x and v18.x although the documentation indicates it's the same. But when I run it on v16.x with the v18.x format (the string 'ipv4' vs the number 4) it blows up and tells me to use a number type. So it looks like I need to a) verify the documentation is incorrect and then b) detect the version of node.js in the plugin, followed by c) putting out a new beta since only you so far can reliably reproduce this issue.
[edited to add] the documentation is not incorrect. I was misreading it.
Ok, downgrading nodejs fixed this. Pretty sure that was the issue. Thanks for the tip @oogje !
dpkg: warning: downgrading nodejs from 18.17.0-deb-1nodesource1 to 18.16.1-deb-1nodesource1
Also whenever you have a new beta I'll know I can try on v18.17.0 specifically. Seems very consistent that it functions ok with v18.16.1 and always fails with v18.17.0.
If and when you're prepared for the real possibility you might have to back out of it, the new beta is:
homebridge-i6-bigassfans@0.6.0-beta2
.
I tried homebridge-i6-bigassfans@v0.6.0-beta3
with my Node v18.16.1
and my existing setup ceases to work. I didn’t try moving to v18.17.0
yet for reasons you’ll understand in a second.
Trying to debug this is wild. It only gives ENOTFOUND
for 1 of 3 fans, but all 3 fans are unresponsive with no errors in homebridge logs when I try to control them.
I briefly tried turning on Homebridge debug mode, but other plugins are spouting so much stuff that I can’t even catch the bigassfans plugin startup messages before the whole scrollback is filled.
Once I install and run homebridge with v0.6.0-beta3
, even rolling back to v0.5.4
doesn’t work until I reboot the whole machine. I’m guessing it’s something weird in Avahi? Or something lower level than that, if even Ciao doesn’t make it work.
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Cocktail Room Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Master Bedroom Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Loading accessory from cache: Study Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Init - initializing devices
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Master Bedroom Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Cocktail Room Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Restoring existing accessory from cache: Study Overhead Fan
[7/28/2023, 4:32:59 PM] [homebridge-i6-bigassfans] Cocktail Room Overhead Fan (CocktailRoom-XX:XX:XX.local) network connection broke [ENOTFOUND]. Attempting reconnect in 2 seconds.
I'm sorry you had to go through that. If I understand correctly you're now either using v0.5.4
with node
v18.16.1
resolving fan names or you're using v0.5.4
with node
v18.17.0
using IP addresses; and plugin v0.6.0-beta3
just doesn't work at all.
I'm not surprised v0.6.0-beta3
didn't fix the name resolution issue but I am surprised it hosed your machine. It sounds like debugging v0.6.0-beta3
would be somewhat complicated on your system. We can do it by logging into the linux machine and examining the log file in a shell session, but it might not be how you want to spend your time. If it's seriously broken someone else will eventually open an issue.
So if you're good where you are now, I'll see if I can get answers about the ENOTFOUND business from somewhere. By the way, I'm using Bonjour HAP and still get the rareENOTFOUND
with node
v18.17.0
but it simply reconnects and moves on.
It’s not hosed-hosed! It’s just odd that it requires a full reboot to get back to working. In my case it seems quite consistent — v0.6.0-beta3
or node v18.17.0
will make mDNS hostnames not work. And in my case, leaving it be for ~1 hour isn’t sufficient for it to begin working.
I can still use mDNS hostnames with v0.5.4
and node v18.16.1
as long as I reboot after experimenting.
Yes after all this it seems like a better use of my time would be to configure consistent IPs for my fans in my DHCP and just use IPs. But I do enjoy fiddling. 😆
I think v0.6.0-beta3
simply is not working with my setup, even when fans are set via IP.
v0.5.4
is working fine.
If you want to fiddle I'm all for it. I understand you're running Homebridge on a linux machine, but are you by any chance using a Mac otherwise?
Yes we have several MacBooks Air and Pro. I’m ssh’ing from one over to the Ubuntu Intel machine when I’m upgrading / downgrading its Node and Homebridge plugins.
In case this is a weird me-thing with my network / mDNS:
Frankly, the network beast is magical to me. I just try to appease it.
My hope is to get v0.6.0-beta3
working as well as v0.5.4
and move forward from there. Do you know where your homebridge log is located in the linux filesystem? On my Mac where I run homebridge, it's .homebridge/homebridge.log
in my home folder.
If you can locate the log file then you can pick out only the fan plugin's messages with:
grep BigAssFans-i6 "<your-log-file-path-here>"
I'm my case:
grep BigAssFans-i6 ~/.homebridge/homebridge.log
The next step would be to add some debugging directives to your config.json for each of your fans to dump all the messages your fans send to the plugin. I should be able to use those logs to simulate your fans on my machine.
The debugging addition to each fan's config is:
"debugLevels": [ [ "capabilities", 1 ], [ "network", 11 ] ]
don't forget to insert a comma before this line if appending it after "disableDirectionControl": false
.
You can do this with v0.5.4
. No need to run the beta.
I believe you already know how to start homebridge with "-D".
Having done that, restart, wait a few minutes, grep the fan messages out of the log file and either mail them to me at turbo_titled_0w@icloud.com
or post them here. The logs will contain your MAC address, local LAN IP, and WiFi network name encoded in the a stream. Then remove the debugging directives or just change the numbers to zeros: "debugLevels": [ [ "capabilities", 0 ], [ "network", 0 ] ]
and restart.
Then we'll see what's what.
Sent v0.5.4
logs via email
I got your logs, thanks. Here's hoping there's a smoking gun in there. In any event, having these is very helpful in general so I can simulate different setups.
From the logs it seems the plugin thinks the Master Bedroom Overhead Fan has a downlight and the other two fans have no lights at all. Is that how your fans are equipped?
That is correct.
That's a relief. In the meantime I've identified the problem. It has to do with (presumably) my misinterpreting your bedroom fan's schedule. Maybe the others also but this is the first one I tripped over. I'm still looking into it to see exactly where it goes south.
I'm hopeful homebridge-i6-bigassfans@0.6.0-beta4
will work for you at least as well as v0.5.4
. The beta3
plugin would go into an infinite loop. On macOS Homebridge would crash fairly quickly with a message in the log file:
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
and hb-service
would restart it. But my simulated version of your fan wouldn't feed Homebridge without my intervention so my Homebridge instance wasn't itself in an infinite loop and I didn't have to reboot my machine.
So far so good with v0.6.0-beta4
and using direct IP address for the fans.
Edit: No luck with mDNS hostnames with v0.6.0-beta4
plus node v18.17.0
on Avahi. But as you said, we’re at least where we were with v0.5.4
! Seems likely a node / linux / something issue.
I'm hopeful
homebridge-i6-bigassfans@0.6.0-beta4
will work for you at least as well asv0.5.4
. Thebeta3
plugin would go into an infinite loop. On macOS Homebridge would crash fairly quickly with a message in the log file:
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
and
hb-service
would restart it. But my simulated version of your fan wouldn't feed Homebridge without my intervention so my Homebridge instance wasn't itself in an infinite loop and I didn't have to reboot my machine.
This seems very likely why I had to reboot. I’m using child bridges, but sounds like hb-service
wasn’t correctly detecting infinite loop. I probably had a stuck looping child bridge going the whole time until rebooting, which may somehow have interfered with homebridge.
Am glad we resolved the issue with beta3. Thanks. Can you try to run this test script under v18.17.0
and see what it shows? Save it to a file and make it executable. Here's what I see on my machine, first with an invalid port number, then the actual one.:
ber@rolo misc % ./connectTest.mjs BigFan.local 1234
node version: 18.17.0
connecting to BigFan.local 1234
connect ECONNREFUSED 192.168.7.214:1234
ber@rolo misc % ./connectTest.mjs BigFan.local 31415
node version: 18.17.0
connecting to BigFan.local 31415
connected okay, will now disconnect
ber@rolo misc %
#!/usr/bin/env node
import { argv } from 'node:process';
import net from 'node:net';
if (argv.length != 4) {
console.log(`Usage: ${argv[0]} <host> <port>`);
process.exit(1);
}
console.log(`node version: ${process.versions.node}`);
networkSetup(argv[3], argv[2]);
function networkSetup(port, host) {
console.log("connecting to " + host + " " + port);
const client = net.connect({port: port, host: host, family: 4}, () => {
console.log("connected okay, will disconnect");
client.end();
})
let errHandler;
client.on('error', errHandler = (err) => {
console.log(err.message);
})
client.on('data', (data) => {
console.log("got data");
})
}
(nodejs@saga)=(~)=($ ./connectTest.mjs MasterBedroom-XX:XX:XX.local 31415
node version: 18.17.0
connecting to MasterBedroom-XX:XX:XX.local 31415
getaddrinfo ENOTFOUND MasterBedroom-XX:XX:XX.local
(nodejs@saga)=(~)=($ ping MasterBedroom-XX:XX:XX.local
PING MasterBedroom-XX:XX:XX.local (172.16.1.237) 56(84) bytes of data.
64 bytes from MasterBedroom (172.16.1.237): icmp_seq=1 ttl=128 time=8.45 ms
Hmm, just testing a hunch here...
(nodejs@saga)=(~)=($ ./connectTest.mjs 172.16.1.237 31415
node version: 18.17.0
connecting to 172.16.1.237 31415
connected okay, will disconnect
(nodejs@saga)=(~)=($ ./connectTest.mjs Vali.local 22
node version: 18.17.0
connecting to Vali.local 22
connected okay, will disconnect
got data
What if NodeJS 18.17.0, or something between that and my Linux machine's Avahi/mDNS/whatever, no longer likes :
in a hostname?
That sounds like a solid hunch.
Doing some googling. Using :
in hostnames — even mDNS hostnames — seems controversial at best, and completely against spec at worst. We might just be up against Big Ass Fans not correctly setting mDNS hostnames.
Even Avahi on this Ubuntu can't be consistent with itself wrt the Big Ass Fans hostnames:
(nodejs@saga)=(~)=(% avahi-resolve-address 172.16.1.237
172.16.1.237 MasterBedroom
(nodejs@saga)=(~)=(% avahi-resolve-host-name MasterBedroom
Failed to create host name resolver: Invalid host name
(nodejs@saga)=(~)=(% ping MasterBedroom
ping: MasterBedroom: Temporary failure in name resolution
(nodejs@saga)=(~)=(% ping MasterBedroom-XX:XX:XX.local
PING MasterBedroom-XX:XX:XX.local (172.16.1.237) 56(84) bytes of data.
64 bytes from MasterBedroom (172.16.1.237): icmp_seq=1 ttl=128 time=5.87 ms
(nodejs@saga)=(~)=(% avahi-resolve-host-name MasterBedroom-XX:XX:XX.local
MasterBedroom-XX\058XX\058XX.local 172.16.1.237
At this point I've set fixed IPs for my fans anyway and am perfectly happy just letting this all go. It was a fun curiosity.
I don't know where that
Yes, it was fun. I learned a bit. Not enough, but a bit.
Yeah weirdly FanName.local
does not resolve for me anywhere. Not on my Macs or my Linux box. It presumably comes from the fans setting the address for themselves, but I'm not sure why yours and mine have differences there.
Maybe it's tied to the model. I have an i6
.
Seems very likely. Newer firmware, no longer setting out-of-spec mDNS names. Just funny they didn't bump that part of the software with the big update to move over to the new protocol and app.
As always, thanks for all you do for this plugin. Thanks for coming along on my probably-unnecessary adventures.
And I think(?) we accidentally caught what would’ve been a real bug with v0.6.0
before it made it out.
As always, my pleasure. And yes, tripping over the schedule bug was fortunate indeed.
Just to document in case people come across this thread:
I've moved to a Mac mini as my Homebridge server. It seems to be picking up both FanName
and fanname.haiku.home.
But FanName.local
still does not work. Wild!
(julian@Saga)=(~)=(% host 172.16.1.237
237.1.16.172.in-addr.arpa domain name pointer masterbedroom.haiku.home.
(julian@Saga)=(~)=(% ping masterbedroom
PING masterbedroom (172.16.1.237): 56 data bytes
64 bytes from 172.16.1.237: icmp_seq=0 ttl=128 time=3.127 ms
64 bytes from 172.16.1.237: icmp_seq=1 ttl=128 time=8.755 ms
(julian@Saga)=(~)=(% ping masterbedroom.haiku.home
PING masterbedroom.haiku.home (172.16.1.237): 56 data bytes
64 bytes from 172.16.1.237: icmp_seq=0 ttl=128 time=13.490 ms
64 bytes from 172.16.1.237: icmp_seq=1 ttl=128 time=3.150 ms
(julian@Saga)=(~)=(% ping MasterBedroom.local
ping: cannot resolve MasterBedroom.local: Unknown host
(julian@Saga)=(~)=(% sw_vers
ProductName: macOS
ProductVersion: 14.3
BuildVersion: 23D56
Describe The Bug:
I’ve been successfully using the
Entryway-XX:XX:XX.local
type format since, well, April 23 2022 presumably (#6).Today some combination of updates seems to have made that stop working. I repeatedly get:
Logs:
Plugin Config:
Environment:
Things I tried:
CocktailRoom-XX:XX:XX.local
and all the rest from the computer running Homebridge just fine.For now I’ve switched to resolved IP address, which works.