hap-java / HAP-Java

Java implementation of the HomeKit Accessory Protocol
MIT License
153 stars 82 forks source link

jmDNS issue? #133

Closed alundqvi closed 3 years ago

alundqvi commented 3 years ago

Using the HAP-Java as a simple Bridge. I can register in my Home app on iPhone as soon as the mDNS is visible.

First start often fails but after restart of the Bridge I can see the announcement on my Mac using sd-dns command. I register the Bridge and one switch I have registered. All fine and the Home App shows the Bridge and the Switch.

Now very soon after the registration the Phone/iPad indicates a failure, Can't connect to the Accessory. Then again connection is available. This is ongoing.

If I look at the sd-dns -B output (See below) I can see Add of the announcement. Then 1-3 minutes later rev so it is removed. Then again 1-3 minutes later it is Added again. And it is clear that during the period when it is Removed the service is not working but it starts again as soon as the next Add appears.

Is this seen by others? I am running it on a pi 4 Buster. I have stopped Avahi, as I believe it can interfere.

Output from sd-dns:

19:30:44.950 Add 2 17 local. _hap._tcp. MyNewestTestBridge 19:33:54.092 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 19:36:51.017 Add 2 17 local. _hap._tcp. MyNewestTestBridge 19:39:55.727 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 19:41:47.931 Add 2 17 local. _hap._tcp. MyNewestTestBridge 19:42:57.181 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 19:47:35.785 Add 2 17 local. _hap._tcp. MyNewestTestBridge

/Anders

yfre commented 3 years ago

i dont see behaviour with openhab, which uses Java-HAP. can you also see what gets published in jmDNS? Java-HAP should only re-publish on configuration or pairing/discoverability changes

alundqvi commented 3 years ago

If I log with -Z flag I see the following:

_hap._tcp PTR MyNewestTestBridge._hap._tcp MyNewestTestBridge._hap._tcp SRV 0 0 9123 lundqvistpi.local. ; Replace with unicast FQDN of target host MyNewestTestBridge._hap._tcp TXT "c#=1" "s#=1" "ff=0" "sf=0" "sh=lhjaKg==" "ci=1" "md=MyNewestTestBridge" "id=9a:9b:44:29:77:ee"

This is same what I see JavaHAP uses when creating the jmDNS.

Question is why... There are discussions on jmDNS on the use of TTL... And an old bug as well discussed where TTL seems to give issues??? But nothing you have seen?

yfre commented 3 years ago

looks good.
if c# and sf are no changing between re-connection then jmDNS re-publishing is not triggered by Java-HAP. can you enable more logging in Java-HAP and jmDNS packages?

alundqvi commented 3 years ago

Hi Again,

I recompiled it yesterday and added a few logger.info() in around the JmdnsHomekitAdvertiser class to see how and when HAP Java calls jmDNS. So initially I get the Registration:

Jan 23, 2021 3:13:53 PM io.github.hapjava.server.impl.jmdns.JmdnsHomekitAdvertiser setDiscoverable INFO: setDiscoverable() is called with: false on port 0 Jan 23, 2021 3:13:53 PM io.github.hapjava.server.impl.jmdns.JmdnsHomekitAdvertiser registerService INFO: Registering (false) _hap._tcp.local. on port 9123

Interesting what the registration for port 0 is?

Then I added a logger.info() in the setDiscoverable(). And I see that it is called quite a lot initially and as well every time I use the App turning on or off the light.... But as you say it will not trigger a call to jmDNS.

What is now happening however is that I get 4 Rmv/Add command once every Hour. Question is if I get a newer version of jmDNS.

But it looks like this: 11:34:39.821 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 11:34:39.857 Add 2 17 local. _hap._tcp. MyNewestTestBridge 11:34:40.296 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 11:34:40.370 Add 2 17 local. _hap._tcp. MyNewestTestBridge 12:29:39.991 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 12:29:40.092 Add 2 17 local. _hap._tcp. MyNewestTestBridge 12:29:40.484 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 12:29:40.605 Add 2 17 local. _hap._tcp. MyNewestTestBridge

-Z give at the same time

_hap._tcp PTR MyNewestTestBridge._hap._tcp MyNewestTestBridge._hap._tcp SRV 0 0 9123 lundqvistpi.local. ; Replace with unicast FQDN of target host MyNewestTestBridge._hap._tcp TXT "c#=1" "s#=1" "ff=0" "sf=0" "sh=lhjaKg==" "ci=1" "md=MyNewestTestBridge" "id=9a:9b:44:29:77:ee"

_hap._tcp PTR MyNewestTestBridge._hap._tcp MyNewestTestBridge._hap._tcp SRV 0 0 9123 lundqvistpi.local. ; Replace with unicast FQDN of target host MyNewestTestBridge._hap._tcp TXT "c#=1" "s#=1" "ff=0" "sf=0" "sh=lhjaKg==" "ci=1" "md=MyNewestTestBridge" "id=9a:9b:44:29:77:ee"

_hap._tcp PTR MyNewestTestBridge._hap._tcp MyNewestTestBridge._hap._tcp SRV 0 0 9123 lundqvistpi.local. ; Replace with unicast FQDN of target host MyNewestTestBridge._hap._tcp TXT "c#=1" "s#=1" "ff=0" "sf=0" "sh=lhjaKg==" "ci=1" "md=MyNewestTestBridge" "id=9a:9b:44:29:77:ee"

_hap._tcp PTR MyNewestTestBridge._hap._tcp MyNewestTestBridge._hap._tcp SRV 0 0 9123 lundqvistpi.local. ; Replace with unicast FQDN of target host MyNewestTestBridge._hap._tcp TXT "c#=1" "s#=1" "ff=0" "sf=0" "sh=lhjaKg==" "ci=1" "md=MyNewestTestBridge" "id=9a:9b:44:29:77:ee"

As you can see the latency from first Rmv to the last Add every hour is around 500-700 ms... So service works perfect??? And if I read the code right the TTL shall be 2 hours and after 50% it shall act unless the application have called to change the registration... so that seems tio be correct. Why it does this twice... not a clue. One hour during the night it actually did it once :-)

Strange I only added these logger.info() and recompiled.

On the extra logging: Happy to do that but I have never used SLF4J... I just mapped it with Maven to the standard JDK14 logger... But I have tried to add som logger.properties but without success. Can you give an old C++ programmer some advise here please :-).

/Anders

alundqvi commented 3 years ago

Hi again, now learnt to set the logging :-)...

Here is a Full log for a start up. There is 2 Add and 2 Rmv. Despite that the two files are from two different computers... the time stamps are synched.

I have tried to study the logs but I have not found out anything... I see no reason why the Rmv happens... The TTL is set to 3600... One hour but this happens far earlier just minutes after the start.

If you see anything it would be great.... Pity there are no simpler jmDNS only to manage one registration and to keep that alive... not managing all other items on the network... making the SW much bigger and use lots of more I/O.

HAP Java log.log HAP_Bonjour output.log

yfre commented 3 years ago

i just enabled jmDNS logging in my home network i was also shocked how much traffic mDNS actually creates. i have around 15 bonjour devices at home and apparently they exchange mDNS information non-stop. but it looks like mDNS is that chatty by design.

alundqvi commented 3 years ago

Hi Again,

Yes and hence it would have been great to have a Server only version of jmDNS... like in a Printer exposing only the registered Service. The jmDNS implements both the client and server side hence why it as well listens to all other services and can give information about the services in the local network. Would reduce the size and complexity... and reduce the chatting och the application.

Anyhow back to the issue. I still see the behaviour:

22:54:50.682 Add 2 17 local. _hap._tcp. MyNewestTestBridge 22:57:04.727 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 23:00:55.003 Add 2 17 local. _hap._tcp. MyNewestTestBridge 23:03:07.480 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 23:08:58.861 Add 2 17 local. _hap._tcp. MyNewestTestBridge

It should be once per hour based on the TTL for a registration... but as you see it is minutes between. If you have a Mac can you run the: dns-sd -B _hap._tcp .

To see how your Server acts? I believe there is a Java jar available for the dns-sd running on any machine I guess. The issue is however very clear when Rmv is received my iPhone/iPad can't see the Bridge. As soon as Add comes it is working again.

Issue is in jmDNS for sure, but why don't you see it?

HW, Unix version?

I read in jmDNS that they discussed threads and issues around these... but in the log I can't see anything strange... except the flood of events...

Any idea?

/Anders

yfre commented 3 years ago

Hi Anders,

i just tried my mac and cannot re-produce it. i have 2 OpenHab instance, which are using this Java-HAP library. One instance is running on something similar to raspberry pi and debian, second instance is running on mac

here is the result for 1 hour. i saw Rmv only after i have stopped local openhab on mac

dns-sd -B _hap._tcp Browsing for _hap._tcp DATE: ---Thu 28 Jan 2021--- 12:51:24.606 ...STARTING... Timestamp A/R Flags if Domain Service Type Instance Name 12:51:24.606 Add 3 4 local. _hap._tcp. GARDENA smart Gateway 12:51:24.606 Add 3 4 local. _hap._tcp. Netatmo-Doorbell 12:51:24.606 Add 3 4 local. _hap._tcp. Netatmo Presence 12:51:24.606 Add 3 4 local. _hap._tcp. Weather Station 12:51:24.606 Add 3 4 local. _hap._tcp. Netatmo Welcome 12:51:24.606 Add 3 4 local. _hap._tcp. Aqara Hub-36A0 12:51:24.606 Add 2 4 local. _hap._tcp. openHAB 12:51:59.174 Add 2 4 local. _hap._tcp. openHAB (2) 13:57:22.805 Rmv 0 4 local. _hap._tcp. openHAB (2)

alundqvi commented 3 years ago

Hi again,

I have tried to start on both my Mac. And as well on a PC Win 10 I have... with the Mac I have an issue as it is my working laptop and my company have a Firewall not allowing it to run fully... But it seems from the registration point of vie work the same as the PC.

Here I see this

17:04:39.525 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.563 Add 2 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.928 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.999 Add 2 17 local. _hap._tcp. MyNewestTestBridge

Every hour... it does Rmv-Add,Rmw and Add... in approx 500 ms... I did get a similar run with the Raspeberry but after a few hours it returned to Rmv and Add as it liked.

Strange that your openHAB instances don't behave like that. Wonder what the difference is.

On the Mac I use SDK 14.0.1... On PC it is: java version "1.8.0_271"... I believe it is SDK 14.0.2. On the Raspberry I have SDK 11... What are you running on your Debian Server?... If the Java version is the issue?

/Anders

ccutrer commented 3 years ago

doesn't OpenHAB share its jmDNS instance from other things, so possibly it's configured differently?

alundqvi commented 3 years ago

OK.. You mean that jmDNS is running as its own process? I do see it has its own main()...

alundqvi commented 3 years ago

Still strange that it works that different on my PC/Mac vs my Raspberry PI... Same jar file. Tested with the 1.8 JRE on the PI but the same result...

ccutrer commented 3 years ago

No, not a separate process. But a single jmDNS object (and its associated state and listening sockets, etc.) is shared to advertise HomeKit and http and any other services a single openHAB instance exposes

yfre commented 3 years ago

my dns-sd was running all day today and i saw Rmv/Add in different intervals, sometimes just one minute apart, sometime 2 hours. but what is interesting, multiple homekit devices, including real hardware devices, did add/Rmv at same time.

i have no idea how mDNS works, maybe it is normal. but i have not experienced any issues with home app. i even use home app automation to switch on lights if sensor attached to aqara hub detects presence. all devices are all time available and automation works.

17:06:35.708 Add 2 4 local. _hap._tcp. GARDENA smart Gateway 17:06:35.788 Add 2 4 local. _hap._tcp. Aqara Hub-1489 17:06:35.789 Add 2 4 local. _hap._tcp. openHAB 17:06:35.927 Add 2 4 local. _hap._tcp. Netatmo Presence 17:06:35.932 Add 2 4 local. _hap._tcp. Netatmo-Doorbell 17:06:36.039 Add 2 4 local. _hap._tcp. Weather Station 17:07:21.276 Rmv 1 4 local. _hap._tcp. GARDENA smart Gateway 17:07:21.276 Rmv 1 4 local. _hap._tcp. openHAB 17:07:21.276 Rmv 1 4 local. _hap._tcp. Netatmo Presence 17:07:21.276 Rmv 1 4 local. _hap._tcp. Netatmo-Doorbell 17:07:21.276 Rmv 1 4 local. _hap._tcp. Weather Station 17:07:21.276 Rmv 0 4 local. _hap._tcp. Aqara Hub-1489 17:23:57.680 Add 2 4 local. _hap._tcp. openHAB 17:23:57.695 Add 2 4 local. _hap._tcp. Aqara Hub-1489 17:23:57.718 Add 2 4 local. _hap._tcp. GARDENA smart Gateway 17:23:57.869 Add 2 4 local. _hap._tcp. Netatmo Presence 17:23:57.884 Add 2 4 local. _hap._tcp. Weather Station 17:24:43.312 Rmv 1 4 local. _hap._tcp. openHAB 17:24:43.312 Rmv 1 4 local. _hap._tcp. GARDENA smart Gateway 17:24:43.312 Rmv 1 4 local. _hap._tcp. Netatmo Presence 17:24:43.312 Rmv 1 4 local. _hap._tcp. Weather Station 17:24:43.312 Rmv 0 4 local. _hap._tcp. Aqara Hub-1489 17:34:38.735 Add 3 4 local. _hap._tcp. GARDENA smart Gateway 17:34:38.735 Add 2 4 local. _hap._tcp. openHAB 17:34:38.754 Add 2 4 local. _hap._tcp. Netatmo-Doorbell 17:34:38.867 Add 3 4 local. _hap._tcp. Aqara Hub-1489 17:34:38.867 Add 2 4 local. _hap._tcp. Weather Station 17:34:38.952 Add 2 4 local. _hap._tcp. Netatmo Presence 17:34:40.179 Add 2 4 local. _hap._tcp. Netatmo Welcome 17:37:42.492 Rmv 1 4 local. _hap._tcp. GARDENA smart Gateway 17:37:42.493 Rmv 1 4 local. _hap._tcp. Weather Station 17:37:42.493 Rmv 1 4 local. _hap._tcp. Netatmo Presence 17:37:42.493 Rmv 1 4 local. _hap._tcp. Netatmo Welcome 17:37:42.493 Rmv 1 4 local. _hap._tcp. Netatmo-Doorbell 17:37:42.493 Rmv 1 4 local. _hap._tcp. openHAB 17:37:42.493 Rmv 0 4 local. _hap._tcp. Aqara Hub-1489 ....

alundqvi commented 3 years ago

Hi Eugen,

Well I have today used another older PI with the same OS version but PI3. Using it as an Airplay device.

So having my PC and this one running in parallel with the PI I started with shows different way of working.

On the Old PI and the PC I get normally an update every hour. I always get this Rmv-Add-Rmv-Add sequence. As you say they are happening at the same time... so Something like the receiving side seems to ask for the update...

But the PI I intended to use... It seems to get very odd short Add then Rmv the add... but just a few minutes apart.

I use the same Java on both PI. Same OS Buster. Not the same Kernel though. Second difference is that the new PI runs several other SW and this is not the case for my second PI. So either it is the kernel or another SW that acts weird... strange really strange.

I have as well found this mystic 65 that is not known to jmDNS... but that I can patch myself... but seems not to have an impact.

I have tried with both a running Avahi on teh PI and a stopped one... but no difference.

17:04:07.624 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (3) 17:04:39.410 Rmv 1 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.410 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (2) 17:04:39.441 Add 2 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.513 Add 2 17 local. _hap._tcp. MyNewestTestBridge (2) 17:04:39.977 Rmv 1 17 local. _hap._tcp. MyNewestTestBridge 17:04:39.977 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (2) 17:04:40.115 Add 3 17 local. _hap._tcp. MyNewestTestBridge (2) 17:04:40.115 Add 2 17 local. _hap._tcp. MyNewestTestBridge 17:06:53.216 Add 2 17 local. _hap._tcp. MyNewestTestBridge (3) 17:07:50.755 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (3) 17:09:21.191 Add 2 17 local. _hap._tcp. MyNewestTestBridge (3) 17:10:32.923 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (3) 17:13:26.667 Add 2 17 local. _hap._tcp. MyNewestTestBridge (3) 17:14:01.072 Rmv 0 17 local. _hap._tcp. MyNewestTestBridge (3) 17:15:22.484 Add 2 17 local. _hap._tcp. MyNewestTestBridge (3)

So you see the (3) is the PI that don't work. (2) is the PI that works and the one without number is my PC. And you can see that the (2) and the PC is triggered simultaneously but they are started at very different time.

So summary I guess I have to abandon to use the PI 4 for this and move it to a new PI... hopefully the issue is not the Kernel but some other SW running on it...

/Anders

alundqvi commented 3 years ago

One more...

I looked on your output...

17:24:43.312 Rmv 1 4 local. _hap._tcp. GARDENA smart Gateway 17:24:43.312 Rmv 1 4 local. _hap._tcp. Netatmo Presence 17:24:43.312 Rmv 1 4 local. _hap._tcp. Weather Station 17:24:43.312 Rmv 0 4 local. _hap._tcp. Aqara Hub-1489 17:34:38.735 Add 3 4 local. _hap._tcp. GARDENA smart Gateway

Look at the GARDENA ... Rmv at 17:24:43.312 and then Add at 17:34:38.735... like 10 minutes without availability... this is the same issue I see.

As well for the OpnHAB 17:24:43.312 Rmv 1 4 local. _hap._tcp. openHAB ... 17:34:38.735 Add 2 4 local. _hap._tcp. openHAB

Interesting... and you don't see any issue? I use the Apple Home App and as soon as I see Rmv it can't see the Bridge anymore. What apps are you using? Because this might be an issue with the Apple App as well?!?

yfre commented 3 years ago

maybe i was just lucky. but im using home app a lot, i control everything from home app and i even have devices that can be controlled only from home app. i dont recall any recent cases when i saw "no response" in home app.

alundqvi commented 3 years ago

Hi Eugen,

no there is no luck here. It is just lack of my skills. So there is something else then. I guess you run iOS14 if now Apple have done something. Maybe there is some settings as well? Maybe a timer allowing a device to get Rmv but still be used. For me it is getting unavailable at the very moment or not more than 1-2 seconds after I see Rmv on my laptop.

Another strange bahaviour I see with the initial PI is at start it can take minutes until the first Add. For the other two the Add is received fairly immediately at start of the hap-java program.

I have run my three instances over night. In my Home App. The two is working but the third is unavailable now and then. This morning. I will look at the printouts later today.

In any case a miracle :-)

alundqvi commented 3 years ago

Hi Eugen,

new info.. I have had my hub an Apple TV 4 added as an Accessory. I even had No Response from it now and then. I removed it and very interesting my third bridge with the issues got connected and has not dropped since now for some 20 minutes. So maybe I have been chasing a ghost beeing an Apple bug. Will verify this later tomorrow or similar. I as well need to see if he Rmv Add sequence changed.

alundqvi commented 3 years ago

Hi Again to close this down. I have installed a new PI 4 with the same OS etc... It is working without any issues. I believe that it is PI Hole, it is the only SW running on the non working PI server. I can't see any other reasons. It is clearly not part of HAP Java so we can close this issue.