sony / nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
Apache License 2.0
138 stars 80 forks source link

test-nmos-is0405 fails IS-04 test 10 when nmos-cpp-node uses avahi #10

Open billt-hlit opened 6 years ago

billt-hlit commented 6 years ago

Hi, everyone,

I've been doing development guided by the test-nmos-is0405 script Riedel has shared, and the trunk (as of 91750507f34) fails IS-04 test 10, "Node advertises a Node type mDNS announcement with no ver_* TXT records in the presence of a Registration API", at least when test-nmos-is0405 is built using avahi. The error message from the test is "No matching mdns announcement found for node."

I've looked at it with the DNS-SD browser provided by avahi, and the records are very strange. nmos-cppnode is displayed in the browser, but when I select it, the only thing the browser eventually shows is a "timeout reached" message. By comparison, the RDS shows a DNS record with a bunch of information, including a couple of TXT records.

Has anyone else run these tests against nmos-cpp-node? Do you see the same behavior, and are you running avahi or mdnsbridge?

Thanks, Bill

garethsb commented 6 years ago

Thanks for the report - coincidently we had scheduled to look at the test-nmos-is0405 scripts this week, so it's good timing.

How have you built/run nmos-cpp-node - with dns_sd.h and libdns_sd.so from mdnsResponder with its mdnsd daemon or the compatibility versions from libavahi-compat-libdnssd?

nmos-cpp_node_ip-address:port is displayed in the browser

Well, I guess that's a start, that is the default naming style used by nmos-cpp-node for its service instance.

but when I select it, the only thing the browser eventually shows is a "timeout reached" message.

That sounds like an Avahi/mdnsResponder interop issue, but we will investigate here.

By comparison, the RDS shows a DNS record with a bunch of information, including a couple of TXT records.

Are you using nmos-cpp-registry, the BBC registry or another one?

Thanks.

billt-hlit commented 6 years ago

I'm using libavahi's compatibility library. I'm using the BBC registry VM image that they published early this year -- I tried the vagrant image but kept being stymied fairly late in the boot-up process. I haven't tried it against nmos-cpp-registry lately.

Edit: To be a bit more precise, I'm using the libavahi-compat-libdnssd-dev package under ubuntu 16.04.4.

garethsb commented 6 years ago

Hi Bill,

We've just tried the tests on nmos-cpp-node with nmos-cpp-registry, under Ubuntu 14.04 (64-bit), first built with mdnsResponder and mdnsd, and then built with libavahi-compat-libdnssd-dev and avahi-daemon. All the automatic IS-04 tests pass, including test 10.

I would believe that using a different registry would cause different results for some tests, but hopefully not for test 10. I would also hope that the difference is not Ubuntu 14.04 vs. 16.04 or 64-bit vs. 32-bit. There must be something else different in the setup that is causing failures for you, but I'm not sure what to suggest investigating next.

billt-hlit commented 6 years ago

My machine is 64-bit as well.

In any event, I'm still in the duct tape and bailing wire stage, so I don't need to pursue this test any further in my environment.

Thanks for looking into it.

garethsb commented 6 years ago

Hi Bill,

I'm happy to keep this open, as I am tracking differences/interop issues between the various DNS-SD implementations on each platform.

One thing: can you tell me what the host-name of your machine is, and whether you are specifying it explicitly when launching nmos-cpp-node (like nmos-cpp-node "{\"host_name\":\"foo\"}") or not?

Thanks!

billt-hlit commented 6 years ago

Hi, Gareth,

The machine's hostname is "hobbes", but I'm setting it to "D13_04204" (or somesuch) on the command line.

Also, I realized I have a 32-bit chroot lying around. I rebuilt nmos-node-cpp using it and still see the problem. So, it's not a 32-bit node program (although avahi-daemon is still 64-bit, of course).

garethsb commented 6 years ago

The machine's hostname is "hobbes", but I'm setting it to "D13_04204" (or somesuch) on the command line.

Is it possible you could try it without specifying the host name?

billt-hlit commented 6 years ago

That fixes it! I'm only setting the host name for testing purposes anyhow, so this problem shouldn't impact my work in the long term.

Thanks!

rhastie commented 6 years ago

I've finally managed to run these tests based on our compilation - I can confirm that using the container version of nmos-cpp compiled with Avahi works fine against the Riedel tests. Specifically test 10 passed correctly.

garethsb commented 6 years ago

That fixes it! I'm only setting the host name for testing purposes anyhow, so this problem shouldn't impact my work in the long term.

In this case the host name also needs a name record. It might work putting your test name in your /etc/hosts file. I think we have used this configuration to override the host name for Mininet virtual hosts.

garethsb commented 6 years ago

With two recent commits, 600bf053aeb028b716d424943babb07eabf4ced5 and a29d232c7bbe30b22ce95493353a35ef04289f6f, nmos-cpp will now attempt to register the .local address record(s) when a hostname is specified explicitly on the command line. This works when built with when built with mDNSResponder/Bonjour, but the Avahi compatibility layer unfortunately doesn't implement this functionality. However, using avahi-publish -a before starting the node should work in that case.

I.e., @billt-hlit

The machine's hostname is "hobbes", but I'm setting it to "D13_04204" (or somesuch) on the command line.

Try something like this...

avahi-publish -a -R D13_04204.local `hostname -I` &
nmos-cpp-node "{\"host_name\": \"D13_04204\"}"

Using /etc/avahi/hosts may also work...

garethsb commented 6 years ago

Also, when using Avahi, in order that .local addresses can be looked up by nmos-cpp nodes, it's important that the Name Service Switch uses Avahi to resolve .local names. The magic in /etc/nsswitch.conf is to have a hosts stanza like so:

hosts:          files mdns4_minimal [NOTFOUND=return] dns
tassianors commented 5 years ago

Hi, I'm facing a problem that I think is related to this topic.

I'm using an nmos register instance one one machine and I have a node one a different device that is trying to register things on the register api of the first device. I'm using avahi layer so far.

the problem is that I'm able to see and resolve the advertisement of the registration device:

avahi-browse -r _nmos-query._tcp

avahi-browse -r _nmos-register._tcp

but when I instantiate the node application, it complains that it could not find any IP address for the specified hostname.

1970-01-01 00:33:02.124: more info: 1841243216: After DNSServiceBrowse, DNSServiceBrowseReply got service: nmos-cpp_registration_10-232-21-63:3210 for regtype: _nmos-register._tcp. domain: local. on interface: 2 1970-01-01 00:33:02.126: more info: 1874814032: DNSServiceResolve for name: nmos-cpp_registration_10-232-21-63:3210 regtype: _nmos-register._tcp. domain: local. on interface: 2 1970-01-01 00:33:02.128: too much info: 1891599440: After DNSServiceBrowse, DNSServiceProcessResult succeeded 1970-01-01 00:33:02.128: more info: 1891599440: After DNSServiceBrowse, DNSServiceBrowseReply got service: nmos-cpp_registration_10-232-21-63:3210 for regtype: _nmos-registration._tcp. domain: local. on interface: 2 1970-01-01 00:33:02.129: more info: 1908384848: DNSServiceResolve for name: nmos-cpp_registration_10-232-21-63:3210 regtype: _nmos-registration._tcp. domain: local. on interface: 2 1970-01-01 00:33:02.140: too much info: 1874814032: After DNSServiceResolve, DNSServiceProcessResult succeeded 1970-01-01 00:33:02.140: too much info: 1874814032: After DNSServiceResolve, DNSServiceProcessResult succeeded 1970-01-01 00:33:02.152: too much info: 1908384848: After DNSServiceResolve, DNSServiceProcessResult succeeded 1970-01-01 00:33:02.587: too much info: 1874814032: After DNSServiceResolve, DNSServiceProcessResult succeeded 1970-01-01 00:33:02.588: more info: 1874814032: After DNSServiceResolve, DNSServiceResolveReply got host: ultritouch.local. port: 3210 1970-01-01 00:33:02.588: more info: 1874814032: getaddrinfo for hostname: ultritouch.local. 1970-01-01 00:33:02.591: more info: 1874814032: Using getaddrinfo, got no addresses for host: ultritouch.local.

it looks like the application is trying to resolve the hostname in order to find the IP address. I'm wondering why does it need to resolve the hostname for a second time if the IP address is something that is already provided by the mdns when the service is resolved.

thanks for your time on this.

Tassiano

garethsb commented 5 years ago

it looks like the application is trying to resolve the hostname in order to find the IP address. I'm wondering why does it need to resolve the hostname for a second time if the IP address is something that is already provided by the mdns when the service is resolved.

Although avahi-browse shows the resolve step as returning hostname and IP address at the same time, it's actually two different DNS records - and two different API calls in the dns_sh.h API, DNSServiceResolve and DNSServiceGetAddrInfo. Unfortunately the Avahi implementation doesn't support the second of those, so we use their recommended workaround which is the system call getaddrinfo, instead. That's what we see in your log.

Do you have the /etc/nsswitch.conf magic mentioned above in https://github.com/sony/nmos-cpp/issues/10#issuecomment-404886096?

tassianors commented 5 years ago

Do you have the /etc/nsswitch.conf magic mentioned above in #10 (comment)?

yes.

cat /etc/nsswitch.conf

passwd: files group: files shadow: files

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 networks: files dns

protocols: files services: files ethers: files rpc: files

if instead of using avahi layer I use the mdnsresolver, the step of resolving the hostname using //getaddrinfo// would be bypassed and the use of the //DNSServiceGetAddrInfo// would be used instead? meaning that the hostname resolution would not be required at this point.

garethsb commented 5 years ago

if instead of using avahi layer I use the mdnsresolver, the step of resolving the hostname using //getaddrinfo// would be bypassed and the use of the //DNSServiceGetAddrInfo// would be used instead?

Yes.

tassianors commented 5 years ago

I will give it a try and let you know the results. I appreciate the help!

tassianors commented 5 years ago

I've replaced avahi by the mdnsd with the patches available inside this repo, but now I'm on a situation where the application dont seems to even try to find the registration:

1970-01-01 00:08:39.770: info: 1996133104: Starting nmos-cpp node 1970-01-01 00:08:39.774: info: 1996133104: Process ID: 1165 1970-01-01 00:08:39.774: info: 1996133104: Initial settings: {"connection_port":8856,"host_address":"192.168.12.30","host_addresses":["192.168.12.30","192.168.0.100","11.0.0.1","22.0.0.1"],"log_level":-40,"node_port":12345,"seed_id":"d1b96a18-4dd1-4c02-bd79-a0f68de37b6c"} 1970-01-01 00:08:39.774: info: 1996133104: Configuring nmos-cpp node with its primary Node API at: 192.168.12.30:12345 1970-01-01 00:08:39.847: info: 1963975760: NmosIS05ActivatorClient initialized. ret 0 1970-01-01 00:08:39.848: info: 1963975760: NODE IS05 thread waiting 1970-01-01 00:08:39.855: info: 1996133104: Preparing for connections

is there anything else that I'm missing, maybe on the configuration file?

thanks!

tassianors commented 5 years ago

Further info:

Started        0:04:14.268016
End            0:04:18.585656
Captured for   0:00:04.317640
Unique source addresses seen on network: 1 (IPv4)

Modern Query        Packets:          382   (avg 5309/min)
Legacy Query        Packets:            0   (avg    0/min)
Multicast Response  Packets:            0   (avg    0/min)
Total     Multicast Packets:          382   (avg 5309/min)

Total New Service Probes:             729   (avg10132/min)
Total Goodbye Announcements:            0   (avg    0/min)
Total Query Questions:                789   (avg10965/min)
Total Queries from Legacy Clients:      0   (avg    0/min)
Total Answers/Announcements:            0   (avg    0/min)
Total Additional Records:               0   (avg    0/min)

Service Type             Total Ops   Probe   Goodbye  BrowseQ  BrowseA ResolveQ ResolveA
_rtsp._tcp.                   460        0        0        0        0      460        0
_http._tcp.                   106        0        0       14        0       92        0
_nmos-register._tcp.           69        0        0       23        0       46        0
_nmos-registration._tcp.       54        0        0        8        0       46        0
_nmos-query._tcp.              30        0        0        7        0       23        0
_sftp-ssh._tcp.                25        8        0        0        0       17        0
_ssh._tcp.                      8        8        0        0        0        0        0
_nmos-node._tcp.                8        8        0        0        0        0        0
_smb._tcp.                      7        0        0        0        0        7        0
_dns-sd._udp.                   7        0        0        7        0        0        0

Source Address           Total Ops   Probe   Goodbye  BrowseQ  BrowseA ResolveQ ResolveA    Pkts    Query   LegacyQ Response
192.168.12.1                  774       24        0       59        0      691        0      382      382        0        0
buildroot-13.local.                           ARMV7L         LINUX
mDNSPosix.c:mDNSPlatformSetAllowSleep(): NOT IMPLEMENTED!
mDNSPosix.c:mDNSPlatformSetAllowSleep(): NOT IMPLEMENTED!
mDNSPosix.c:mDNSPlatformSetAllowSleep(): NOT IMPLEMENTED!
mDNSPosix.c:mDNSPlatformSetAllowSleep(): NOT IMPLEMENTED!
mDNS_FinalExit: RR Cache was using 0 records, 0 active

its seems that the MDNS is working although the traffic seems quite intense.

thanks

tassianors commented 5 years ago

I've replaced avahi by the mdnsd with the patches available inside this repo, but now I'm on a situation where the application dont seems to even try to find the registration:

1970-01-01 00:08:39.770: info: 1996133104: Starting nmos-cpp node 1970-01-01 00:08:39.774: info: 1996133104: Process ID: 1165 1970-01-01 00:08:39.774: info: 1996133104: Initial settings: {"connection_port":8856,"host_address":"192.168.12.30","host_addresses":["192.168.12.30","192.168.0.100","11.0.0.1","22.0.0.1"],"log_level":-40,"node_port":12345,"seed_id":"d1b96a18-4dd1-4c02-bd79-a0f68de37b6c"} 1970-01-01 00:08:39.774: info: 1996133104: Configuring nmos-cpp node with its primary Node API at: 192.168.12.30:12345 1970-01-01 00:08:39.847: info: 1963975760: NmosIS05ActivatorClient initialized. ret 0 1970-01-01 00:08:39.848: info: 1963975760: NODE IS05 thread waiting 1970-01-01 00:08:39.855: info: 1996133104: Preparing for connections

is there anything else that I'm missing, maybe on the configuration file?

thanks!

My bad. For some reason I've reverted the /etc/nsswitch.conf; now with the line: hosts: files mdns4_minimal [NOTFOUND=return] dns

the system seems to attempt to locate the service, but it can't still:

1970-01-01 00:02:04.920: info: 1996010224: Starting nmos-cpp node 1970-01-01 00:02:04.924: info: 1996010224: Process ID: 595 1970-01-01 00:02:04.925: info: 1996010224: Initial settings: {"connection_port":8856,"host_address":"192.168.12.30","host_addresses":["192.168.12.30","192.168.0.100","11.0.0.1","22.0.0.1"],"log_level":-40,"node_port":12345,"seed_id":"fdba1e45-9fce-464f-9e71-d40846873a71"} 1970-01-01 00:02:04.925: info: 1996010224: Configuring nmos-cpp node with its primary Node API at: 192.168.12.30:12345 1970-01-01 00:02:04.998: info: 1963975760: NmosIS05ActivatorClient initialized. ret 0 1970-01-01 00:02:04.998: info: 1963975760: NODE IS05 thread waiting 1970-01-01 00:02:05.008: info: 1996010224: Preparing for connections 1970-01-01 00:02:05.017: info: 1996010224: Ready for connections 1970-01-01 00:02:05.065: info: 1766769744: Registered advertisement for: nmos-cpp_node_192-168-12-30:12345._nmos-node._tcp 1970-01-01 00:02:05.077: info: 1573737552: Attempting discovery of a Registration API 1970-01-01 00:02:06.078: warning: 1573737552: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:02:06.079: info: 1573737552: Adopting peer-to-peer operation 1970-01-01 00:02:07.080: info: 1674450000: Attempting discovery of a Registration API 1970-01-01 00:02:37.102: warning: 1674450000: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:02:38.102: info: 1682842704: Attempting discovery of a Registration API 1970-01-01 00:03:08.130: warning: 1682842704: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:03:09.131: info: 1901053008: Attempting discovery of a Registration API 1970-01-01 00:03:39.160: warning: 1901053008: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:03:40.161: info: 1582130256: Attempting discovery of a Registration API 1970-01-01 00:04:10.190: warning: 1582130256: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:04:11.191: info: 1859089488: Attempting discovery of a Registration API 1970-01-01 00:04:41.221: warning: 1859089488: Did not discover a suitable Registration API via DNS-SD 1970-01-01 00:04:42.222: info: 1733198928: Attempting discovery of a Registration API

tassianors commented 5 years ago

Sorry for the noise here guys, I found the source of my problem: my manual installation of the mdnsresolver was missing one part and in one of my tests the libdns_sd.so used on the target was correct (generated from the mdns repo), but the .so used on the compilation phase of my node was pointing to the one generated by avahi.

now everything seems to work as expected. thanks!!

garethsb commented 5 years ago

That's very good news. I've had the problem of a confused avahi-mdnsResponder system myself. Glad you got it working!