machinekit / machinekit-hal

Universal framework for machine control based on Hardware Abstraction Layer principle
https://www.machinekit.io

Machinetalk: published dns-sd names inconsistent #170

Open dkhughes opened 5 years ago

dkhughes commented 5 years ago

The new naming scheme for published services from machinekit components is confusing, and the names contain spaces in the DNS record. This caused the tools I wrote previously to fail to enumerate the services since they scrape spaces.

For instance, the launcher command service was previously named:

_launchercmd._sub._machinekit._tcp.local.

Cumbersome, but it was easy to validate that the SRV and TXT records were correct using a tool like nslookup or dns-sd. Changes to this have now made service names something like the following:

Machinekit Launcher on machinekit.local._machinekit._tcp.local.

The tools barf on the spaces. I'm a novice with Zeroconf stuff, but shouldn't the services be published as

<ServiceName>.<ServiceType>.<Domain-Name>
e.g.,
launchercmd._machinekit._tcp.myhostname.local

and the longer name that includes spaces ("Machinekit launcher on...") be in a field in the TXT record, maybe a description value?

Also, both the old method and new method resolve service names in the local domain with a subdomain of _tcp. Should we publish the hostname in the domain portion as well to allow for easy understanding of the host providing the service directly after the first multicast query?

cerna commented 5 years ago

I am currently building something like a library for connecting to Machinetalk services, and from the mDNS/DNS-SD/Zeroconf point of view I don't think that it is problematic - or better yet, it's not against the standard.

However, you can still query for a PTR record with the name _log._sub._machinekit._tcp.local or _machinekit._tcp.local and you will get as an answer a PTR record with the Domain Name Log service on machinekit64.local pid 739._machinekit._tcp.local, so it still works the same.

(Every Machinekit machinetalk service is announced by two PTR records, one standard _machinekit._tcp.local, which will get you all Machinekit Machinetalk services, and one using the subservice pattern, like _log._sub._machinekit._tcp.local, which will get you all Machinekit Machinetalk Log Services on local network.)

Maybe it's only different terminology, but I would not call Machinekit Launcher on machinekit.local._machinekit._tcp.local. a Service Name, but a Domain Name. (Probably depending on what library/tool you are using for mDNS/DNS-SD.)
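To make the terminology concrete, here is a minimal sketch of splitting such a name into instance, service type and domain. It assumes the name is given in printable form (as a browser tool shows it), and the helper name split_instance_name is made up for illustration.

```python
import re

# Assumption: the name is in its printable form, so the instance part may
# itself contain spaces and dots. Only the tail has a fixed structure.
_INSTANCE_RE = re.compile(
    r"^(?P<instance>.+)"               # greedy: everything before the service type
    r"\.(?P<service>_[A-Za-z0-9-]+)"   # e.g. _machinekit
    r"\.(?P<proto>_tcp|_udp)"
    r"\.(?P<domain>.+?)\.?$"           # e.g. local, with an optional trailing dot
)


def split_instance_name(full_name):
    """Return (instance, service_type, domain) or raise ValueError."""
    match = _INSTANCE_RE.match(full_name)
    if match is None:
        raise ValueError("not a DNS-SD service instance name: %r" % full_name)
    service_type = "%s.%s" % (match.group("service"), match.group("proto"))
    return match.group("instance"), service_type, match.group("domain")


parts = split_instance_name(
    "Machinekit Launcher on machinekit.local._machinekit._tcp.local."
)
print(parts)
```

The instance part is treated as an opaque label here; nothing in it is parsed further.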

Where it could be a problem is that you need this Domain Name from the PTR record to query for the SRV record.

Not sure what you mean by nslookup or dns-sd. But if you mean the command line tools in Windows 10, then you can use the double quotes syntax like this: dns-sd -L "Log service on mechan64.local pid 739" _machinekit._tcp, which will work even with spaces in the Domain Name.
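If the lookup is driven from a script rather than typed by hand, the quoting problem mostly goes away when the name is passed as one element of an argument list. A small sketch, assuming Python 3.8+ and a hypothetical helper name dns_sd_lookup_argv; it only builds the command and does not run it:

```python
import shlex


def dns_sd_lookup_argv(instance, service_type="_machinekit._tcp", domain=None):
    """Build the argument list for `dns-sd -L <name> <type> [<domain>]`."""
    argv = ["dns-sd", "-L", instance, service_type]
    if domain is not None:
        argv.append(domain)
    return argv


argv = dns_sd_lookup_argv("Log service on mechan64.local pid 739")
# The list can be handed to subprocess.run(argv) as-is; the spaces stay inside
# a single argument. shlex.join shows the equivalent POSIX shell command line.
print(shlex.join(argv))
```

Note that shlex produces POSIX-style single quotes; on Windows the double-quote form shown above is the one to type.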

And I don't understand what you mean in the last paragraph. You get the hostname by querying for the SRV record of the Domain Name which you got as the result of the PTR record query. The _tcp in, for example, _log._sub._machinekit._tcp.local is pretty much useless: for historical reasons it can be either _tcp for protocols which communicate somehow over TCP or _udp for every other form of communication.

dkhughes commented 5 years ago

"(Every Machinekit machinetalk service is announced by two PTR records"

The issue is that the older underscore-style record isn't being published from the latest build on the hardware I'm using for the launcher services. This makes the older tool I wrote unable to resolve, and hence it can't start. Basically, there is no entry that is backwards compatible with the way the service announcement worked in the past. Not a big deal, but it just means I have to revisit older software again to get a system running. As a workaround I temporarily put the old listings back in mklauncher, but the exercise made me want to ask someone who knows more.

"I would not call Machinekit Launcher on machinekit.local._machinekit._tcp.local. as a Service Name, but a Domain Name."

I agree, I think of it the same way since they are all DNS entries. After looking into the dns-sd stuff more I see it is common to include spaces in the resolution. I'll adjust my tools accordingly. Thanks for the -L switch on dns-sd, I didn't know about that one.

Your post prompted me to look much more into this and I've found that, while my original ideas were along the wrong grain, I think the current method of publishing could still be polished slightly. I've gleaned some info from this reference posted by IBM (https://www.ibm.com/support/knowledgecenter/en/SSB2MG_4.6.0/com.ibm.ips.doc/concepts/gx_gv_bonjour_service_discovery.htm).

So the only improvement would be:

Why not use the host UUID instead of the hostname and PID in the PTR DNS record? Considering there is already a UUID required in the machinekit conf file (autogenerated on first boot with the arm images), why not use something like Machinekit Launcher [UUID of host]._machinekit._tcp.local instead? This reduces the verbosity slightly, and gives us a unique DNS record to resolve even in a busy domain with lots of machinekit instances running, right? Assuming the UUID is really unique (which is fair), there is much less possibility of collisions versus hostname + PID, especially with these autogenerated ARMHF images where the hostname is prepopulated at first boot. Host UUID should also be sufficient since there is only one instance of the HAL services running per host simultaneously (that hasn't changed, has it?).

cerna commented 5 years ago

I am not sure we understand each other fully, and the fact is I don't know how it was implemented in the good old days, as I started studying and playing with it only a couple of months back. So take my word with a grain of salt.

However, when I look at the Launcher service startup with an mDNS/DNS-SD sniffer to catch the gratuitous announcement message (which all nice and good Zeroconf implementations should send), I actually see a PTR record named _launchercmd._sub._machinekit._tcp.local, for example I see:

(...)
Canonical Name: _launcher._sub._machinekit._tcp.local
Name: _launcher._sub._machinekit._tcp.local
Type: PTR
TTL: 01:15:00
        Domain Name: Machinekit Launcher on mechan64.local._machinekit._tcp.local
Canonical Name: _machinekit._tcp.local
Name: _machinekit._tcp.local
Type: PTR
TTL: 01:15:00
        Domain Name: Machinekit Launcher on mechan64.local._machinekit._tcp.local
Canonical Name: machinekit launcher on mechan64.local._machinekit._tcp.local
Name: Machinekit Launcher on mechan64.local._machinekit._tcp.local
Type: SRV
TTL: 00:02:00
        Port: 40591, Priority: 0, Target: mechan64.local, Weight: 0
Canonical Name: machinekit launcher on mechan64.local._machinekit._tcp.local
Name: Machinekit Launcher on mechan64.local._machinekit._tcp.local
Type: TXT
TTL: 01:15:00
        TXT: dsn=tcp://mechan64.local:40591
        TXT: uuid=a42c8c6b-4025-4f83-ba28-dad21114744a
        TXT: instance=e2f61884-0dfa-11e9-9ce8-001a4d8094af
        TXT: service=launcher
Canonical Name: mechan64.local
Name: mechan64.local
Type: A
TTL: 00:02:00
        IPv4 Address: 192.168.88.34
(...)

And when I send out a query for the PTR record with the Name/Canonical Name of _launcher._sub._machinekit._tcp.local, I get the correct Domain Name as an answer. Or do you mean that the Domain name used to be something like _launchercmd._sub._machinekit._tcp.local on machinekit64.local._machinekit._tcp.local?

I don't think that a change in the domain name should be able to break code. Generally speaking you should have all DNS records per service, so (two) PTRs, (one) SRV, (one) TXT and (arbitrary) A/AAAAs. The Domain Name returned with the PTR record should not be important enough to warrant specific consideration from a programming point of view.
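To illustrate what I mean by not depending on the Domain Name, here is a rough sketch of the resolution chain using an in-memory copy of the records from the capture above. The dict layout and the resolve function are made up for the example; a real client would get the records from the network.

```python
# Records copied from the capture above. The Domain Name is only ever used
# as an opaque key to look up the SRV and TXT records.
INSTANCE = "Machinekit Launcher on mechan64.local._machinekit._tcp.local"

PTR = {
    "_machinekit._tcp.local": [INSTANCE],
    "_launcher._sub._machinekit._tcp.local": [INSTANCE],
}
SRV = {INSTANCE: {"target": "mechan64.local", "port": 40591}}
TXT = {
    INSTANCE: {
        "dsn": "tcp://mechan64.local:40591",
        "uuid": "a42c8c6b-4025-4f83-ba28-dad21114744a",
        "instance": "e2f61884-0dfa-11e9-9ce8-001a4d8094af",
        "service": "launcher",
    }
}
A = {"mechan64.local": ["192.168.88.34"]}


def resolve(browse_name):
    """Follow PTR -> SRV/TXT -> A and return one dict per service instance."""
    results = []
    for instance in PTR.get(browse_name, []):
        srv = SRV[instance]
        results.append(
            {
                "instance": instance,  # never parsed, only passed along
                "host": srv["target"],
                "port": srv["port"],
                "addresses": A.get(srv["target"], []),
                "txt": TXT.get(instance, {}),
            }
        )
    return results


for service in resolve("_launcher._sub._machinekit._tcp.local"):
    print(service["host"], service["port"], service["txt"]["uuid"])
```

Hostname, port and uuid all come from the SRV, A and TXT records, so the wording of the Domain Name can change without touching this code.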

And I don't think that changing the way Domain Names are currently joined together should be a problem. It will not be for me. Maybe it would be for somebody who parses the hostname out of the name to avoid the SRV record (but why?). So go for it and change it.

dkhughes commented 5 years ago

"Or you mean that the Domain name used to be something like _launchercmd._sub._machinekit._tcp.local"

Yes, exactly. Previously, the SRV and TXT records were enumerable using the domain name _launchercmd._sub._machinekit._tcp.local. That let us skip resolving the PTR first, since in a unicast environment we already knew the IP address of the target and just wanted to quickly enumerate the services. If unicast resolution failed, I fell back to more proper zeroconf resolution via PTR records and multicast. I'm upgrading the code in my tools to handle the necessary changes with the spaces in the URL, and that should fix my self-inflicted issue on this end, but the collision possibility still remains.

I was able to get two different devices to publish the same hostname and PID. Multiple identical DNS records in one domain can't be a good thing. I'll put together a PR with the UUID instead of hostname-plus-PID for service names and submit it for code review to feel out opinions. It's a simple enough change that I can't see it breaking anything else.
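A toy illustration of the collision, not the actual publishing code. The hostname-plus-PID format follows the names quoted earlier in this thread; the UUID format, the helper names and the two board dicts are placeholders I made up, reusing UUIDs that appear elsewhere in this issue.

```python
def name_with_host_and_pid(service, hostname, pid):
    # Format modelled on "Log service on machinekit64.local pid 739".
    return "%s service on %s pid %d" % (service, hostname, pid)


def name_with_uuid(service, mkuuid):
    # Hypothetical format; the final template is still open for discussion.
    return "%s service [%s]" % (service, mkuuid)


# Two boards flashed from the same image: same default hostname and, with an
# identical boot sequence, plausibly the same PID. Only the MKUUID differs.
board_a = {"hostname": "machinekit.local", "pid": 739,
           "mkuuid": "a42c8c6b-4025-4f83-ba28-dad21114744a"}
board_b = {"hostname": "machinekit.local", "pid": 739,
           "mkuuid": "ce39bf74-2f35-49e6-a997-43dacf157372"}

old_names = {name_with_host_and_pid("Log", b["hostname"], b["pid"])
             for b in (board_a, board_b)}
new_names = {name_with_uuid("Log", b["mkuuid"]) for b in (board_a, board_b)}

print(len(old_names), "distinct name(s) with hostname + PID")
print(len(new_names), "distinct name(s) with UUID")
```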

Thanks for your input!

cerna commented 5 years ago

Well, I don't think that the current implementation of Machinetalk was made for a multi-Machinekit, semi-private type of network environment. More likely the intended use was a home network with one 3D printer and a user interface on an MDA, or a basic machine-specific network without an upstream connection to the home/shop network. This can be seen in the lack of any security or authorization layer in Machinetalk. (It's tons of hard labor, so I am not surprised no one wants to do it.)

Frankly, if you want to do something here, it would be nice for the machinekit/lib/python/machinekit/service.py file to use the C function call in the background to publish and unpublish Machinetalk services, so the whole process would be streamlined and not defined in two different places (that I know about). Unless @machinekoder had some specific reason for doing it this way.

When lathiat/avahi#125 is merged upstream and lands in the Debian repository, I will probably try to use it and change service publishing so that services are not published to the local network when REMOTE in machinekit.ini is set to 0.

dkhughes commented 5 years ago

Multi-machinekit instances in a physically isolated private network have been my use case since 2016. I'm not sure if @machinekoder and @mhaberler had that case in mind, but we do it regularly. Plus, it's useful for passive monitoring cases.

I've always been under the impression we are handling security before the network that the machinekit instances will live on, rather than exposing it to the wild web in itself.

"I would probably try to use it and change service publishing in a way, that they will not be published to local network when REMOTE in machinekit.ini is set to 0."

I imagine this is so that if the instance of machinekit (say a local pc) is running the machine and the primary GUI locally, you can also have a remote monitor that can watch the machine (via web interface, mobile device, etc.). Maybe there should just be a new flag that offers disabling the local publications.

cerna commented 5 years ago

Yeah, I have been meaning to prepare a set-up for a Virtual LAN, i.e. the Machinekit Machinetalk communication would be separated into an encrypted virtual layer. But so far I haven't found enough time/need to do it (and actually test it). That could be one way to solve "security". (Given that the interface selection in MACHINEKIT.ini works, which I haven't tested yet.) But there is still the glaring problem of authorization, i.e. two clients (UIs for example) with different access levels (roles basically).

I have been looking into Domain Names of services in Machinekit Machinetalk Services and there is no consistency. So if some change should be implemented in naming, I think that it should be templated. I would propose something like:

Machinekit MTS {name/type of service} on GUID {actual GUID of Machinekit Instance}._machinekit._tcp.local

or

Machinetalk Service {name/type of service} on GUID {actual GUID of Machinekit Instance}._machinekit._tcp.local

given that the word "machinekit" is already part of the domain. (MTS stands for Machinetalk Service.)

Of course there is still the problem with the same hostname. When you tried it, what happened on the A/AAAA question (request for A/AAAA resource record)?

Some kind of error could be raised on service announce if the hostname is not unique, but I am not sure that is good behaviour.

"I imagine this is so that if the instance of machinekit (say a local pc) is running the machine and the primary GUI locally, you can also have a remote monitor that can watch the machine (via web interface, mobile device, etc.). Maybe there should just be a new flag that offers disabling the local publications."

Maybe, but I don't think so, there is

    if remote == 0:
        logger.info(
            "Remote communication is deactivated, configserver will use the loopback interfaces"
        )
        logger.info(("set REMOTE in " + mkini + " to 1 to enable remote communication"))

in the src/machinetalk/mklauncher/mklauncher.py source file from which I deduce that 0 means no remote/no activity on network interface.
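Under that reading, gating the announcement on REMOTE could look roughly like the sketch below. The [MACHINEKIT] section name, the sample file content and the remote_enabled helper are my assumptions for illustration, not the actual mklauncher code.

```python
import configparser

# Assumption: REMOTE lives in a [MACHINEKIT] section. Only the REMOTE key
# itself is taken from the discussion above.
SAMPLE_INI = """
[MACHINEKIT]
REMOTE=0
"""


def remote_enabled(ini_text, section="MACHINEKIT"):
    """Return True when REMOTE is set to a non-zero integer."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # A missing section or key is treated like REMOTE=0.
    return parser.getint(section, "REMOTE", fallback=0) != 0


if remote_enabled(SAMPLE_INI):
    print("publish services on the network")
else:
    print("loopback only, skip the mDNS announcement")
```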

dkhughes commented 5 years ago

I totally agree with making the naming more consistent. It took me a little while to figure out that the service names were being published by the python service file instead of the mk_service.cxx and friends, and there is still the special case for the launcher subscription service naming.

"When you tried it, what happened on A/AAAA question (request for A/AAAA resource record)?"

Well, it depends. If I accepted both instances as DNS servers to resolve, then the order of resolution would determine the IP address returned. The second one would never get queried since I already had a valid entry with a valid TTL in cache. What I actually did is ask the user for the proper instance by IP, and I could unicast query to each specific instance for the service in question to be resolved, in which case the duplicate entry wasn't that big of a deal. I just get grumbly when I see the same DNS records being returned on a service lookup by multiple devices.

I queried some zeroconf hardware (printers), and the most common responses I see are the device name or part number with an abbreviated or full UUID. Here is a sample with HP printers responding:

> dns-sd -B _printer._tcp local
...
IPREM  Add     3 23 local.                    _printer._tcp.            HP LaserJet MFP M426fdw (074190)
IPREM  Add     3 23 local.                    _printer._tcp.            HP Color LaserJet CP4520 Series [61A565]
IPREM  Add     3 23 local.                    _printer._tcp.            HP LaserJet M604 [AD61D7]
IPREM  Add     3 23 local.                    _printer._tcp.            HP LaserJet MFP M426fdw (C03166)
...

Notice the duplicate M426fdw printers that are differentiated by that UUID snippet. No device in my network is responding with a hostname, but that doesn't mean that it's not an option, just none of my hardware implements it.

I like your template idea without the machinekit redundancy, but maybe slightly less verbose:

Machinetalk Service {name/type of service} [{actual GUID of Machinekit Instance}]._machinekit._tcp.local

cerna commented 5 years ago

"I like your template idea without the machinekit redundancy, but maybe slightly less verbose:"

The idea is to have solid static separators in the string for regular expression search and group capturing. So the Domain name (I even added the Hostname/FQDN to the string as I think that it is useful):

Machinetalk Service Hal rcomp on GUID a4a73df3-af5e-4a57-8e21-2d08b2b49e0a DM 3dprinter.local._machinekit._tcp.local.

is searchable with a regex (this is low quality and off the top of my head, I am sure that in production it would be a lot more refined):

^Machinetalk\sService\s([\w\s]+)\son\s((GUID\s([\d\w\W]+))\s(DM\s([\w\.]+)))\.\_machinekit\._tcp.local\.?$

and then the engine of your favorite language has the GUID and the hostname/FQDN, so you can do some hackery and avoid querying for SRV and so on.

However, were I omitting these hard separators and creating something like:

Machinetalk Service Hal rcomp a4a73df3-af5e-4a57-8e21-2d08b2b49e0a 3dprinter.local._machinekit._tcp.local.

and

^Machinetalk\sService\s([\w\s]+)\s(([\d\w\W]+)\s([\w\.]+))\.\_machinekit\._tcp.local\.?$

and then somebody in the future added the word "on" to the Domain Name string, for example:

Machinetalk Service Hal rcomp on a4a73df3-af5e-4a57-8e21-2d08b2b49e0a 3dprinter.local._machinekit._tcp.local.

then this regex would take the "on" to be part of the Service Name, because \s cannot be used as a separator as there are multi-word and single-word service names. And I want the regex to fail completely on a change or error and not produce a hidden problem.

(Of course you could create specific regex pattern which enumerate all known services and then go on to catch specific format of GUID, hostname etc.)
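Here is a small runnable comparison of the two approaches. The patterns are tightened variants written for this example (named groups, a fixed-length GUID), not the exact expressions above, so treat them as a sketch.

```python
import re

GUID = r"[0-9a-fA-F-]{36}"
SUFFIX = r"\._machinekit\._tcp\.local\.?$"

# With fixed separators ("on GUID", "DM") between the fields.
STRICT = re.compile(
    r"^Machinetalk Service (?P<service>[\w ]+?) on GUID (?P<guid>" + GUID + r")"
    r" DM (?P<host>[\w.-]+)" + SUFFIX
)
# Without separators: only whitespace divides the fields.
LOOSE = re.compile(
    r"^Machinetalk Service (?P<service>[\w ]+) (?P<guid>" + GUID + r")"
    r" (?P<host>[\w.-]+)" + SUFFIX
)

good = ("Machinetalk Service Hal rcomp on GUID "
        "a4a73df3-af5e-4a57-8e21-2d08b2b49e0a DM "
        "3dprinter.local._machinekit._tcp.local.")
changed = ("Machinetalk Service Hal rcomp on "
           "a4a73df3-af5e-4a57-8e21-2d08b2b49e0a "
           "3dprinter.local._machinekit._tcp.local.")

m = STRICT.match(good)
print(m.group("service"), m.group("guid"), m.group("host"))

# The strict pattern rejects the changed format outright...
print(STRICT.match(changed))
# ...while the loose one silently absorbs "on" into the service name.
print(LOOSE.match(changed).group("service"))
```

The strict pattern returning no match at all is the behaviour I want: a format change shows up immediately instead of producing a wrong service name.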

The main reason why this won't work, which I have just realized, is the fact that DNS limits each label of a domain name (the part between dots) to 63 octets.

"The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators)."

from RFC2181

A hyphenated GUID takes 36 characters, the Service name would take 10-15, and the deal breaker is the hostname/FQDN, which can have the length of a whole domain name.
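A quick check of the numbers, as a sketch. It assumes the label is encoded as UTF-8 and reuses the example name from above; label_octets is just a helper name for this snippet.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from the RFC text quoted above


def label_octets(label):
    """Length of a DNS label in octets, assuming UTF-8 encoding."""
    return len(label.encode("utf-8"))


guid = "a4a73df3-af5e-4a57-8e21-2d08b2b49e0a"
verbose = "Machinetalk Service Hal rcomp on GUID %s DM 3dprinter.local" % guid

for label in (guid, verbose):
    size = label_octets(label)
    verdict = "ok" if size <= MAX_LABEL_OCTETS else "too long"
    print(size, verdict)
```

Even with a short hostname like 3dprinter.local, the verbose form is already over the limit.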

Did I miss something?

I will have to think about how to tackle this.

dkhughes commented 5 years ago

I think it should be either the hostname or the GUID. Plus, verbosely including

Machinetalk Service

in light of the character limit sounds like a misstep. Looking back to the examples I provided before, none of the IDs include "printer". We know it's a machinetalk service since that's the type we queried for:

_machinekit._tcp.local.

Your regex issue, where a space-delimited separator gets captured as part of the service name, is probably why the majority of devices I've been querying over the last few days include some form of known separator, '[' or '(', to delimit the unique identifying property (abbreviated serial # in most cases).

How about the following? It would require two options to be added to the config file:

{Optional Prefix abbreviation} {Service Name} [{GUID *or* Hostname}]._machinekit._tcp.local.

The optional prefix could be used to identify a custom image build, like "MK" for a stock machinekit image/package or "BMK" for Bill's custom machinekit. The unique identifier option would allow the end user to pick between hostname or GUID, delimited by strong separator characters. In small environments where each machine is customized manually, hostname would probably be fine (and the default). Then, in cases like mine, I can swap the default config to use GUID and avoid the collisions in the PTR/A records on the local DNS domain.

We can include documentation regarding the prefix length limit, and a service name limit to respect RFC2181. This method keeps our characters down, and allows for a user configurable prefix for image customization.

An example with the launcher service:

MK Launcher [machinekit.local]._machinekit._tcp.local.

Swapping in the GUID instead of the hostname yields 50 chars, still below the limit. But a subset of the GUID could be used and still maintain a small chance of collision while allowing a longer service name.

cerna commented 5 years ago

I like this idea. I would implement it this way:

  1. Add a new label to machinekit.ini: SERVICE_DOMAIN_NAME="MTS *SERVICEID on *GUID", which will be used as the pattern for creating the service domain name. The rules would be that the pattern must include *SERVICEID and at least one of *GUID or *DM. (Does the current Machinekit INI parser allow a value enclosed in quotes, so strings can contain whitespace? Will have to investigate.)

  2. Implement a C function that will take in the service name (maybe create an enum with all services) and will return the wanted part of the domain name, which can be fed to avahi. In the body it will get the pattern (if no pattern is found in machinekit.ini, it will use the default version), process the string and replace *SERVICEID, *GUID and *DM with actual values, verify that the string is under the limit (and fail if not) and return it.

  3. The current function which does the publishing and avahi handling will use the aforementioned function instead of taking the string as a parameter, or better said, will take the service name as a parameter (maybe again an enum).

  4. Calls to this function from the services' .c/.h files will need to be updated accordingly to use the service name only (or the enum).

  5. Equivalent for python machinekit component.

I think that this solution is open for extension/changes in the future (adding new *IDENTIFIERS) and solves current problem.
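Steps 1 and 2 could be prototyped along these lines before writing the C version. This is only a sketch: the function name build_service_label and the error handling are placeholders, and the 63-octet check assumes UTF-8.

```python
import re

MAX_LABEL_OCTETS = 63
TOKEN_RE = re.compile(r"\*(SERVICEID|GUID|DM)")
DEFAULT_PATTERN = "MTS *SERVICEID on *GUID"  # default suggested in step 1


def build_service_label(service_id, guid, hostname, pattern=DEFAULT_PATTERN):
    """Expand the pattern into the first label of the service domain name."""
    used = set(TOKEN_RE.findall(pattern))
    if "SERVICEID" not in used:
        raise ValueError("pattern must contain *SERVICEID")
    if not used & {"GUID", "DM"}:
        raise ValueError("pattern must contain *GUID, *DM or both")

    values = {"SERVICEID": service_id, "GUID": guid, "DM": hostname}
    label = TOKEN_RE.sub(lambda m: values[m.group(1)], pattern)

    # Fail loudly instead of announcing a label that DNS cannot carry.
    if len(label.encode("utf-8")) > MAX_LABEL_OCTETS:
        raise ValueError("label exceeds %d octets: %r" % (MAX_LABEL_OCTETS, label))
    return label


print(build_service_label("Launcher",
                          "a42c8c6b-4025-4f83-ba28-dad21114744a",
                          "mechan64.local"))
```

Adding a new *IDENTIFIER later would mean extending the token regex and the values dict, nothing else.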

What do you think?

EDIT:

"How about the following?"

I see no problem with this approach.

cerna commented 5 years ago

I also discovered, that machinekit/machinekit#531 and machinekit/machinekit-hal#113 and machinekit/machinekit-cnc#39 issues were never properly resolved and if this solution should work as described, we should address these issues also.

cerna commented 5 years ago

Given that the problem outlined in machinekit/machinekit-cnc#39 and #113 is solved for now, it's time to finish this one too.

If the pattern system is to be implemented (I am currently playing with the C side), it (probably) needs to stay as close as possible to the current state from a normal user's point of view. So I think adding a section to machinekit.ini with these tokens should be enough:

# -------------- Service Domain Name Pattern -----------------
#
# Pattern which is used to create the first part of a domain name used when announcing services
# of Machinetalk, for example
# SERVICE_DOMAIN_PATTERN="MTS *SN* on *DM*" could create domain name
# MTS Log on machinekit.local._machinekit._tcp.local
#
# Allowed keywords for automatically replaced values are
# *SN* for unique name of Machinetalk Service
# *DM* for hostname or FQDN of the machine running Machinekit
# *MKUUID* for Machinekit UUID of current running Machinekit Instance
# *PID* for PID number of the given Machinetalk Service process
SERVICE_DOMAIN_PATTERN="MTS *SN* on *DM*"

Or would anybody want something else? (It should be extensible.)

dkhughes commented 5 years ago

Yes, that's what I'm doing here. I extended machinekit.ini template with:

# DNS records announcing services are normalized for clarity, and to avoid
# accidental collisions in the local domain between multiple instances.
#
# ANNOUNCE_FORMAT is prepended to the standard machinekit service type
# designation. The full DNS record announced in the domain is:
#
# ANNOUNCE_FORMAT._machinekit._tcp.local.
#
# Valid key values for substitution at publication are:
# $MKUUID - The MKUUID specified above
# $HOSTNAME - The instance hostname reported in /etc/hostname
# $IP - The IP address of the host
# $SRVNAME - The pretty name of the service (HAL RCOMP, Launcher, ...)
# $SRVTYPE - The type of service published (launcher, status, ...)
# 
# Default value is: 
# ANNOUNCE_FORMAT="MK $SRVNAME on $HOSTNAME"
#
# An example using MKUUID instead of hostname:
# ANNOUNCE_FORMAT="MK $SRVTYPE [$MKUUID]"
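For anyone prototyping this on the Python side: the $KEY syntax happens to match Python's string.Template, so a minimal sketch of the substitution could look like the following. The announce_label helper and the sample values are illustrative only, not the code from the branch.

```python
from string import Template

DEFAULT_FORMAT = "MK $SRVNAME on $HOSTNAME"


def announce_label(fmt, **values):
    """Expand an ANNOUNCE_FORMAT-style string; unknown keys raise KeyError."""
    return Template(fmt).substitute(values)


# Sample values for one service; a real publisher would fill these in from
# machinekit.ini and the running host.
values = {
    "MKUUID": "ce39bf74-2f35-49e6-a997-43dacf157372",
    "HOSTNAME": "machinekit.local",
    "SRVNAME": "HAL Rcomp service",
    "SRVTYPE": "halrcomp",
}

print(announce_label(DEFAULT_FORMAT, **values))
print(announce_label("MK $SRVTYPE [$MKUUID]", **values))
```

Using substitute rather than safe_substitute means a typo in a key fails at startup instead of ending up in a published record.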

There are lots of places the announcement is hardcoded to pick through (both python and c for haltalk, webtalk, configserver, mklauncher, mkwrapper, etc., etc.).

cerna commented 5 years ago

OK, if you are doing this I will leave you to it, as I consider competing counterproductive and never really enjoyed programming contests.

"There are lots of places the announcement is hardcoded to pick through (both python and c for haltalk, webtalk, configserver, mklauncher, mkwrapper, etc., etc.)."

I was thinking about changing the signature of int mk_announce(mk_netopts_t *n, mk_socket_t *s, const char *headline, const char *path), as this is the point where the string is passed in, to take some kind of struct or enum instead. That would need code changes everywhere, so I took it as a given.

I usually work with OOP-empowered languages, so this is not exactly logical to me at first glance, and many things I consider "not so great" or "unclean" in this code probably have their origin in this crutch of mine.

BTW, what is the difference between $SRVNAME and $SRVTYPE? It isn't obvious at first glance. Or do you mean Components and Services as per this article?

dkhughes commented 5 years ago

"I consider competing as counterproductive"

Me too. I'll try to get a link to something that works tonight for review by others. I had already started the python side yesterday.

$SRVNAME is a pretty formatted name, whereas $SRVTYPE is the abbreviated type name returned in the TXT record. Since UUIDs can be rather long, I wanted the type option just for brevity if required. For example, with the hal remote component haltalk:

$SRVNAME = "HAL Rcomp Service"
$SRVTYPE = "halrcomp"

dkhughes commented 5 years ago

I have this working now. Here is a dump of the dns records when haltalk and mkwrapper are running with the default format:

 IPv6 "MK HAL Rcommand service on machinekit.local"       _machinekit._tcp     local
 IPv6 "MK HAL Rcomp service on machinekit.local"          _machinekit._tcp     local
 IPv6 "MK HAL Group service on machinekit.local"          _machinekit._tcp     local
 IPv6 "MK Log service on machinekit.local"                _machinekit._tcp     local
 IPv4 "MK HAL Rcommand service on machinekit.local"       _machinekit._tcp     local
 IPv4 "MK HAL Rcomp service on machinekit.local"          _machinekit._tcp     local
 IPv4 "MK HAL Group service on machinekit.local"          _machinekit._tcp     local
 IPv4 "MK Log service on machinekit.local"                _machinekit._tcp     local
 IPv4 "MK Preview Status on machinekit.local"       _machinekit._tcp     local
 IPv4 "MK Preview on machinekit.local"              _machinekit._tcp     local
 IPv4 "MK Command on machinekit.local"              _machinekit._tcp     local
 IPv4 "MK Error on machinekit.local"                _machinekit._tcp     local
 IPv4 "MK Status on machinekit.local"               _machinekit._tcp     local
 IPv4 "MK File on machinekit.local"                 _machinekit._tcp     local

and an ini file adjustment to change to uuid and service type instead:

$ avahi-browse -d local _machinekit._tcp
 IPv6 "MK halrcmd [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv6 "MK halrcomp [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv6 "MK halgroup [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv6 "MK log [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK previewstatus [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK preview [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK command [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK error [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK status [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK file [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK halrcmd [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK halrcomp [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK halgroup [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local
 IPv4 "MK log [ce39bf74-2f35-49e6-a997-43dacf157372]" _machinekit._tcp     local

Updates have also been applied to all of the other components as well. I have to run for a bit, but I'll post a link to the branch for other testers who would like to help out once I rebase the git tree.

The machinekit.ini file has been updated but the code changes are backwards compatible in case an end user decides to keep their existing ini file in a dpkg update scenario.

machinekoder commented 5 years ago

I can't read through everything, but I assume the problem is identifying which services belong to which Machinekit instance. The published service name is the wrong tool for doing that, as it can be anything; that's on purpose.

For example, let's take a look at the launcher service published by mklauncher. One can define a custom instance name, for example My Cool Machine, which would then be published as "My Cool Machine"._machinekit._tcp.local

To identify a specific Machinekit instance, use the uuid TXT record published with every service announcement. For reference take a look at https://github.com/machinekit/pymachinetalk/blob/master/pymachinetalk/dns_sd.py#L129
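As a rough illustration of that idea (not the pymachinetalk code), grouping discovered services by their uuid TXT value could look like this. The sample list and the group_by_instance helper are made up; the UUIDs and names are reused from earlier in this thread.

```python
from collections import defaultdict

# (instance name, TXT record) pairs as a browser might collect them. The
# names are free-form and are not inspected at all.
discovered = [
    ("My Cool Machine",
     {"service": "launcher", "uuid": "a42c8c6b-4025-4f83-ba28-dad21114744a"}),
    ("MK Status on machinekit.local",
     {"service": "status", "uuid": "ce39bf74-2f35-49e6-a997-43dacf157372"}),
    ("MK Command on machinekit.local",
     {"service": "command", "uuid": "ce39bf74-2f35-49e6-a997-43dacf157372"}),
]


def group_by_instance(services):
    """Map each Machinekit uuid to the service types it announces."""
    grouped = defaultdict(list)
    for _name, txt in services:
        grouped[txt["uuid"]].append(txt["service"])
    return dict(grouped)


for uuid, kinds in group_by_instance(discovered).items():
    print(uuid, sorted(kinds))
```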

For reference how the service discovery works please take a look at section "Service Discovery" in https://machinekoder.com/machinetalk-explained-part-3-technologies/

And yes, Machinetalk is designed with multi-instance environments in mind. It is not designed for security at the moment, so I wouldn't publish the services on a public network. It can be extended for security using ZMQ's elliptic curves (google for iron house pattern), but so far that has not been necessary.

Regarding VPN: That's possible, but not straightforward to do. It's definitely not working with the Avahi version shipping with Debian Wheezy due to a bug in avahi. If someone wants to know more, I could explain the challenges.

cerna commented 5 years ago

@machinekoder The problem is a combination of smaller problems: