peer auto-configuration proposals

bganne commented 3 years ago

Hi Jordan, wgsd is a great project!

Here are some proposals, I'd be curious to get your feedback:

* consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noted the fields in A/AAAA and SRV fields are resolved through Service Instance Name but are returned as <Instance> . <Domain>
* update wgsd DNS plugin to allow peer auto-configuration: I added a TXT field in the SRV answer to communicate the allowed ip and pubkey. That way, the peers do not have to know other peers configuration before-hand
* update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer
* add a vagrant environment to easily play around

Best, ben

jwhited commented 3 years ago

Hi Ben,

Thanks for the feedback and proposals, some great ideas. Sorry for the delay...

consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noted the fields in A/AAAA and SRV fields are resolved through Service Instance Name but are returned as <Instance> . <Domain>

Adhering to https://tools.ietf.org/html/rfc6763#section-4.1 makes sense to me. If this is broken out into its own PR we can merge it. Will probably tag this with a new major version as it's a backwards-incompatible change.
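For illustration, a minimal sketch of the naming rule under discussion: every returned name has the form <Instance> . <Service> . <Domain>. The helper names, the `_wireguard._udp` service label and the `peer1` instance label are assumptions made for this example, not wgsd's actual code.

```python
from __future__ import annotations


def service_instance_name(instance: str, service: str, domain: str) -> str:
    """Join <Instance>, <Service> and <Domain> into one absolute DNS name."""
    if "." in instance:
        # RFC 6763 permits escaped dots in <Instance>; this sketch rejects them.
        raise ValueError("instance label must not contain dots in this sketch")
    return f"{instance}.{service.strip('.')}.{domain.strip('.')}."


def split_service_instance_name(name: str, service: str) -> tuple[str, str]:
    """Split a name back into (<Instance>, <Domain>) for a known <Service>."""
    marker = "." + service.strip(".") + "."
    instance, found, domain = name.partition(marker)
    if not found or not instance:
        raise ValueError(f"{name!r} is not an instance of {service!r}")
    return instance, domain.rstrip(".")


# Hypothetical values, for illustration only.
full = service_instance_name("peer1", "_wireguard._udp", "example.com")
parts = split_service_instance_name(full, "_wireguard._udp")
```

With this shape, the name used in the query and the name returned in SRV and A/AAAA answers are the same string, so a client never has to re-insert the service label itself.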

update wgsd DNS plugin to allow peer auto-configuration: I added a TXT field in the SRV answer to communicate the allowed ip and pubkey. That way, the peers do not have to know other peers configuration before-hand

Adding AllowedIPs config into the DNS changes the scope of wgsd as it's currently for endpoint discovery, but I can see how this would be useful for bootstrapping from scratch. Would love to hear more about how you are using this.

Is the duplication of the public key into the TXT record to make it easier to map with the AllowedIPs? This should also be resolvable via the instance name.

update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer

Similar to above, would love to hear more about how you're using this.

bganne commented 3 years ago

consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noted the fields in A/AAAA and SRV fields are resolved through Service Instance Name but are returned as <Instance> . <Domain>

Adhering to https://tools.ietf.org/html/rfc6763#section-4.1 makes sense to me. If this is broken out into its own PR we can merge it. Will probably tag this with a new major version as it's a backwards-incompatible change.

Will do. Note that I am no expert; this is my own understanding of the RFC, but it looks consistent to me.

update wgsd DNS plugin to allow peer auto-configuration: I added a TXT field in the SRV answer to communicate the allowed ip and pubkey. That way, the peers do not have to know other peers configuration before-hand

Adding AllowedIPs config into the DNS changes the scope of wgsd as it's currently for endpoint discovery, but I can see how this would be useful for bootstrapping from scratch. Would love to hear more about how you are using this.

I'd argue it is still about endpoint discovery but in a more dynamic environment. The usecase is this:

* you want to interconnect service nodes through wireguard
* these nodes can come and go dynamically (auto scaling)
* each node will probably connect to only a subset of the other nodes (e.g. you probably have a lot of identical nodes for each service and you load-balance)

In the current implementation, when on-boarding a new node, you must pre-configure all of its peers in the wireguard configuration, and each time a node comes or goes you must also update all configurations.

With these changes, each node only needs to be configured with the registry address, and then the configuration does not need to be touched anymore for the lifetime of the node. When a new node comes in, all other nodes can update their configuration to connect to it.

This is not the end of the story though. The next thing I'd like to support is to dynamically connect/disconnect based on active conversations:

* when a node comes in, it connects to the registry and that's it. There is no other wireguard connection
* when the node starts to send packets to another node, there is no active tunnel for it and the packets are dropped. But those packets can be "punted" to wgsd-client, which can then inspect the destination IP, ask the registry for the relevant configuration and set up the wireguard connection. Note we probably need a similar "punt" path on the receiving side, as both endpoints must be configured accordingly
* subsequent packets flow through the tunnel
* wgsd-client tracks active conversations. If a tunnel sees no packets for some time, it can be torn down

I say 'wgsd-client' here but it could be another client so we can keep the 2 usecases "simple static full-mesh" and "dynamic partial mesh" separate.
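The conversation tracking described above could be sketched roughly as follows. This is only an illustration of the bookkeeping: the class name, its methods and the 30-second timeout are hypothetical, and the punt path that would feed it packets is not shown.

```python
from __future__ import annotations

import time
from typing import Callable


class ConversationTracker:
    """Remember when each peer last carried traffic and report idle ones."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self._clock = clock  # injectable so the logic can be tested
        self._last_seen: dict[str, float] = {}

    def packet_seen(self, peer: str) -> bool:
        """Record traffic for peer; True means a tunnel must be set up first."""
        is_new = peer not in self._last_seen
        self._last_seen[peer] = self._clock()
        return is_new

    def expire_idle(self) -> list[str]:
        """Forget and return the peers that have been idle for too long."""
        now = self._clock()
        idle = [peer for peer, seen in self._last_seen.items()
                if now - seen >= self.idle_timeout]
        for peer in idle:
            del self._last_seen[peer]
        return idle


# Hypothetical usage with a fake clock instead of real time.
now = [0.0]
tracker = ConversationTracker(idle_timeout=30.0, clock=lambda: now[0])
first = tracker.packet_seen("10.0.0.2")   # new conversation: configure the peer
again = tracker.packet_seen("10.0.0.2")   # tunnel already configured
now[0] = 31.0
expired = tracker.expire_idle()           # peers whose tunnel can be removed
```

A client would call `packet_seen` for every punted or observed packet and run `expire_idle` periodically, removing the returned peers from the WireGuard interface.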

Is the duplication of the public key into the TXT record to make it easier to map with the AllowedIPs? This should also be resolvable via the instance name.

When requesting configuration per AllowedIP (and not per instance name), the public key can be scraped from the instance name, but it seemed cleaner to me to just add an additional record. I do not have a strong opinion here; in fact I started by scraping the instance name :) and then changed it to add the record. Let me know what you prefer.

update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer

Similar to above, would love to hear more about how you're using this.

Yes it is just extending the client to support the usecase described above.

jwhited commented 3 years ago

I'd argue it is still about endpoint discovery but in a more dynamic environment. The usecase is this:

* you want to interconnect service nodes through wireguard
* these nodes can come and go dynamically (auto scaling)
* each node will probably connect to only a subset of the other nodes (e.g. you probably have a lot of identical nodes for each service and you load-balance)

Thanks for elaborating, makes sense to me. I'm on board w/including AllowedIPs in the DNS as TXT records. RFC6763 section 6 has quite a bit to say about formatting of additional configuration in TXT records. Would be good to combine that guidance with whatever learnings can be found from other RFCs/patterns where IP address prefixes are included in the DNS (SPF, APL, ...?)
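As a rough sketch of what RFC 6763 section 6 style key=value TXT data could look like for this purpose: the key names `txtvers`, `pub` and `allowed`, and the comma-separated prefix list, are assumptions chosen for illustration, not an agreed wgsd format.

```python
from __future__ import annotations

import ipaddress


def encode_txt(pubkey: str, allowed_ips: list[str]) -> list[bytes]:
    """Build one key=value string per attribute, version attribute first."""
    pairs = [("txtvers", "1"), ("pub", pubkey), ("allowed", ",".join(allowed_ips))]
    strings = []
    for key, value in pairs:
        data = f"{key}={value}".encode("ascii")
        if len(data) > 255:  # a single TXT string holds at most 255 bytes
            raise ValueError(f"attribute {key!r} does not fit in one TXT string")
        strings.append(data)
    return strings


def decode_txt(strings: list[bytes]) -> dict[str, str]:
    """Parse key=value strings; keys are case-insensitive, first one wins."""
    attrs: dict[str, str] = {}
    for raw in strings:
        # Split on the first '=' only, so base64 padding in the value survives.
        key, _, value = raw.decode("ascii").partition("=")
        key = key.lower()
        if not key or key in attrs:
            continue  # ignore strings with no key and repeated keys
        attrs[key] = value
    return attrs


def allowed_networks(attrs: dict[str, str]):
    """Turn the hypothetical 'allowed' attribute into ip_network objects."""
    return [ipaddress.ip_network(p) for p in attrs.get("allowed", "").split(",") if p]


# Placeholder key: 43 'A' characters plus padding, not a real peer key.
example_key = "A" * 43 + "="
txt = encode_txt(example_key, ["10.0.0.2/32", "fd00::2/128"])
decoded = decode_txt(txt)
networks = allowed_networks(decoded)
```

Parsing the prefixes with `ipaddress` rejects malformed entries early, which matters once clients start applying this data to an interface automatically.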

When requesting configuration per AllowedIP (and not per instance name), the public key can be scraped from the instance name, but it seemed cleaner to me to just add an additional record.

Since the public key is configuration, and configuration should exist as TXT record data, I think this makes sense. It's also just convenient when eyeballing base64 keys.

Happy to work on this or review a PR specifically for AllowedIPs/Pubkeys in TXT records. Client changes are obviously welcome to make use of new DNS config data, but may be easier to nail down the DNS contract first.

This is not the end of the story though. The next thing I'd like to support is to dynamically connect/disconnect based on active conversations:

This sounds really interesting. Is there a use case where you would have lots of tunnels and the resource cost of maintaining the tunnels is too high? Or is this for security reasons?

jwhited commented 3 years ago

added https://github.com/jwhited/wgsd/issues/17 with an initial idea for pub key and allowed ips in TXT

jwhited commented 3 years ago

added https://github.com/jwhited/wgsd/issues/20 for tracking a Vagrant environment. If you want to break that work out into a new PR happy to review

bganne commented 3 years ago

added #17 with an initial idea for pub key and allowed ips in TXT

Should I update the client to take advantage of it?

added #20 for tracking a Vagrant environment. If you want to break that work out into a new PR happy to review

Done in #28

jwhited commented 3 years ago

Should I update the client to take advantage of it?

The original client was built with the intent to update endpoint values for peers that were already configured. Now with config data (allowed IPs) being served via TXT records we can support a full bootstrap/mesh from scratch.
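To make the bootstrap-from-scratch case concrete, here is a minimal sketch of how a client might turn discovered data (public key from the TXT record, host and port from SRV and A/AAAA, allowed IPs from TXT) into a `wg set` invocation. The function name and all values are hypothetical; only the `wg set <interface> peer ... endpoint ... allowed-ips ...` argument layout comes from the wg(8) tool.

```python
from __future__ import annotations


def wg_set_args(interface: str, pubkey: str, endpoint_host: str,
                endpoint_port: int, allowed_ips: list[str]) -> list[str]:
    """Build the argv for adding or updating one peer with `wg set`."""
    # IPv6 literals must be bracketed so the port separator stays unambiguous.
    host = f"[{endpoint_host}]" if ":" in endpoint_host else endpoint_host
    return [
        "wg", "set", interface,
        "peer", pubkey,
        "endpoint", f"{host}:{endpoint_port}",
        "allowed-ips", ",".join(allowed_ips),
    ]


# Placeholder key and addresses, for illustration only.
args = wg_set_args("wg0", "A" * 43 + "=", "192.0.2.10", 51820, ["10.0.0.2/32"])
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Whether this lives behind a wgsd-client flag or in a separate client does not change the logic.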

With that being said, I'm not sure whether wgsd-client should be extended with flags, or whether that should be added as its own client. Thoughts?

m00nwtchr commented 3 months ago

@bganne

This is not the end of the story though. The next thing I'd like to support is to dynamically connect/disconnect based on active conversations:

* when a node comes in, it connects to the registry and that's it. There is no other wireguard connection

* when the node starts to send packets to another node, there is no active tunnel for it and the packets are dropped. But those packets can be "punted" to wgsd-client, which can then inspect the destination IP, ask the registry for the relevant configuration and set up the wireguard connection. Note we probably need a similar "punt" path on the receiving side, as both endpoints must be configured accordingly

* subsequent packets flow through the tunnel

* wgsd-client tracks active conversations. If a tunnel sees no packets for some time, it can be torn down

All of that is completely unnecessary for WireGuard. There are no 'active tunnels'/connections and no cost associated with having many configured WireGuard peers, assuming an equal amount of traffic in any given scenario. WireGuard only sends packets when packets are being sent into the tunnel, unless Persistent Keepalive is on, and you should only need that for the Registry tunnel.
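A small sketch of that configuration shape, assuming the usual `[Peer]` keys of a WireGuard config file: every peer is configured up front, and only the registry peer gets `PersistentKeepalive`. The helper name, keys, addresses and the 25-second interval are placeholders for illustration.

```python
from __future__ import annotations

from typing import Optional


def render_peer(pubkey: str, allowed_ips: list[str],
                endpoint: Optional[str] = None,
                keepalive: Optional[int] = None) -> str:
    """Render one [Peer] section; keepalive is only emitted when requested."""
    lines = [
        "[Peer]",
        f"PublicKey = {pubkey}",
        f"AllowedIPs = {', '.join(allowed_ips)}",
    ]
    if endpoint is not None:
        lines.append(f"Endpoint = {endpoint}")
    if keepalive is not None:
        lines.append(f"PersistentKeepalive = {keepalive}")
    return "\n".join(lines)


# Placeholder values: the registry peer keeps its tunnel warm, others stay silent.
registry = render_peer("REGISTRY_KEY", ["10.0.0.1/32"],
                       endpoint="registry.example.com:51820", keepalive=25)
node = render_peer("NODE_KEY", ["10.0.0.2/32"])
```

Peers rendered without `PersistentKeepalive` generate no traffic until something is actually sent to their allowed IPs.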

jwhited / wgsd

peer auto-configuration proposals #12