slackhq / nebula

A scalable overlay networking tool with a focus on performance, simplicity and security

Documentation question: CAs #111

Closed JonTheNiceGuy closed 1 year ago

JonTheNiceGuy commented 4 years ago

I notice in the config file there is a comment about multiple CAs. Does this mean that you could, in theory, have multiple CAs specified here? If so, would it look like this:

pki:
  ca: /etc/nebula/ca1.crt /etc/nebula/ca2.crt /etc/nebula/ca3.crt

Or (given the comment above about inlining with ": |"):

pki:
  ca: |
    /etc/nebula/ca1.crt
    /etc/nebula/ca2.crt
    /etc/nebula/ca3.crt

In what sort of context would you imagine having multiple CAs? e.g. a CA per tier (management, prod, non-prod, T&V)?

As there are also comments about being able to use ca_name and ca_sha in the config file, does this mean you might want to use a CA per "zone" (management, backup servers, etc.) and use that CA as part of your firewalling?

nbrownus commented 4 years ago

You can have multiple CAs, but they'd all need to be in the same file or, if you inline them, you'd include the actual contents of each CA in the config.

pki:
  ca: /etc/nebula/ca.crt

/etc/nebula/ca.crt would contain

# comments and newlines are fine
# main root
-----BEGIN NEBULA CERTIFICATE-----
...
-----END NEBULA CERTIFICATE-----

# some other root
-----BEGIN NEBULA CERTIFICATE-----
...
-----END NEBULA CERTIFICATE----- 

Same story with inlining

pki:
  ca: |
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----

A CA is like a really large hammer for segmenting multiple nebula networks. You can use them to segment however you see fit, but there may be trade-offs depending on your goals and how you've set them up. The finer details are a larger topic. We have used them in the past to segment different environments based on risk, only trusting both roots on the machines where the two environments need to communicate.

ca_name and ca_sha firewall rules are there to limit access based on the signing root. This matters if you trust multiple roots and can't guarantee that your root is the only one that will ever issue a cert for a given name, group(s), or ip range.
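
As a rough illustration (the port and root name below are made up for this example, not taken from a real deployment), such a rule sits alongside any other firewall rule and simply adds a constraint on the signing root:

firewall:
  inbound:
    # only accept HTTPS from hosts whose certificate was signed by this root;
    # ca_name matches the name given to nebula-cert ca, while ca_sha can pin
    # the exact root by fingerprint if names could ever collide
    - port: 443
      proto: tcp
      host: any
      ca_name: "prod-root"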

nbrownus commented 4 years ago

One maybe helpful concrete example would be:

Employees use SSH to access production systems via a bastion. You could run nebula on all the laptops/desktops the employees use. Those certs would be signed by an "employee device" root. The SSH bastion would trust that root as well as the "production server" root. The bastion firewall would only allow SSH from the "employee device" root, while the trust on the "production server" root would allow the bastion to talk to other production servers. Those servers would only trust the "production server" root and only allow SSH from the bastion.
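
A hedged sketch of what the bastion's config might look like under that scheme (the root names and port are illustrative only): it trusts both roots in pki.ca, but its inbound rules only admit SSH from certs signed by the employee device root.

pki:
  ca: |
    # employee device root
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----
    # production server root
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----

firewall:
  outbound:
    # let the bastion reach the production servers it fronts
    - port: any
      proto: any
      host: any
  inbound:
    # SSH only from certs signed by the employee device root
    - port: 22
      proto: tcp
      host: any
      ca_name: "employee-device"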

zeisss commented 4 years ago

Note that CAs also expire after 1 year by default. Adding a second (or more) CA to a node allows for a gradual rotation of the CA.
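
For instance (the names and paths below are placeholders), a rotation could start by generating the next root and shipping a bundle containing both, so hosts signed by either root keep talking while certs are gradually re-issued:

# mint the next root alongside the current one
nebula-cert ca -name "example-org 2021" -out-crt ca-2021.crt -out-key ca-2021.key

# distribute a bundle with both roots to every node as its pki.ca file
cat ca-2020.crt ca-2021.crt > /etc/nebula/ca.crt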

rawdigits commented 4 years ago

@zeisss exactly! We also have multiple roots in our production deployment, some of which are offline and have never been used, so that we can move everything to a new root quickly if it were to become necessary.

stangri commented 4 years ago

@zeisss exactly! We also have multiple roots in our production deployment, some of which are offline and have never been used, so that we can move everything to a new root quickly if it were to become necessary.

@rawdigits could you please provide a walk-through or an example of config.yml connecting to multiple lighthouses, each using their own CA certificates? I've combined two ca.crt files into one file referenced by config.yml, but I don't know what to do about client keys/certificates. Thanks!

JonTheNiceGuy commented 4 years ago

I think you've got the role of a lighthouse confused :)

If you have an organisation with staff in three regions (let's call them London, Paris and Berlin), you might have a separate CA for the provisioning activities in those three regions (perhaps close to where your provisioning teams are based) and each region issues certificates for their local staff and servers.

So you might end up with L-CA, P-CA and B-CA, plus L-001 signed by L-CA, P-001 by P-CA and B-001 by B-CA (as your first managed machines in each region). Your CA should be offline (or at least difficult to get to), so at the moment your nodes have no way to reach each other, and this is where the lighthouses come in. You should have at least 2, so London and Berlin each stand up two lighthouses: L-LH01, L-LH02, B-LH01 and B-LH02. Each node from each region ({L,P,B}-001) defines all 4 lighthouses, and each includes the CA public keys from {L,P,B}-CA.

This means that any node signed by one of the three CAs will be allowed to connect to any of the nodes or lighthouses, and if any of the lighthouses go down, the nodes can still find each other.

Effectively, a Lighthouse is a peer-addressing service (like DNS, but only for nodes that talk to it), while the CA enforces an offline access control mechanism.

Does this help?

stangri commented 4 years ago

Does this help?

Tremendously. So, for a single client with one lighthouse connection already set up to be able to connect to a different lighthouse, the new lighthouse config should include the old lighthouse's ca.crt, and the client config should include the extra lighthouse as a host? No other changes to the client configs?

Also, is it possible to generate the ca and client key pairs with 10 (and not 1) year expiration?

JonTheNiceGuy commented 4 years ago

I'm actually away from a computer all day today and won't really have scope to take a look until Monday. I'll take a look at this then, if that's OK?

JonTheNiceGuy commented 4 years ago

Second question first (as it's the much easier question to answer):

Also, is it possible to generate the ca and client key pairs with 10 (and not 1) year expiration?

Yes, see the duration flag in nebula-cert ca --help and nebula-cert sign --help
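
For example (the name, IP and durations below are placeholders), a 10-year CA and a host cert could be minted like this. Note that a host cert cannot outlive the CA that signs it; by default, sign ends the cert 1 second before the CA expires.

# roughly 10 years, expressed in hours
nebula-cert ca -name "My Example Org" -duration 87600h

# host cert; omit -duration to default to just before the CA's expiry
nebula-cert sign -name "laptop1" -ip "192.0.2.10/24" -duration 87599h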

In answer to your first question:

So for a single client with one lighthouse connection already set up to be able to connect to a different lighthouse, the new lighthouse config should include the old lighthouse ca.crt and the client config should include extra lighthouse as host? No other changes to the client configs?

You should only need a single CA if you're not worried about resiliency of the CA, or if you only have a small deployment with few changes. You only need to extend the number of CAs you have if you want to ensure that, should your CA be compromised or expire, you can start minting new certificates from the new CA without experiencing downtime.

I would fully expect your new lighthouse to use the same CA as the "old" lighthouse. In that case, your clients would only require the new lighthouse to be added to the config file.

If, as you're changing the config file anyway, you want to add a resiliency option with the CA certificates, then yes, add the new "next" CA at the same time you add the new lighthouse. With reasonable change management systems, though, that shouldn't need to be an atomic operation (as in, it all happening at once).

More context below!

Certificate authorities (CA)

You only need a single CA; however, there are use cases where it might be desirable to have multiple CAs, for example those discussed earlier in this thread (gradual CA rotation, segmenting environments, or keeping standby roots).

Lighthouses

It is suggested you have at least two lighthouses in separate locations. These are effectively directory services for all your connecting machines, and act like a DNS service that only knows the name-to-IP mapping for hosts which have connected to it. Having lots of these will generate more traffic. I suspect the "sweet spot" is somewhere between 2 and 5 lighthouses, but each client must talk to each lighthouse, or not all of the machines will be able to reach all of the other machines.

A lighthouse does not require a dedicated CA.
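
For completeness, the lighthouse side is just a node that declares itself as one and listens on a reachable address; a minimal sketch (the port is the common default, not a requirement):

# on L-LH01, L-LH02, B-LH01 and B-LH02
listen:
  host: 0.0.0.0
  port: 4242
lighthouse:
  am_lighthouse: true
  # lighthouses should not list other lighthouses under hosts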

Hopefully this gives you much more detail! :)

stangri commented 4 years ago

@JonTheNiceGuy thanks for your elaborate reply. I actually do need two separate CAs. Can you please elaborate on how can a single client connect to two different lighthouses with different CAs?

JonTheNiceGuy commented 4 years ago

Ah, then, in that case, yep, your config file on all the lighthouses and nodes needs to have both CAs specified (as per the comment above), like this:

pki:
  ca: |
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----

Then, on all the non-lighthouse nodes, add the lighthouses, like this (assuming your Nebula IP range is 192.0.2.0/24 and your public IPs are 198.51.100.1 and 203.0.113.30):

static_host_map:
  "192.0.2.1": ["198.51.100.1:4242"]
  "192.0.2.2": ["203.0.113.30:4242"]
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.0.2.1"
    - "192.0.2.2"

I think this has tracked so far outside the initial question that it's probably worth raising anything further as a fresh ticket.

stangri commented 4 years ago

static_host_map:
  "192.0.2.1": ["198.51.100.1:4242"]
  "192.0.2.2": ["203.0.113.30:4242"]

Please correct me if I'm wrong, but this config implies the single organization with multiple lighthouses for redundancy, etc.

I'm in position where I have a nebula client which needs to be able to connect to lighthouses of different organizations, meaning the internal addresses, ca-files, etc. are all different.

Is it possible to configure the single client to be able to connect to Nebulas of different organizations at the same time or would I need to keep two separate yml-files and restart the local nebula client with a different yml file to connect to a different organization?

zeisss commented 4 years ago

@stangri

Please correct me if I'm wrong, but this config implies the single organization with multiple lighthouses for redundancy, etc.

The multiple lighthouses are not necessarily for redundancy. They could be located in different datacenters with separate networks.

Is it possible to configure the single client to be able to connect to Nebulas of different organizations at the same time or would I need to keep two separate yml-files and restart the local nebula client with a different yml file to connect to a different organization?

That should be possible. Just configure the lighthouses like you did in your example. Make sure your CA file contains all the CAs necessary to connect to those lighthouses. The only prerequisite, afaict, is that the IPs MUST NOT conflict.
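
A rough sketch of that single-client config, assuming the two organisations use non-conflicting ranges such as 192.0.2.0/24 and 198.18.0.0/24 (all names and addresses here are examples, and both organisations would also need to trust whichever root signed this client's own cert):

pki:
  ca: |
    # org A root
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----
    # org B root
    -----BEGIN NEBULA CERTIFICATE-----
    ...
    -----END NEBULA CERTIFICATE-----
  cert: /etc/nebula/host.crt
  key: /etc/nebula/host.key

static_host_map:
  "192.0.2.1": ["198.51.100.1:4242"]   # org A lighthouse
  "198.18.0.1": ["203.0.113.30:4242"]  # org B lighthouse

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.0.2.1"
    - "198.18.0.1"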

Generally this sounds like a separate issue - I would recommend making a separate issue or joining the Slack community.

nh2 commented 2 years ago

CAs also expire after 1 year by default

What happens after the CA expires? Does the VPN stop functioning, or does it only prevent me from signing new certs?

If the former, this should really be warned about in the main getting started guide and README section.

Edit: I found:

$ nebula-cert sign -help
...
  -duration duration
        Optional: how long the cert should be valid for. The default is 1 second before the signing cert expires. Valid time units are seconds: "s", minutes: "m", hours: "h"

So that indeed sounds like all VPN networks created with the README and Quick Start Guide will suddenly stop working after 1 year, is that correct?

terrywang commented 2 years ago

@nh2 The overlay network breaks if the CA certificate(s) expire.

The handshake between non-lighthouse and lighthouse nodes fails because there is no valid CA.

Mechanisms can be put in place to monitor keywords in the logs to detect the issue in a timely manner (automation can be implemented to rotate the CA certificate and sign new certificates for non-lighthouse nodes) - I am still exploring a "cheap" solution. Ideally, I'd like to see nebula-cert allow a user-specified expiry.
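
One cheap building block for such monitoring (the path is an example) is to print the CA bundle on a schedule and alert on the validity details it reports:

# human-readable details for every cert in the bundle, including validity
nebula-cert print -path /etc/nebula/ca.crt

# JSON output, easier to feed into a monitoring script
nebula-cert print -json -path /etc/nebula/ca.crt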

NOTE: I am looking at Tailscale and Netmaker, both based on WireGuard. Tailscale is a hosted service; Netmaker can be self-hosted and comes with a management UI.

nh2 commented 2 years ago

@terrywang Thanks for the clarification!

I think the Quick Start Guide and the README should absolutely mention this, because neither they nor any other intros to Nebula I've read so far mention this very important topic of certificate expiry, and I expect a lot of users to be disappointed when their networks break after a year.

nh2 commented 2 years ago

Another question that I think should be explicitly explained in the CA documentation:

The fact that earlier posts in here used the terminology "root" seems to imply that, as with TLS, one could make a root CA (e.g. company-wide) that signs a subordinate CA (e.g. a company office in a specific country).

Perhaps the higher-authority CA could then allow the lower-authority CA to sign for some specific groups.

Is there such a thing currently, or is it planned, or is it a bad idea because anything that could be achieved by this can already be achieved by just making completely independent CAs that don't sign each other?


I got this question when reading the docs of nebula-cert ca, namely:

  -groups string
        Optional: comma separated list of groups. This will limit which groups subordinate certs can use

so I was wondering if it would make sense to make the handing out of groups delegated / hierarchical.

rodja commented 2 years ago

I just ran into the issue of having an expired certificate authority:

So that indeed sounds like all VPN networks created with the README and Quick Start Guide will suddenly stop working after 1 year, is that correct?

What a pain. The documentation should mention this, and also give some hints about how to renew the certificate authority -- I'm quite lost at the moment.

JonTheNiceGuy commented 2 years ago

Hi, I'm not involved in the project, but I think I can reply on this one @rodja.

You don't renew the CA; you generate a new one and then re-issue the certs using the new CA. The CA is the "Authority" that allows all the other certs to connect. When it expires, all the other certs no longer have the authority to connect.

And so, you'll have to re-run the process for generating the CA using nebula-cert, and then re-issue each client certificate.
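
A minimal sketch of that process (the names, IPs and paths are placeholders):

# 1. mint the replacement CA, with a longer lifetime this time
nebula-cert ca -name "My Example Org 2023" -duration 87600h

# 2. re-issue a cert for each host against the new CA
nebula-cert sign -ca-crt ./ca.crt -ca-key ./ca.key -name "host1" -ip "192.0.2.10/24"

# 3. distribute the new ca.crt plus each host's new cert and key, then reload nebula (SIGHUP)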

Depending on your threat model, you may wish to make your CA expire after 10 years, or even 25, particularly if this is just for your personal devices. If this is being used in a business setting, I'd consider having a rolling set of CAs, with a new one generated every 3 months, some form of automation to add all the CAs together on your endpoints, and the private key of the oldest CA being retired at each rotation, like this:

Day   CAs Available      CA Active   CAs with no private key
1     A, B, C            A           None
90    A, B, C, D         B           A
180   A, B, C, D, E      C           A, B
270   B, C, D, E, F      D           B, C
360   C, D, E, F, G      E           C, D

etc.

Another option you could review is using separate CAs for long-lived servers or devices, versus separate CAs for short-lived devices, using different address spaces.

For example, perhaps your lighthouses have a CA with a 10 year expiry on it, your management devices have a CA with a 5 year expiry on it, and your auto-scaling, self-provisioned servers have a CA with a one-month expiry on it?

This means a lot more automation, but ultimately, loading a new config file is a HUP-call away (kill -HUP $pid) and shouldn't interrupt your traffic flows.

johnmaguire commented 1 year ago

I'm closing this issue out as it looks like the question has been answered. Please feel free to open a new issue if you've stumbled across this issue and still have questions!