spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0
1.82k stars 478 forks source link

SPIRE Server should enforce agent IDs reside in the agent namespace #2712

Open azdagron opened 2 years ago

azdagron commented 2 years ago

SPIRE has assumed that node attestors would produce agent IDs that conform to the following convention:

spiffe://<trustdomain>/spire/agent/<nodeattestor>/<unique per-agent suffix>

(e.g. spiffe://example.org/spire/agent/join_token/21B6D625-CCF3-49E1-8E7C-812B3F55B3CB)

Although this convention is not required for agent authorization to take place safely, it does aid in debuggability, auditability, and the ability to prevent misconfiguration (e.g. cannot accidentally assign a workload an agent ID since we can validate that the requested ID is not reserved). Being able to answer the question "was this ID produced by a node attestor" is also important for at-a-glance understanding of logs.

Unfortunately, SPIRE does not enforce this convention currently, and cannot due to backcompat concerns. However, a deprecation warning is introduced with #2694. This should give us the ability to enforce this beginning with SPIRE 1.4 at the earliest (since 1.3 will be the first minor to include the deprecation warning).

This issue tracks the enforcement of the agent ID convention.

boyanguber commented 2 years ago

Thank you for making this Andrew!

Our team is currently planning out the process of migrating our customers to use the new Agent ID path. Just wanted some clarifications on a few things if possible:

Thanks again!

azdagron commented 2 years ago

Great questions @boyanguber!

When this becomes enforced in 1.4, does that mean that old agent ID's that don't conform to the new convention will not be able to attest/sync with spire server anymore? Or is it only applied to newly created agent ID paths?

I think all we plan to do as part of that release is ensure that newly attested nodes conform. We haven't discussed a timeline to enforce this for existing agents.

Will there be any auto-magic for nodes re-attesting under old formats to be auto-migrated to their new format?

This gets tricky. If we do decide to enforce this for existing agents, I can think of two mechanisms:

  1. Force operators to evicts and re-attest all agents. This would be operationally cumbersome, for sure.
  2. Allow agents to exchange an SVID with an old SPIFFE ID for a new one (at renewal time). This would require some sort of plugin integration that we currently don't have to allow the plugins to hook the renewal process.

This is further complicated by plugins needing to be careful that they do TOFU checks on the old ID shape until agents have migrated. This complication exists independent of the choice to enforce the ID shape on existing agents.

I need to think about this a bit more. I think it's probably safe to say that unless we come up with a really good migration story that we won't be enforcing this for existing agents any time soon, but I'll circle back with the maintainers and see what comes up.

In the meantime if you have any suggestions, we're all ears :)

azdagron commented 2 years ago

So the maintainers huddled on this. I think this is the plan of action we'd like to follow:

  1. Warn on the undesired ID usage (this is already merged and will ship in 1.2.1. Since this was not in place for 1.2.0, we cannot change it through 1.3.x)
  2. Beginning with 1.4.0, disallow newly attested nodes which do not conform to the expected ID shape. Existing agents will still continue to operate successfully. As part of this change, we will also introduce a warning for existing IDs that are non-conformant.
  3. In 1.5.0 (or a later minor version), we will start denying agent authorization for agents with non-conformant IDs.

We will also provide a migration guide for operators who have found themselves impacted by this enforcement. This will cover topics like TOFU, which is the most security sensitive outcome of this enforcement (i.e. having to check both the old and new ID shape for TOFU during node attestation until all agents are migrated).

As far as timeline, we're looking at roughly 6 months until 1.4.0 and another 3 months after that for 1.5.0.

zmt commented 2 years ago

Notes from discussion today:

zmt commented 1 year ago

I need to work on this in conjunction with #3527. Even if I complete the PR in the 1.8.1 timeframe, we should not land it until 1.9.0.

zmt commented 1 year ago

It is clear to me now that these will never be high enough priority among my competing work priorities that I will ever accomplish them. Please re-label help wanted. I apologize to the community for holding these so long and not completing them.

SilvaMatteus commented 10 months ago

Hello @azdagron,

I would be happy to work on this feature after I finish the retry bootstrap one.