Open kongweiguo opened 1 year ago
Hi @kongweiguo, I think this would be a great capability to have. We have discussed doing something similar for workload attestation in the past as well (#2666).
In order for this to work effectively, the server would need some way to signal to agents that they need to re-attest, as well as a way for the server to enforce that agents must re-attest at a certain point.
An important point to consider is that node attestation is not always safe to repeat automatically, depending on the node attestor plugin, because some node attestors have a "trust on first use" (TOFU) design (e.g. `aws_iid`, `gcp_iit`). We wouldn't want this automatic re-attestation performed for the TOFU node attestors, since it wouldn't be a safe operation: for example, any process on an AWS VM could fetch the local IID from the IMDS and impersonate SPIRE Agent. All of the builtin node attestors that follow this TOFU principle have the `CanReattest` field set to `false` in the NodeAttestor interface: https://github.com/search?q=repo%3Aspiffe%2Fspire%20CanReattest&type=code
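To make the TOFU concern concrete, here is a minimal sketch of how a server-side check might gate automatic re-attestation on that flag. Only the name `CanReattest` comes from the discussion above; the `AttestedNode` type, its other fields, and `ShouldAutoReattest` are hypothetical and are not SPIRE's actual API.

```go
package main

import "fmt"

// AttestedNode is a hypothetical, simplified record of an attested agent.
// Only CanReattest mirrors a name from the discussion; the rest is illustrative.
type AttestedNode struct {
	SpiffeID     string
	AttestorType string
	CanReattest  bool
}

// ShouldAutoReattest reports whether the server may ask this agent to
// re-attest automatically. For trust-on-first-use attestors the answer is
// no, and the returned reason explains why.
func ShouldAutoReattest(n AttestedNode) (bool, string) {
	if !n.CanReattest {
		reason := fmt.Sprintf(
			"attestor %q does not allow re-attestation; skipping automatic re-attestation",
			n.AttestorType,
		)
		return false, reason
	}
	return true, ""
}
```

With a check like this, an agent attested through `aws_iid` would be skipped, while an attestor that reports `CanReattest` as true would remain eligible for a server-initiated re-attestation.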
This gets a little more complicated because users can provide SPIFFE ID path templates in the NodeAttestor plugins, so it's possible a re-attestation could cause the agent to receive a new SPIFFE ID based on the newly discovered node selectors. We would also want to make sure that old node selectors get cleaned up in the datastore.
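As a rough illustration of the cleanup concern, the sketch below diffs the selectors stored from the previous attestation against the ones produced by a re-attestation. The `Selector` type and `DiffSelectors` function are hypothetical names used for illustration; how SPIRE would actually persist or replace selectors is part of the design still to be worked out.

```go
package main

import "sort"

// Selector is a hypothetical type/value pair describing a node property.
type Selector struct {
	Type  string
	Value string
}

// DiffSelectors returns the selectors that exist only in previous (stale,
// candidates for removal from the datastore) and the selectors that exist
// only in current (added by the re-attestation). Results are sorted so the
// output is deterministic.
func DiffSelectors(previous, current []Selector) (stale, added []Selector) {
	prev := make(map[Selector]struct{}, len(previous))
	for _, s := range previous {
		prev[s] = struct{}{}
	}
	cur := make(map[Selector]struct{}, len(current))
	for _, s := range current {
		cur[s] = struct{}{}
	}
	for s := range prev {
		if _, ok := cur[s]; !ok {
			stale = append(stale, s)
		}
	}
	for s := range cur {
		if _, ok := prev[s]; !ok {
			added = append(added, s)
		}
	}
	sortSelectors(stale)
	sortSelectors(added)
	return stale, added
}

// sortSelectors orders selectors by type, then by value.
func sortSelectors(s []Selector) {
	sort.Slice(s, func(i, j int) bool {
		if s[i].Type != s[j].Type {
			return s[i].Type < s[j].Type
		}
		return s[i].Value < s[j].Value
	})
}
```

If the SPIFFE ID is templated from selectors, a non-empty diff is also a signal that the agent's ID may change, which the design would need to handle explicitly.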
All that being said, I think we would want to do some more detailed design first, considering some of the points I mentioned. Is this something you're interested in working on, @kongweiguo?
@rturner3 Sure, I am interested in working on this. I am glad we both agree that this is a good feature to have.
In fact, I've been trying to build some external systems to solve this problem. Because there is no mechanism we could use, it is basically hacky work with too many dependencies on our other internal systems. It would be a great help for running SPIRE in production environments if SPIRE itself had the mechanism.
Also, I really agree with you that we should start with some basic design.
Maybe I could do an initial design and post it here. Or where do you think we should start?
Perhaps the continuous attestation mentioned above is not the whole picture of the requirement. I think there should be two aspects:
From an operations and management perspective, we also need a feature/mechanism that makes it convenient for an outside system to adjust the NODE SCOPE of a workload entry dynamically, so that the workload's SVID can be distributed to (and fetched by) the right node agent in time.
I think the way this could be done today would be to have a registrar service that monitors node inventory and updates registrations as needed. Just to clarify, do you see any gaps with that approach that would lead to new requirements related to this proposal?
Would you mind offering a brief description of the processes illustrated in your diagram? I also wasn't sure what "ROT" and "OPS System" represented.
Hi @kongweiguo, just checking back in to see if you were planning to revisit this issue again regarding some of the open questions in previous comments?
Hey @kongweiguo - this seems like a great improvement, but feels like there's quite a few things we need to think through in order to implement it.
We're going to move it to the backlog as unscoped ... please let us know if you're able to help push this work forward in the near term
This issue is stale because it has been open for 365 days with no activity.
Currently, the node attestation flow seems to be a one-shot action. After the node attestation procedure, the server-side node attestation plugin emits some selectors back to the SPIRE server. It seems those selectors are never changed or updated until the next node attestation, which is only triggered when the SPIRE agent starts.
Firstly, from a security architecture perspective, security attestation should be continuous.
Secondly, the scope of the SPIRE server's workload entries should be accurate. Especially in large-scale production scenarios, the labels and taints of the node/agent may change frequently to meet the scheduling needs of the business applications.
So I want to request a mechanism that keeps the node selectors in the SPIRE server up to date with real-world changes in a timely manner. There are some scenarios:
Proposal: The spire agent provides a new interface to the plugin so that the plugin can actively trigger Attestation. The plugin is responsible for sensing environmental changes.
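One way to picture this proposal is the sketch below, in which the agent hands the plugin a small trigger interface and the plugin calls it when it senses a change. All names here (`ReattestTrigger`, `ChannelTrigger`, `NotifyIfChanged`) are hypothetical and do not exist in SPIRE; the sketch also assumes that duplicate requests should be coalesced, which is a design choice rather than something stated in the proposal.

```go
package main

// ReattestTrigger is a hypothetical interface the agent would expose to a
// node attestor plugin so the plugin can ask for a new attestation.
type ReattestTrigger interface {
	RequestReattestation(reason string)
}

// ChannelTrigger is a hypothetical agent-side implementation. The buffered
// channel of size 1 coalesces bursts of requests into one pending request.
type ChannelTrigger struct {
	ch chan string
}

// NewChannelTrigger creates a trigger with room for one pending request.
func NewChannelTrigger() *ChannelTrigger {
	return &ChannelTrigger{ch: make(chan string, 1)}
}

// RequestReattestation queues a request unless one is already pending.
func (t *ChannelTrigger) RequestReattestation(reason string) {
	select {
	case t.ch <- reason:
	default:
		// A request is already pending; drop the duplicate.
	}
}

// Pending exposes the queue that the agent's attestation loop would read.
func (t *ChannelTrigger) Pending() <-chan string {
	return t.ch
}

// NotifyIfChanged is a hypothetical plugin-side helper. It compares two
// snapshots of node labels and requests re-attestation when they differ.
// It returns true if a change was detected.
func NotifyIfChanged(prev, cur map[string]string, trigger ReattestTrigger) bool {
	changed := len(prev) != len(cur)
	if !changed {
		for k, v := range prev {
			if cv, ok := cur[k]; !ok || cv != v {
				changed = true
				break
			}
		}
	}
	if changed {
		trigger.RequestReattestation("node labels changed")
	}
	return changed
}
```

In this sketch the agent would read from `Pending()` in its own loop and decide whether to re-attest, which leaves room for it to refuse when the attestor is not safe to repeat.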