spiffe / helm-charts-hardened

Apache License 2.0

SVID is not valid: public key "X5ZOAszrYj0LnaHdUqRWLZcMtzpgcY9L" not found in trust domain {federated cluster} #335

Closed drewwells closed 2 months ago

drewwells commented 2 months ago

When the CA in SPIRE rotates, the federation connections are interrupted. I don't have mitigation steps to reconnect them. Deleting the federation CRs and recreating them does not re-establish trust.

Repro steps:

  1. Set the CA TTL to a short period on both clusters, e.g. 10 minutes
  2. Wait until the TTL expires
  3. Mint a token on clusterA and attempt to validate it on clusterB

Reported on the old repo here: https://github.com/spiffe/helm-charts/issues/364

Workaround: use the spire-server CLI to force a refresh:

kubectl -n spire-server exec -q spire-server-0 -- \
  spire-server federation refresh -id {trust domain of federated server}
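
If more than one trust domain is federated, the same workaround can be looped. This is only a sketch: the trust domain names below are hypothetical placeholders, and the namespace and pod name are simply copied from the command above.

```shell
# Hypothetical federated trust domains; replace with your own values.
TRUST_DOMAINS="${TRUST_DOMAINS:-cluster-b.example.org cluster-c.example.org}"

# Force a bundle refresh for every listed trust domain.
# Namespace and pod name are taken from the workaround command above.
refresh_all() {
  for td in $TRUST_DOMAINS; do
    kubectl -n spire-server exec -q spire-server-0 -- \
      spire-server federation refresh -id "$td" \
      || echo "refresh failed for $td" >&2
  done
}
```

Call `refresh_all` after a CA rotation, or from a scheduled job until the refresh interval is fixed.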
drewwells commented 2 months ago

Notice that the next bundle refresh is scheduled far too far into the future. The refresh interval must not exceed the CA TTL.

time="2024-04-26T18:05:20Z" level=debug msg="Scheduling next bundle refresh" at="2024-04-30T12:05:20Z" subsystem_name=bundle_client trust_domain=stg-1.infoblox.com
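
To make the gap concrete, the interval in that log line can be computed from its two timestamps. This is a rough sketch that assumes GNU `date` (for `-d`) and uses the 10-minute CA TTL from the repro steps as the comparison value.

```shell
# Timestamps copied from the log line above.
logged_at="2024-04-26T18:05:20Z"
next_refresh="2024-04-30T12:05:20Z"

# Assumption: the 10-minute CA TTL from the repro steps, in seconds.
ca_ttl_seconds=600

# GNU date converts each ISO 8601 timestamp to epoch seconds.
interval=$(( $(date -u -d "$next_refresh" +%s) - $(date -u -d "$logged_at" +%s) ))

echo "refresh interval: ${interval}s ($(( interval / 3600 ))h)"
if [ "$interval" -gt "$ca_ttl_seconds" ]; then
  echo "refresh interval exceeds the CA TTL of ${ca_ttl_seconds}s"
fi
```

For this log line the interval comes out at 90 hours, far longer than a 10-minute CA TTL.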
drewwells commented 2 months ago

This issue is caused by the default refresh_bundle being 1 year, regardless of what the CA TTL is configured to. I have seen a scenario where SPIRE will fatal in this configuration, but it doesn't do that consistently. Perhaps a bug upstream in spiffe/spire.

kfox1111 commented 2 months ago

Do you know where that is set?

drewwells commented 2 months ago

federation.bundle_refresh

kfox1111 commented 2 months ago

I cannot find a setting of federation.bundle_refresh anywhere.

There is a federation.bundle_endpoint.refresh_hint in the spire server, but it defaults to 5m. Where are you seeing 1y?

drewwells commented 2 months ago

> I can not find a setting of federation.bundle_refresh anywhere.
>
> there is a federation.bundle_endpoint.refresh_hint in the spire server, but it defaults to 5m. Where are you seeing 1y?

Deploy federation and access the PKI web endpoint. spiffe_refresh_hint is set to 1 year. I can open a PR to at least add documentation to it. We're definitely setting it in the 0.14.0 chart and it's working

kfox1111 commented 2 months ago

I don't understand....

helm upgrade spire charts/spire --set spire-server.federation.enabled=true

$ curl https://10.107.72.89:8443 -k -s | grep hint
"spiffe_refresh_hint": 300

5 minutes, the default in the SPIRE docs.
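
The same check can be scripted so the hint is compared against the CA TTL directly. A minimal sketch, assuming the endpoint returns JSON containing a numeric `spiffe_refresh_hint` as in the output above; the helper names and the 600-second TTL are illustrative, not part of the chart.

```shell
# Extract the numeric spiffe_refresh_hint (seconds) from bundle JSON on stdin.
extract_hint() {
  grep -o '"spiffe_refresh_hint": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Exit 0 when the hint ($1) does not exceed the CA TTL ($2), both in seconds.
hint_within_ttl() {
  [ "$1" -le "$2" ]
}

# Literal sample input; in practice pipe in the curl output instead.
sample='{"keys": [], "spiffe_refresh_hint": 300}'
hint=$(printf '%s' "$sample" | extract_hint)

# Assumption: a 600s (10-minute) CA TTL, as in the repro steps.
if hint_within_ttl "$hint" 600; then
  echo "hint ${hint}s is within the 600s CA TTL"
else
  echo "hint ${hint}s exceeds the 600s CA TTL"
fi
```

Against a live endpoint this would be something like `curl -k -s https://<bundle endpoint> | extract_hint`.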

drewwells commented 2 months ago

I'd guess this was fixed in the version of the chart you installed. Without overrides on 0.14.0, I was still seeing 1 year.

kfox1111 commented 2 months ago

helm upgrade --install spire spire --set spire-server.federation.enabled=true --version 0.14.0 --repo https://spiffe.github.io/helm-charts-hardened/

$ curl https://10.106.252.22:8443 -k -s | grep hint
"spiffe_refresh_hint": 8641

Looks like spire-server defaults to ~2.4 hours in that version of spire.
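
The hint is expressed in seconds, so the conversion is a plain division. A small sketch; `hint_hours` is an illustrative helper name.

```shell
# Convert a spiffe_refresh_hint value (seconds) to hours, one decimal place.
hint_hours() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 3600 }'
}

hint_hours 8641   # value reported with chart 0.14.0; prints 2.4
```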

drewwells commented 2 months ago

Is that value persisted? We upgraded from a previous version to 0.14.0
