spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0

latest release integration suites reporting unrecognized config apparently not running current code #3937

Closed zmt closed 1 year ago

zmt commented 1 year ago

Minimal repro:

2023-03-02T08:05:35+0000 ztrain spiffe/spire main v1.2.0-819-g2cc0af03 % git checkout v1.6.1
HEAD is now at 48e6212b Honor the `ca_ttl` server config (#3934)
2023-03-02T08:05:39+0000 ztrain spiffe/spire tags/v1.6.1^0 v1.6.1 % git stash pop
HEAD detached at v1.6.1
Changes not staged for commit:
        modified:   Makefile

no changes added to commit
Dropped refs/stash@{0} (0ca0bcff7e4f6d5c078dd816a117a190fb4bf2c1)
2023-03-02T08:05:46+0000 ztrain spiffe/spire tags/v1.6.1^0 v1.6.1 !% git diff
diff --git a/Makefile b/Makefile
index 3c376120..21f137d7 100644
--- a/Makefile
+++ b/Makefile
@@ -325,7 +325,13 @@ artifact: build

 .PHONY: container-builder
 container-builder:
-       $(E)docker buildx create --platform $(PLATFORMS) --name container-builder --node container-builder0 --use
+       # On a docker within docker where the daemon is exposed via TLS over TCP
+       # with DOCKER_HOST, DOCKER_CERT_PATH, and DOCKER_TLS_VERIFY set in env.
+       # This requires a context be created from the env config and used by the
+       # buildx create.
+       $(E)rm -rf $(DOCKER_CERT_PATH)/contexts
+       $(E)docker context create spire-images --description "devpod" --docker "host=$(DOCKER_HOST),ca=$(DOCKER_CERT_PATH)/ca.pem,cert=$(DOCKER_CERT_PATH)/cert.pem,key=$(DOCKER_CERT_PATH)/key.pem"
+       $(E)docker buildx create spire-images --platform $(PLATFORMS) --name container-builder --node container-builder0 --use

 define image_rule
 .PHONY: $1
2023-03-02T08:05:59+0000 ztrain spiffe/spire tags/v1.6.1^0 v1.6.1 !% SUITES="suites/admin-endpoints" make integration
Testing suites/admin-endpoints
[2023-03-02T08:06:03Z] running "admin-endpoints" test suite...
[2023-03-02T08:06:03Z] executing 00-setup...
[2023-03-02T08:06:05Z] executing 01-start-server...
[2023-03-02T08:06:05Z] bringing up spire-server-a spire-server-b...
Creating network "spire-integration-mxxzu1_default" with the default driver
Creating spire-integration-mxxzu1_spire-server-b_1 ... done
Creating spire-integration-mxxzu1_spire-server-a_1 ... done
[2023-03-02T08:06:07Z] executing 02-bootstrap-federation-bundles...
[2023-03-02T08:06:07Z] bringing up spire-server-a spire-server-b...
Starting spire-integration-mxxzu1_spire-server-a_1 ... done
Starting spire-integration-mxxzu1_spire-server-b_1 ... done
[2023-03-02T08:06:08Z] bootstrapping bundle from server b to server a...
[2023-03-02T08:06:08Z] step 02-bootstrap-federation-bundles failed
[2023-03-02T08:06:08Z] executing teardown...
Attaching to spire-integration-mxxzu1_spire-server-a_1, spire-integration-mxxzu1_spire-server-b_1
spire-server-a_1  | time="2023-03-02T08:06:06Z" level=warning msg="The default_svid_ttl is too high for the configured ca_ttl value. SVIDs with shorter lifetimes may be issued. Please set the default_svid_ttl to 10m or less, or the ca_ttl to 6h or more, to guarantee the full default_svid_ttl lifetime when CA rotations are scheduled."
spire-server-a_1  | time="2023-03-02T08:06:06Z" level=error msg="Unknown configuration detected" keys="admin_ids,default_x509_svid_ttl" section=server
spire-server-a_1  | unknown configuration detected
spire-server-a_1  | unknown configuration detected
spire-server-a_1  | time="2023-03-02T08:06:08Z" level=warning msg="The default_svid_ttl is too high for the configured ca_ttl value. SVIDs with shorter lifetimes may be issued. Please set the default_svid_ttl to 10m or less, or the ca_ttl to 6h or more, to guarantee the full default_svid_ttl lifetime when CA rotations are scheduled."
spire-server-a_1  | time="2023-03-02T08:06:08Z" level=error msg="Unknown configuration detected" keys="admin_ids,default_x509_svid_ttl" section=server
spire-server-b_1  | unknown configuration detected
spire-server-b_1  | time="2023-03-02T08:06:06Z" level=warning msg="The default_svid_ttl is too high for the configured ca_ttl value. SVIDs with shorter lifetimes may be issued. Please set the default_svid_ttl to 10m or less, or the ca_ttl to 6h or more, to guarantee the full default_svid_ttl lifetime when CA rotations are scheduled."
spire-server-b_1  | time="2023-03-02T08:06:06Z" level=error msg="Unknown configuration detected" keys=default_x509_svid_ttl section=server
spire-server-b_1  | unknown configuration detected
spire-server-b_1  | time="2023-03-02T08:06:08Z" level=warning msg="The default_svid_ttl is too high for the configured ca_ttl value. SVIDs with shorter lifetimes may be issued. Please set the default_svid_ttl to 10m or less, or the ca_ttl to 6h or more, to guarantee the full default_svid_ttl lifetime when CA rotations are scheduled."
spire-server-b_1  | time="2023-03-02T08:06:08Z" level=error msg="Unknown configuration detected" keys=default_x509_svid_ttl section=server
[2023-03-02T08:06:09Z] bringing down services...
Removing spire-integration-mxxzu1_spire-server-a_1 ... done
Removing spire-integration-mxxzu1_spire-server-b_1 ... done
Removing network spire-integration-mxxzu1_default
[2023-03-02T08:06:10Z] cleaning up services...
Removing network spire-integration-mxxzu1_default
WARNING: Network spire-integration-mxxzu1_default not found.
[2023-03-02T08:06:11Z] "admin-endpoints" test suite failed.
STATUS=0
[2023-03-02T08:06:11Z] The following tests failed: admin-endpoints
make: *** [Makefile:307: integration] Error 1

I also tinkered with the fetch-x509-svids suite. In that one, I was able to get past the config error by changing the default_x509_svid_ttl to default_svid_ttl in its conf/server/server.conf. Then the test continued, but failed because the cache count didn't match. This leads me to believe the current code is not the system under test. I was not immediately able to find where in the test setup it could be reaching stale code.
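One way to surface this symptom quickly is to pull the rejected keys out of the teardown logs. The sketch below is only an illustration: unknown_keys is a hypothetical helper name, and the sed pattern assumes the log line format shown in the output above (keys may or may not be quoted).

```shell
# Read compose/server logs on stdin and print "section: keys" once for each
# distinct "Unknown configuration detected" error found.
unknown_keys() {
    sed -n 's/.*msg="Unknown configuration detected" keys="\{0,1\}\([^" ]*\)"\{0,1\} section=\([a-z]*\).*/\2: \1/p' |
        sort -u
}
```

Piping the failed suite's output through unknown_keys lists the keys the running server rejected. If those are keys that the checked-out suites configure on purpose, that points toward an older server image being under test.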

azdagron commented 1 year ago

The flow changed back when arm64 images were introduced: make images no longer loads the new images into the local docker registry. After running make images, you currently need to then run make load-images before running make integration.

zmt commented 1 year ago

"After running make images, you currently need to then run make load-images before running make integration."

I knew it had to be something simple that I was missing. Thank you. I did hit another undocumented dependency for the new image building/loading:


.github/workflows/scripts/load-oci-archives.sh

USAGE: load-oci-archives.sh

"load-oci-archives.sh" loads oci tarballs created with xbuild into docker.

Usage example(s):
  ./load-oci-archives.sh
  PLATFORM=linux/arm64 ./load-oci-archives.sh

Commands

I will pick an installation method: https://github.com/regclient/regclient/blob/main/docs/install.md

I think I have a few follow-ups here:

And a few more questions:

- Would it make sense to choose installation methods for users for buildx and/or regctl and automate in the Makefile?
- Would it make sense to wire the load-images target up to the integration target in the Makefile to smooth this rough edge?
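Short of automating installation, a preflight check could at least name the missing dependency up front instead of leaving it to low-level error output. A minimal sketch, assuming docker and regctl are the executables the image targets need; missing_tools is a hypothetical helper, not something in the repository.

```shell
# Print each named command that is not on PATH.
# Returns 0 only when every command was found.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools docker regctl || echo "install the tools listed above first" >&2` could run before make load-images. Since buildx is a docker CLI plugin rather than a standalone executable, it would need a separate check such as `docker buildx version`.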

Incidentally, I filed this: https://github.com/regclient/regclient/security/advisories/GHSA-2rq7-cqrj-xgrg

zmt commented 1 year ago

I confirmed that re-running my trivial repro of the single suite works after downloading a regctl binary to support the load-images target:

make images load-images && SUITES="suites/admin-endpoints" make integration
[snip]
[2023-03-02T19:13:35Z] "admin-endpoints" test suite succeeded.

zmt commented 1 year ago

I confirmed that the whole suite works except for k8s, but that is a known issue in my development environment.

make images load-images integration

zmt commented 1 year ago

The regctl team responded by pointing me at their docs regarding verifying signatures that I had missed: https://github.com/regclient/regclient/blob/main/docs/install.md#verifying-signatures

I muddled my way through using breadcrumbs from the low-level error output I was seeing, with one exception: figuring out that I now need to run the load-images target before the integration target. Perhaps that could or should be codified in the Makefile? I can't think of a counterargument, but I can imagine there might be some.
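Until the Makefile encodes that ordering, a small wrapper can. This is a minimal sketch, assuming the images, load-images, and integration targets behave as described in this thread. run_integration and MAKE_CMD are hypothetical names introduced here, not part of the repository.

```shell
# Build the images, load them into docker, then run the integration suites.
# An optional first argument is passed through as SUITES.
# MAKE_CMD (hypothetical) overrides the make binary, e.g. for a dry run.
run_integration() {
    "${MAKE_CMD:-make}" images load-images || return 1
    if [ -n "${1:-}" ]; then
        SUITES="$1" "${MAKE_CMD:-make}" integration
    else
        "${MAKE_CMD:-make}" integration
    fi
}
```

Calling `run_integration suites/admin-endpoints` corresponds to the single-suite command used earlier in the thread, and `run_integration` with no argument runs everything.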

If I ever get around to it, I'll try to put up a PR that safely handles docker TLS env and context, but don't hold your breath.
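As a rough idea of what handling the docker TLS environment more safely might look like, the sketch below creates a context only when DOCKER_HOST and DOCKER_CERT_PATH are set, and reuses an existing one rather than deleting the contexts directory. The context name and the ca.pem, cert.pem, and key.pem file names are taken from the diff above; the function names are hypothetical, and PLATFORMS is assumed to be exported the way the Makefile variable is.

```shell
# Print the endpoint string for `docker context create --docker`, built from
# the environment. Returns 1 and prints nothing when no TLS daemon is configured.
docker_endpoint_from_env() {
    [ -n "${DOCKER_HOST:-}" ] && [ -n "${DOCKER_CERT_PATH:-}" ] || return 1
    printf 'host=%s,ca=%s/ca.pem,cert=%s/cert.pem,key=%s/key.pem\n' \
        "$DOCKER_HOST" "$DOCKER_CERT_PATH" "$DOCKER_CERT_PATH" "$DOCKER_CERT_PATH"
}

# Create the buildx builder, going through a named context only when needed.
create_builder() {
    if endpoint=$(docker_endpoint_from_env); then
        # Reuse the context if it already exists instead of removing state.
        docker context inspect spire-images >/dev/null 2>&1 ||
            docker context create spire-images --docker "$endpoint"
        docker buildx create spire-images --platform "$PLATFORMS" \
            --name container-builder --node container-builder0 --use
    else
        docker buildx create --platform "$PLATFORMS" \
            --name container-builder --node container-builder0 --use
    fi
}
```

With neither variable set, create_builder falls back to the original buildx invocation, so a plain local docker setup would be unaffected.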

azdagron commented 1 year ago

This was merged recently: https://github.com/spiffe/spire/pull/3940

azdagron commented 1 year ago

I'll close this for now. If you have follow-up PRs that clarify the documentation, they would be appreciated :)