holos-run / holos

Holos - The Holistic platform manager
https://holos.run
Apache License 2.0

Add concurrency to 'holos render platform' #179

Closed: natemccurdy closed this 5 months ago

natemccurdy commented 5 months ago

This adds a new flag, --concurrency <int>, to holos render platform.

The default concurrency is min(runtime.NumCPU(), 8): the number of CPU cores, capped at 8. In testing, I found that going past 8 gives diminishing or even negative returns because of the memory used to render each component.

In practice, this reduced rendering of the SAAS platform components from ~90s to ~23s on my 12-core MacBook Pro.

This run uses the default concurrency value of 8:

$ holos --version
0.83.1-8-g6f8008a

$ time holos render platform ./platform
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-routes cluster=aws1 num=22 total=61 duration=1.712768875s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/creds cluster=aws2 num=53 total=61 duration=1.714254779s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-database cluster=aws2 num=45 total=61 duration=1.727663505s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=apps/dev/holos/app cluster=aws1 num=30 total=61 duration=1.745359478s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/httpbin/backend cluster=aws1 num=15 total=61 duration=1.798497503s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/httpbin/routes cluster=aws2 num=41 total=61 duration=1.816231293s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-routes cluster=aws2 num=47 total=61 duration=1.46147802s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/iap/authpolicy cluster=aws2 num=49 total=61 duration=1.498774183s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=apps/dev/holos/app cluster=aws2 num=55 total=61 duration=1.511358964s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=apps/dev/holos/infra cluster=aws2 num=54 total=61 duration=1.498846086s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/iap/authproxy cluster=aws2 num=48 total=61 duration=1.642901232s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/cert-manager cluster=aws2 num=61 total=61 duration=3.471427601s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/gateway cluster=aws2 num=39 total=61 duration=1.756476123s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/namespaces cluster=management num=56 total=61 duration=1.636512035s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/cni cluster=aws2 num=37 total=61 duration=5.042338187s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/cert-manager cluster=management num=57 total=61 duration=1.786879225s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-server cluster=aws2 num=46 total=61 duration=4.341947895s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/istiod cluster=aws2 num=38 total=61 duration=3.115844697s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/iap/authproxy cluster=aws1 num=23 total=61 duration=1.564858393s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/iap/authpolicy cluster=aws1 num=24 total=61 duration=1.537298888s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/creds cluster=aws1 num=28 total=61 duration=1.638808696s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/httpbin/backend cluster=aws2 num=40 total=61 duration=1.681625333s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/namespaces cluster=aws1 num=58 total=61 duration=1.651958199s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/cd cluster=aws1 num=26 total=61 duration=4.876876064s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/cert-manager cluster=aws1 num=59 total=61 duration=1.819261767s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/cd cluster=aws2 num=51 total=61 duration=5.194567065s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/routes cluster=aws1 num=27 total=61 duration=1.797553618s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/eso-creds-refresher cluster=aws1 num=7 total=61 duration=1.974509045s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/ecr-creds-refresher cluster=aws1 num=9 total=61 duration=1.922980458s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/secretstores cluster=aws1 num=8 total=61 duration=2.039197402s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/routes cluster=aws2 num=52 total=61 duration=1.874079132s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-certs cluster=management num=4 total=61 duration=1.90254282s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/base cluster=aws1 num=11 total=61 duration=3.854436718s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/pgo/controller cluster=aws1 num=18 total=61 duration=2.004786198s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/httpbin/routes cluster=aws1 num=16 total=61 duration=1.964574935s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/eso-creds-manager cluster=management num=1 total=61 duration=2.223611466s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/ecr-creds-manager cluster=management num=5 total=61 duration=2.23655353s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/crds cluster=aws2 num=50 total=61 duration=8.749997308s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=apps/dev/holos/infra cluster=aws1 num=29 total=61 duration=2.067509036s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/cert-letsencrypt cluster=management num=2 total=61 duration=2.04915908s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/argo/crds cluster=aws1 num=25 total=61 duration=9.564710717s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/gateway-api cluster=aws1 num=10 total=61 duration=4.980955514s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/pgo/crds cluster=aws1 num=17 total=61 duration=2.64378083s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/namespaces cluster=aws2 num=60 total=61 duration=2.125991539s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/certificates cluster=management num=3 total=61 duration=2.181300293s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/istiod cluster=aws1 num=13 total=61 duration=2.201376937s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/external-secrets cluster=aws1 num=6 total=61 duration=4.499427331s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-secrets cluster=aws1 num=19 total=61 duration=2.00750522s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/cni cluster=aws1 num=12 total=61 duration=2.134342672s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/pgo/crds cluster=aws2 num=42 total=61 duration=2.503927026s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-server cluster=aws1 num=21 total=61 duration=2.408034914s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-secrets cluster=aws2 num=44 total=61 duration=2.145211858s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/pgo/controller cluster=aws2 num=43 total=61 duration=2.176492745s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/secretstores cluster=aws2 num=33 total=61 duration=2.276511557s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/mesh/gateway cluster=aws1 num=14 total=61 duration=2.297268823s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/external-secrets cluster=aws2 num=31 total=61 duration=2.399997801s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/eso-creds-refresher cluster=aws2 num=32 total=61 duration=2.330886365s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/istio/base cluster=aws2 num=36 total=61 duration=1.961614286s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/ecr-creds-refresher cluster=aws2 num=34 total=61 duration=1.808768939s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/login/zitadel-database cluster=aws1 num=20 total=61 duration=1.72295187s
3:07PM INF platform.go:43 ok render component version=0.83.1 path=components/gateway-api cluster=aws2 num=35 total=61 duration=4.424327058s
holos render platform ./platform  144.59s user 14.61s system 694% cpu 22.908 total

natemccurdy commented 5 months ago

It looks like concurrent renders clobber each other's Helm chart files in the vendor directory.

$ git clean -fdx
Removing saas/components/argo/cd/vendor/
Removing saas/components/cert-manager/vendor/
Removing saas/components/crossplane/controller/vendor/
Removing saas/components/eks-pod-identity-webhook/vendor/
Removing saas/components/external-secrets/vendor/
Removing saas/components/istio/base/vendor/
Removing saas/components/istio/mesh/cni/vendor/
Removing saas/components/istio/mesh/gateway/vendor/
Removing saas/components/istio/mesh/istiod/vendor/
Removing saas/components/login/zitadel-server/vendor/

$ holos render platform ./platform
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/login/zitadel-certs cluster=management num=4 total=64 duration=1.828944088s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/istio/mesh/httpbin/routes cluster=aws1 num=19 total=64 duration=1.8991467s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/pgo/controller cluster=aws1 num=21 total=64 duration=2.18369136s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/argo/creds cluster=aws1 num=31 total=64 duration=1.130460137s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/argo/routes cluster=aws1 num=30 total=64 duration=1.089624699s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/istio/mesh/cni cluster=aws2 num=40 total=64 duration=3.965683892s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/cert-letsencrypt cluster=management num=2 total=64 duration=1.124516594s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/eso-creds-manager cluster=management num=1 total=64 duration=1.384222009s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/ecr-creds-manager cluster=management num=5 total=64 duration=1.242500902s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/login/zitadel-secrets cluster=aws1 num=22 total=64 duration=1.116147967s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/external-secrets cluster=aws2 num=34 total=64 duration=4.541646733s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/eks-pod-identity-webhook cluster=management num=6 total=64 duration=2.81272254s
10:41AM INF platform.go:45 ok render component version=0.83.1 path=components/istio/mesh/iap/authpolicy cluster=aws2 num=52 total=64 duration=1.296626354s
10:41AM ERR could not execute version=0.83.1 code=unknown err="could not rename: rename /Users/nate/src/holos-run/holos-infra/saas/components/external-secrets/vendor1071240550/external-secrets /Users/nate/src/holos-run/holos-infra/saas/components/external-secrets/vendor/external-secrets: file exists" loc=helm.go:159
10:41AM ERR could not execute version=0.83.1 code=unknown err="could not render component: exit status 1" loc=platform.go:40

natemccurdy commented 5 months ago

From Slack:

So, I think we:

  1. Document that the only thing that should ever write to the vendor directory is the cacheChart method.
  2. Document our assumption that direct sub-directories are moved into place atomically.
  3. Handle the error by logging it at debug level instead of returning it, then continue with that for loop.

The temp directory is already cleaned up, so this should be a pretty minimal change. It might also be worth adding a comment that this is the reason I put the temp directory in the same directory as the destination: that way it's guaranteed to be on the same filesystem. Renames aren't atomic across filesystems, for example from /tmp to /home.