uselagoon / lagoon-charts

A collection of Helm charts for Lagoon and associated services.
Apache License 2.0
11 stars 10 forks source link

Missing environment variables in lagoon-core #97

Open smlx opened 3 years ago

smlx commented 3 years ago

88 and #89 need to be reverted since these variables are actually needed.

88 causes this error in the test suite:

+++ docker tag ci-features-control-k8s-lagoon-type-override-node registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest
+++ echo 'docker push registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest'
++ '[' -f /kubectl-build-deploy/lagoon/push ']'
++ parallel --retries 4
The push refers to repository [registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node]
Get https://registry.lagoon.svc:5000/v2/: dial tcp: lookup registry.lagoon.svc on 10.96.0.10:53: no such host

89 causes this error in the test suite:

TASK [LAGOON TYPE OVERRIDE CONTROL-K8S - POST api deployEnvironmentBranch with target git branch lagoon-type-override and project ci-features-control-k8s (no sha) to http://lagoon-core-api:80/graphql] ***
Tuesday 29 September 2020  07:07:10 +0000 (0:00:06.310)       0:00:16.877 ***** 
ok: [localhost] => {
    "msg": "api response: {u'data': {u'deployEnvironmentBranch': u'Error: request to http://api:3000/graphql failed, reason: getaddrinfo ENOTFOUND api api:3000'}}"
}

In general I've been pretty careful about only adding the environment variables which are strictly required so any variable removal from the charts will generally require a corresponding change in Lagoon itself. If anything there will be environment variables missing from the charts :smile:

For these specific variables (REGISTRY, API_HOST), they are accessed in the common node packages here: https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/api.ts#L76 https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/tasks.ts#L92

And then imported by various services like this: https://github.com/amazeeio/lagoon/blob/main/services/api/src/gitlab-sync/groups.ts#L3

Schnitzel commented 3 years ago

yes, yes sorry API_HOST was my mistake, it's super strange though that the API itself needs to know the URL of the api sevice, that doesn't really make sense and is for me a bug of lagoon, as the api already has access to the sql db and should not make api calls to itself (probably need to handle this on lagoon itself)

created https://github.com/amazeeio/lagoon/issues/2233 for this.

for the registry though I think we need to something else, the env variable is ONLY needed during CI, as we don't start a full fledged harbor during ci (we probably should?). So in CI we rely on a non-authenticated docker registry where the docker images can be pushed, this is defined in https://github.com/amazeeio/lagoon/blob/main/docker-compose.yaml#L277 In any actual deployed lagoon-core though we would probably never have an unauthenticated docker registry, but instead we use Harbor. Lagoon will talk to harbor and create harbor-projects for each lagoon project and then set INTERNAL_REGISTRY_URL env variables which then will overwrite whatever is set in REGISTRY here: https://github.com/amazeeio/lagoon/blob/main/images/kubectl-build-deploy-dind/build-deploy.sh#L62

so it's quite confusing if we have a registry value in the values.yaml but we don't expect anybody to ever set it. Also the fact that we have it as required here: https://github.com/uselagoon/lagoon-charts/pull/98/files#diff-9af473e46b081d897ebfd5bd1684953bR115 is technically not correct, as you can run a full fledged lagoon without this variable.

so I'm not really sure what we want to do? My idea: we could also start a harbor during ci and rely on harbor to be used instead of the non-authenticated registry and then we don't need this env variable at all. This would also mean though that the ci run probably takes a bit longer, plus also for local development people are needed to run a harbor as well. so maybe we do this:

also one other question @smlx where did you see the test suite errors? the PRs for my changes came back green so I assumed that the test system made sure that it's safe to remove these env variables?

smlx commented 3 years ago

for the registry though I think we need to something else, the env variable is ONLY needed during CI, as we don't start a full fledged harbor during ci (we probably should?). So in CI we rely on a non-authenticated docker registry where the docker images can be pushed, this is defined in https://github.com/amazeeio/lagoon/blob/main/docker-compose.yaml#L277 In any actual deployed lagoon-core though we would probably never have an unauthenticated docker registry, but instead we use Harbor. Lagoon will talk to harbor and create harbor-projects for each lagoon project and then set INTERNAL_REGISTRY_URL env variables which then will overwrite whatever is set in REGISTRY here: https://github.com/amazeeio/lagoon/blob/main/images/kubectl-build-deploy-dind/build-deploy.sh#L62

I was confused about this.

The REGISTRY env variable is accessed here: https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/tasks.ts#L92

The comment above that line indicates that it is used in builddeploydata, and it is injected here: https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/tasks.ts#L490

So does that mean that in production builddeploydata contains a garbage registry values that is overridden in the build-deploy scripts every time? If so, should we change the way this works in Lagoon?

we could also start a harbor during ci and rely on harbor to be used instead of the non-authenticated registry and then we don't need this env variable at all. This would also mean though that the ci run probably takes a bit longer, plus also for local development people are needed to run a harbor as well.

:smile: I was doing this initially, but couldn't get harbor running properly in kind. I think I know what I did wrong though so will create an issue to replace the unauthenticated registry with harbor for CI.

so maybe we do this:

* we remove `registry` from the charts again and during the CI testing system we start a harbor via helmcharts and let lagoon work with it (with that we test during the CI all the stuff we actually need to run, plus it would probably have prevented this issue: [amazeeio/lagoon#2231](https://github.com/amazeeio/lagoon/issues/2231) with harbor compatibility as well)

Yep, sounds good.

also one other question @smlx where did you see the test suite errors? the PRs for my changes came back green so I assumed that the test system made sure that it's safe to remove these env variables?

I only got the lagoon test suite green for the first time the friday night before I went on holiday so didn't have time to integrate it with the CI on the repo. I just ran it locally to see those errors. Getting it integrated with CI is https://github.com/uselagoon/lagoon-charts/pull/82.

Schnitzel commented 3 years ago

So does that mean that in production builddeploydata contains a garbage registry values that is overridden in the build-deploy scripts every time? If so, should we change the way this works in Lagoon?

yea, pretty much. the fact that it works that way has to do with the history of lagoon, in the beginning we had to inject another registry (the first openshift we used didn't have a registry), then we started to use the openshift registry provided (the variable didn't really make sense at that point) and now with k8s we "could" use it with a lagoon-wide registry but right now we use the lagoon wide registry only for CI and local dev purposes. So Yes, I agree we should change this to make it more obvious for what this is used and what not. Also the fact that we have to inject the registry into services like webhooks2tasks is super weird, as they really should not know care about this at all (probably better the deployment controller that knows which registry should be used by default if there is none provided by harbor). Created a ticket for this here: https://github.com/amazeeio/lagoon/issues/2238