Upon deploying WC agents, detect if self-hosted flux is used and re-use existing config

dholbach commented 6 years ago

If people have tried out flux before and decided to take the plunge and use Weave Cloud, it'd be great if their existing setup could be incorporated, so they wouldn't have to set things up again or worse, their existing CICD pipeline not working anymore.

dlespiau commented 6 years ago

We have a flux migration path already (https://github.com/weaveworks/launcher/blob/master/agent/flux.go). It hasn't been tested with the self-hosted version of flux though.

rade commented 6 years ago

See also weaveworks/flux#1085

dholbach commented 6 years ago

I tested this in a couple of simple test-cases (using minikube, kubeadm, different branch setups), here's a minimal test-case which shows that the git repository is not auto-detected:

# create fork of github/weaveworks/flux-example in github

git clone git@github.com:weaveworks/flux.git
cd flux
$EDITOR deploy/flux-deployment.yaml
# change --git-url in L63 to your own fork of flux-example
kubectl apply -f deploy/

export FLUX_POD=$(kubectl get pods --namespace default -l "name=flux" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward -n default $FLUX_POD 10080:3030 &
export FLUX_URL="http://localhost:10080/api/flux"

fluxctl identity
# add ssh key to https://github.com/dholbach/flux-example/settings/keys/new (rw)

# edit L17 https://github.com/dholbach/flux-example/blob/master/helloworld-deploy.yaml and commit

kubectl -n default logs deployment/flux -f
# patiently wait for deploy to happen

kubectl port-forward deployment/helloworld 8080:80 &
curl localhost:8080

# confirm you're seeing updated msg

# WC
# head to https://cloud.weave.works/instances
# connect cluster
# wait -> view your cluster
# head to deploy -> git repo is not pre-configured

Please let me know, if I should have ventured a different path in testing or if you can think of a more straight-forward or more accurate test-case.

dholbach commented 6 years ago

@aaron7 mentioned https://github.com/weaveworks/launcher/blob/master/integration-tests/tests/flux-config.sh as a test-case that's run to test the transfer of config. It installs the flux agent by itself with some config, then installs WC using the installer and finally checks the config transferred across.

dholbach commented 6 years ago

I just set set up my test case again and this is what kubectl describe returns for flux-agent:

    Args:
      --token=$(WEAVE_CLOUD_TOKEN)
      --connect=wss://cloud.weave.works./api/flux
      --memcached-hostname=weave-flux-memcached.weave.svc.cluster.local
      --git-ci-skip=true
      --ssh-keygen-dir=/var/fluxd/ssh

aaron7 commented 6 years ago

Having a quick look, we only migrate the existing flux agent if it lives in the kube-system namespace.

In @dholbach's example, the existing flux agent is in the default namespace. Perhaps we could search all namespaces for existing flux agents?

dholbach commented 6 years ago

The default flux deploy config seems to use default as namespace, which the migration bits don't take a look at:

[daniel@reef launcher (⎈ |minikube:default)]$ git grep -i getfluxconfig
agent/flux.go:func getFluxConfig(k kubectl.Client, namespace string) (*FluxConfig, error) {
agent/main.go:  fluxCfg, err := getFluxConfig(cfg.KubectlClient, "weave")
agent/migrate.go:       fluxCfg, err := getFluxConfig(kubectlClient, "kube-system")
[daniel@reef launcher (⎈ |minikube:default)]$

dholbach commented 6 years ago

Same after initially deploying flux in the kube-system namespace.

I'll try out if deploying it in the weave namespaces changes anything after lunch.

dholbach commented 6 years ago

Same with flux deployed in the weave namespace. Logs from weave-flux-agent:

ts=2018-06-07T13:20:17.769202579Z caller=main.go:216 component=cluster host=https://10.96.0.1:443 version=kubernetes-v1.10.0
ts=2018-06-07T13:20:17.769256504Z caller=main.go:228 component=cluster kubectl=/usr/local/bin/kubectl
ts=2018-06-07T13:20:17.769920564Z caller=main.go:236 component=cluster ping=true
ts=2018-06-07T13:20:17.770988579Z caller=memcached.go:69 component=memcached err="Error setting memcache servers to 'weave-flux-memcached.weave.svc.cluster.local': lookup _memcached._tcp.weave-flux-memcached.weave.svc.cluster.local on 10.96.0.10:53: no such host"
ts=2018-06-07T13:20:17.771078393Z caller=main.go:362 url= user="Weave Flux" email=support@weave.works sync-tag=flux-sync notes-ref=flux set-author=false
ts=2018-06-07T13:20:17.771187174Z caller=main.go:390 component=upstream URL=wss://cloud.weave.works./api/flux
ts=2018-06-07T13:20:17.771760629Z caller=upstream.go:128 component=upstream connecting=true
ts=2018-06-07T13:20:17.771912807Z caller=main.go:427 addr=:3030
ts=2018-06-07T13:20:17.77254626Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-06-07T13:20:17.772636235Z caller=images.go:23 component=sync-loop error="getting unlocked automated services: git repo does not have valid config"
ts=2018-06-07T13:20:17.772765838Z caller=loop.go:89 component=sync-loop err="git repo does not have valid config"
ts=2018-06-07T13:20:18.245617273Z caller=upstream.go:142 component=upstream connected=true
ts=2018-06-07T13:20:18.31328609Z caller=memcached.go:115 component=memcached err="Fetching tag from memcache: memcache: no servers configured or available"

aaron7 commented 6 years ago

Ah, the name label is probably not matching either.

We look for weave-flux-agent https://github.com/weaveworks/launcher/blob/master/agent/flux.go#L19 whereas in the flux example we don't use this label.

We need to figure out a way of finding existing flux agents in the cluster without this knowledge.

rade commented 6 years ago

We can change the example. We don't have to worry too much about old flux installations.

aaron7 commented 6 years ago

We can change the example. We don't have to worry too much about old flux installations.

It would be nice to detect a flux agent running anywhere - e.g. based on if a container has the combination of flux arguments

but yep, updating the example to add the label and perhaps run it in the weave namespace would be a good step

dholbach commented 6 years ago

@rade So you'd prefer us changing the label and namespace in flux's deploy/ directory? If yes, I'll file an issue.

rade commented 6 years ago

That seems the simplest option, but don't feel obliged to listen to me.

dholbach commented 6 years ago

Confirming that the following diff fixes git config discovery:

diff --git a/deploy/flux-account.yaml b/deploy/flux-account.yaml
index 4bf7702..d9d51c9 100644
--- a/deploy/flux-account.yaml
+++ b/deploy/flux-account.yaml
@@ -34,4 +34,4 @@ roleRef:
 subjects:
   - kind: ServiceAccount
     name: flux
-    namespace: default
+    namespace: kube-system
diff --git a/deploy/flux-deployment.yaml b/deploy/flux-deployment.yaml
index cf5b1cb..a6a76e6 100644
--- a/deploy/flux-deployment.yaml
+++ b/deploy/flux-deployment.yaml
@@ -10,7 +10,7 @@ spec:
   template:
     metadata:
       labels:
-        name: flux
+        name: weave-flux-agent
     spec:
       serviceAccount: flux
       volumes:

The instructions in my comment above would need to change as well.

@rade @aaron7 Does this look all right to you? If yes, I'll file a PR on flux and make sure the docs are updated as well.

rade commented 6 years ago

Shouldn't it be weave instead of kube-system?

I rather suspect @squaremo will have an opinion about all this.

dholbach commented 6 years ago

Happy to run another test using weave. ~~The change in flux-account.yaml might not be necessary there.~~

squaremo commented 6 years ago

The default flux deploy config seems to use default as namespace

Nope; it doesn't specify a namespace, which is different.

It's a bit odd putting it in a 'weave' namespace (along with all the other bits, and supplying the namespace manifest as well), but I don't care that much. It's not terrible if it's run in its own namespace, people can always adapt it if they want it elsewhere. And it's probably better than using kube-system, which was a misstep.

I do object to naming it weave-flux-agent. That's specific to weave cloud, and in my humble and honest opinion, a bit of a daft name. Even if you do rename it, you'd have to account for people putting it in a different namespace, with a different name, and so on -- I don't think it buys a lot of convenience.

rade commented 6 years ago

A 80% solution is just fine here. We absolutely do not need to detect "any flux anywhere".

squaremo commented 6 years ago

you'd have to account for people putting it in a different namespace, with a different name, and so on

Where "so on" could stand in for:

running more than one daemon, in different namespaces, with different permissions (not sure if anyone does this, but it's possible)
running under a different service account
running their own "upstream" server and pointing --connect at it (at least one person does this)

I wouldn't expect any automated transition to deal with all of these, but it's worth being aware of them, at least to know when not to attempt a transition.

I wouldn't object to putting a reliably well-identifying set of labels on the example deployment, so it can be picked out for transition (and you can see if there's more than one, etc.).

dholbach commented 6 years ago

Can we agree on a set of labels and namespace, so they can be applied in both flux (and its docs) and launcher and we catch at least the example deployments?

In a later stage we could review how we could more reliably find differing setups.

dholbach commented 6 years ago

Any opinions?

dholbach commented 6 years ago

@squaremo mentioned on Slack that

he'd err on the side of least surprise, and shifting someone's flux daemon around would be pretty surprising
we should warn people about possibly having two instances of Flux (installing WC after having Flux standalone in the cluster already) That it's possible they wanted it that way, so we shouldn't really stomp on their existing config without giving them the option of not doing so.

squaremo commented 6 years ago

Am I right to suppose that making the agent installation interactive is a fair bit of work?

squaremo commented 6 years ago

All else being equal, I'd do something like

try to detect if any fluxes, scopes, and whatnot are running;
ask for each if you want them to be migrated to Weave Cloud (meaning different things in each case)

But if the interactivity is not worth the effort, consider at least trying to detect existing installations, and balking unless a --migrate flag is given (or until the existing thing is removed).

And if that's not worth the effort, consider ignoring me :-)

weaveworks / launcher

Upon deploying WC agents, detect if self-hosted flux is used and re-use existing config #168