mxschmitt / ui-driver-hetzner

Rancher UI driver for the Hetzner Cloud docker driver.
https://mxschmitt.github.io/ui-driver-hetzner
Apache License 2.0

Also add labels to Hetzner Server from NodeTemplate #109

Closed NotANormalNerd closed 3 years ago

NotANormalNerd commented 3 years ago

Add the labels defined in the Node Templates to the Hetzner servers. This makes it possible to use label selectors for load balancers, as requested in #100.

NotANormalNerd commented 3 years ago

setLabels gets called by form-user-labels, here: https://github.com/mxschmitt/ui-driver-hetzner/blob/1d338fce46ba4edc33bbb61e4ae5459fbd1b6cda/component/template.hbs#L100 We also have to call super on the NodeDriver so that the labels are set in rancher/k8s as well.

I basically had to set rancher to TRACE, since there is no documentation on the custom-node-drivers whatsoever. It took me way too long to find all the source code and figure stuff out.

mxschmitt commented 3 years ago

thanks for the explanation, merged!

KochMario commented 3 years ago

Hi there, I just tried including the server label option in my setup as well. It appears, however, that this option is not mirrored in the UI itself yet? I'm using the docker-machine-driver-hetzner with Rancher, and the following is my current Node Driver config.

[Screenshot: current Node Driver config, 2021-01-12]

So I'm using the latest (3.2) version, which includes the --hetzner-server-label flag, but the UI itself doesn't seem to provide that option. When trying to instantiate a new node with a template, I receive the following error:

[Screenshot: error shown when creating the node, 2021-01-12]

According to your code, @mxschmitt, it seems that you're setting all provided labels as serverLabels. Or am I mistaken about that part?

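setLabels: function setLabels(labels) {
  // Convert each {key, value} label into a "key=value" string.
  var labels_list = labels.map(function (l) {
    return l.key + "=" + l.value;
  });
  // Store the converted labels as the Hetzner server labels.
  this.set('model.hetznerConfig.serverLabel', labels_list);

  // Call the parent NodeDriver implementation with the original labels.
  this._super(labels);
}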

I tried setting a label via the labels section in the Node Template GUI. Am I missing something?

NotANormalNerd commented 3 years ago

No, you are right: we take all the k8s labels we set for the nodes and set them on the virtual servers at Hetzner. It seems that the setLabels function is called even without specifying any labels.

In my tests there always was one, so I haven't tested it without any labels. My bad.

In the meantime, you can just specify any label and the setup should work fine. @KochMario Would you also mind showing us the node template you set up?

KochMario commented 3 years ago

I think I figured out the issue. I tried adding the newly created Node Template (with Hetzner Server Labels) to an already existing cluster (which was set up using an old Hetzner node driver). This supposedly wasn't compatible. The cluster is actually broken now, and I can't even create any new nodes with the old Node Templates.

Creating an entirely new cluster from scratch with the newly added Node Templates does work though. The provided labels are being correctly applied to the created Hetzner Servers.

It's definitely kind of a bummer that the version upgrade effectively corrupted the cluster, but since we're still in trial & setup mode it's not that big of a deal for us. Hetzner Server Labels seem to work, so I'm happy :) Thanks for your work in enabling that! @NotANormalNerd

mxschmitt commented 3 years ago

is it possible to make it not a breaking change? maybe pass empty ones by default @NotANormalNerd

NotANormalNerd commented 3 years ago

I am really sorry for that. Maybe we can have versioned UIs, so that we don't break existing installations? Yes, we can do something. My javascript-fu is not very low. So we could just check if the labels_list is empty, and if it is, omit the this.set('model.hetznerConfig.serverLabel', labels_list); call. @mxschmitt
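The empty-list check proposed above could look roughly like the sketch below. This is not the code that was merged: toServerLabels and applyServerLabels are hypothetical helper names, and a plain object stands in for model.hetznerConfig (instead of Ember's this.set) so the snippet runs on its own.

```javascript
// Hypothetical helper: turn label objects ({key, value}) into "key=value" strings.
// A missing or undefined label list is treated as empty.
function toServerLabels(labels) {
  return (labels || [])
    .filter(function (l) { return l && l.key; })
    .map(function (l) { return l.key + "=" + l.value; });
}

// Hypothetical helper: only write serverLabel when there is something to write.
// With no labels, the property is removed so that no label value is stored.
function applyServerLabels(hetznerConfig, labels) {
  var labelsList = toServerLabels(labels);
  if (labelsList.length === 0) {
    delete hetznerConfig.serverLabel;
  } else {
    hetznerConfig.serverLabel = labelsList;
  }
  return hetznerConfig;
}

// Usage: a template without labels ends up with no serverLabel entry at all.
var withoutLabels = applyServerLabels({ serverType: "cx11" }, []);
var withLabels = applyServerLabels({ serverType: "cx11" }, [
  { key: "role", value: "worker" }
]);
console.log(withoutLabels, withLabels);
```

Whether deleting the property or storing an empty list is the better default depends on how Rancher persists the config, which this sketch does not attempt to model.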

@KochMario Again I am sorry, should have checked that. But the old templates should work without problems. The driver is called with the HetznerConfig that is saved in Rancher, which should not have serverLabels saved. If you saved the old ones without labels this should at least have the rancher labels creator=norman saved. If you removed the default labels, then none are saved. You can get around that by just adding a label.

ssimeth commented 3 years ago

I have the same problem, but in a production setup. Do I have any chance to solve the problem? It appears that the existing cluster is using the old driver and only new clusters use the upgraded driver.

KochMario commented 3 years ago

I'm afraid I don't know of any other solution than setting up a fresh cluster. Maybe you could try duplicating your existing cluster and only once everything is up and running switching the DNS records (so you have limited / no downtime)?

NotANormalNerd commented 3 years ago

@notitiatech @KochMario A solution for you.

You can clone the repository, check out the commit before we changed stuff, follow the build instructions at https://github.com/mxschmitt/ui-driver-hetzner#building and deploy the resulting 'dist' to a webserver of your choice that is reachable by your rancher instance. Then replace the "Custom UI URL" with your own and voila, you have another version of the UI, one that also does not change as long as you don't update it yourself.

On the other hand, I don't understand what the problem is with upgrading the driver. We also already fixed that problem, as far as I know. So you probably have another problem altogether.

Best regards, Dennis

mxschmitt commented 3 years ago

lets maybe revert that change so the customers can continue to use this ui driver? @NotANormalNerd

NotANormalNerd commented 3 years ago

@mxschmitt That would probably break our setup somehow. At the very least, we could not update any node templates in the foreseeable future, since we rely on the Hetzner labels for load balancing.

You would also have to roll forward and empty or delete the Config.serverLabel setting, since that is saved by rancher and would still be applied. So that means another code change. I don't support that. Yes, that setup got broken, but in my opinion the solution is to update the Hetzner node driver instead of reverting changes here.

I also haven't seen any "debug" of what actually broke, just "This supposedly wasn't compatible..." and "It appears that...". @KochMario @notitiatech What version of the Hetzner node driver are you running? What version of Rancher? Replacing the Download URL with https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/download/3.2.0/docker-machine-driver-hetzner_3.2.0_linux_amd64.tar.gz and resaving the node templates should fix the setup and not have any other side effects.

Don't get me wrong, I actually believe this could have something to do with the UI driver, but I don't see a reason why the UI driver should roll anywhere.

Also, "It appears that the existing cluster is using the old driver and only new clusters use the upgraded driver." is not correct in my experience, as I upgraded our staging and production clusters that way.

All in all it would be a good idea to version the releases here, so situations like this can be avoided.

It comes down to Linus Torvalds' "Don't break userspace" vs. Zuckerberg's "Move fast and break things".

ssimeth commented 3 years ago

@NotANormalNerd I replaced the URL of the driver with the new one (https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/download/3.2.0/docker-machine-driver-hetzner_3.2.0_linux_amd64.tar.gz), resaved the node template and asked Rancher to deploy a new worker node from this template. This ended with the following error: Flag provided but not defined: -hetzner-networks; Timeout waiting for ssh key. Deploying a new cluster with the template works fine.

[Two screenshots, 2021-03-02]

Rancher Version: 2.5.1
Kubernetes Version: 1.18.8-rancher1-1
Docker Version: 19.3.15

NotANormalNerd commented 3 years ago

@notitiatech Can you go into your rancher installation and activate debug logging? https://rancher.com/docs/rancher/v2.x/en/troubleshooting/logging/

Then start a new node and check what the debug log says. Specifically, look for entries along the lines of rancher-machine or deployment.

Also, in your text the error is -hetzner-networks, while in the image it is -hetzner-labels. It can't really be that it uses the "old driver" for old clusters and the "new driver" for new clusters, since the node drivers are global settings, as are the node templates. So somewhere in your rancher installation there is a state mismatch between all of those.

ssimeth commented 3 years ago

@NotANormalNerd the error message differs from deployment to deployment.

Rancher logs:

2021/03/03 15:50:59 [INFO] Creating jail for c-pm7c9
2021/03/03 15:50:59 [INFO] Provisioning node nt-prod-big3
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Incorrect Usage.
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Usage: docker-machine create [OPTIONS] [arg...]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Create a machine
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Description:
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Run 'rancher-machine create --driver name --help' to include the create flags for that driver in the help text.
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] Options:
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --driver, -d "virtualbox" Driver to create machine with. [$MACHINE_DRIVER]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-env [--engine-env option --engine-env option] Specify environment variables to set in the engine
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-insecure-registry [--engine-insecure-registry option --engine-insecure-registry option] Specify insecure registries to allow with the created engine
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-install-url "https://get.docker.com" Custom URL to use for engine installation [$MACHINE_DOCKER_INSTALL_URL]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-label [--engine-label option --engine-label option] Specify labels for the created engine
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-opt [--engine-opt option --engine-opt option] Specify arbitrary flags to include with the created engine in the form flag=value
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-registry-mirror [--engine-registry-mirror option --engine-registry-mirror option] Specify registry mirrors to use [$ENGINE_REGISTRY_MIRROR]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --engine-storage-driver Specify a storage driver to use with the engine
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-api-token Project-specific Hetzner API token [$HETZNER_API_TOKEN]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-existing-key-id "0" Existing key ID to use for server; requires --hetzner-existing-key-path [$HETZNER_EXISTING_KEY_ID]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-existing-key-path Path to existing key (new public key will be created unless --hetzner-existing-key-id is specified) [$HETZNER_EXISTING_KEY_PATH]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-image "ubuntu-16.04" Image to use for server creation [$HETZNER_IMAGE]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-image-id "0" Image to use for server creation [$HETZNER_IMAGE_ID]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-server-location Location to create machine at [$HETZNER_LOCATION]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-server-type "cx11" Server type to create [$HETZNER_TYPE]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --hetzner-user-data Cloud-init based User data [$HETZNER_USER_DATA]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm Configure Machine to join a Swarm cluster
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-addr addr to advertise for Swarm (default: detect and use the machine IP)
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-discovery Discovery service to use with Swarm
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-experimental Enable Swarm experimental features
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-host "tcp://0.0.0.0:3376" ip/socket to listen on for Swarm master
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-image "swarm:latest" Specify Docker image to use for Swarm [$MACHINE_SWARM_IMAGE]
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-join-opt [--swarm-join-opt option --swarm-join-opt option] Define arbitrary flags for Swarm join
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-master Configure Machine to be a Swarm master
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-opt [--swarm-opt option --swarm-opt option] Define arbitrary flags for Swarm master
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --swarm-strategy "spread" Define a default scheduling strategy for Swarm
2021/03/03 15:50:59 [INFO] [node-controller-rancher-machine] --tls-san [--tls-san option --tls-san option] Support extra SANs for TLS certs
2021/03/03 15:50:59 [INFO] Generating and uploading node config nt-prod-big3
W0303 15:51:12.631588 7 warnings.go:67] v1 ComponentStatus is deprecated in v1.19+
2021/03/03 15:51:14 [ERROR] error syncing 'c-pm7c9/m-64lwh': handler node-controller: flag provided but not defined: -hetzner-use-private-network, requeuing
W0303 15:51:24.863422 7 warnings.go:67] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0303 15:51:27.633421 7 warnings.go:67] v1 ComponentStatus is deprecated in v1.19+
W0303 15:51:42.658852 7 warnings.go:67] v1 ComponentStatus is deprecated in v1.19+
W0303 15:51:57.622866 7 warnings.go:67] v1 ComponentStatus is deprecated in v1.19+
W0303 15:52:12.633030 7 warnings.go:67] v1 ComponentStatus is deprecated in v1.19+

NotANormalNerd commented 3 years ago

flag provided but not defined: -hetzner-use-private-network, requeuing

It seems you are still running the 2.0.0 release of the Hetzner driver, since it does not provide the -hetzner-use-private-network flag: https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/tag/2.1.0

When you change the Download URL and save the node driver, does it show in the UI as downloading the driver?
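The reasoning here (the usage text printed by rancher-machine lists the flags the installed driver knows, and the rejected flag is not among them) can be checked mechanically. The following is only an illustrative sketch: supportedFlags and missingFlags are hypothetical helper names, the usage excerpt is shortened from the log above, and the requested flags are the ones mentioned in this thread.

```javascript
// Hypothetical helper: collect every --hetzner-* flag mentioned in the usage text.
function supportedFlags(usageText) {
  var found = usageText.match(/--hetzner-[a-z0-9-]+/g) || [];
  var unique = {};
  found.forEach(function (flag) {
    unique[flag.slice(2)] = true; // drop the leading "--"
  });
  return Object.keys(unique).sort();
}

// Hypothetical helper: list the requested flags that the usage text does not advertise.
function missingFlags(usageText, requested) {
  var supported = supportedFlags(usageText);
  return requested.filter(function (flag) {
    return supported.indexOf(flag) === -1;
  });
}

// Shortened excerpt of the usage output from the Rancher log.
var usage = [
  '--hetzner-api-token Project-specific Hetzner API token [$HETZNER_API_TOKEN]',
  '--hetzner-image "ubuntu-16.04" Image to use for server creation [$HETZNER_IMAGE]',
  '--hetzner-server-type "cx11" Server type to create [$HETZNER_TYPE]'
].join('\n');

var missing = missingFlags(usage, [
  'hetzner-api-token',
  'hetzner-use-private-network',
  'hetzner-server-label'
]);
console.log(missing);
```

Any flag reported as missing is one that the node template passes but the driver binary actually in use does not define, which points at the driver version rather than at the UI component.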

ssimeth commented 3 years ago

@NotANormalNerd this is exactly what I thought/said.

When you change the Download URL and save the node driver, does it show in the UI as downloading the driver?

yes

As I said deploying a new Cluster works as expected.

NotANormalNerd commented 3 years ago

Okay, there is something I can't wrap my head around, and this is clearly off-topic here, as it is not the fault of the UI component:

You are telling me that you have one rancher instance which uses different versions of a node driver on different clusters created in that same rancher instance? Even though you only have one Hetzner node driver and this is a global rancher object?

ssimeth commented 3 years ago

@NotANormalNerd yes it probably looks like this. I have one Rancher instance with one Hetzner-driver as a global object. Using it in the existing cluster failed with the mentioned errors. Using it with a new cluster works.

[Screenshot, 2021-03-05]

NotANormalNerd commented 3 years ago

Well, I guess you should probably open an issue at https://github.com/rancher/rancher or go ask on their Slack, because this is clearly not a problem with this UI component.

I also advise you to update your rancher instance to 2.5.5, just to make sure this isn't a bug that has already been fixed.