oVirt / terraform-provider-ovirt

Terraform provider for oVirt 4.x
https://registry.terraform.io/providers/oVirt/ovirt/latest/docs

initialization_custom_script should be better documented #453

Open hotspoons opened 2 years ago

hotspoons commented 2 years ago

Because of a three-layered matryoshka doll of encoding issues, it took me several hours to figure out how to get initialization_custom_script to post to oVirt's API without becoming malformed in transit or failing altogether.

There are a few problems here.

I'm not sure exactly where the fix for the XML encoding issue should go (either here or in go-ovirt-client; probably the client, since anything else using that library would also benefit), but it was an insane snipe hunt to figure out how to get the script to oVirt cleanly.
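
For reference, the kind of configuration I'm talking about looks roughly like this. It is only a sketch trimmed from my home lab setup; the resource name, template path, and omitted arguments are examples, not something the provider docs prescribe:

resource "ovirt_vm" "k8s_node" {
  # name, cluster_id, template_id and the other required arguments omitted

  # Rendered cloud-init user data. Characters such as &, <, > and quotes in
  # the script end up unescaped in the XML request body sent to the engine.
  initialization_custom_script = templatefile("${path.module}/k8s_master.tftpl", {
    # template variables omitted
  })
}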

ghost commented 2 years ago

Hey @hotspoons, I believe this is less of a documentation issue; it seems to be a problem in the underlying go-ovirt itself. Unfortunately, we haven't had time to look at it yet, and tracking this down is a bit tricky.

hotspoons commented 2 years ago

Okay, thanks @janosdebugs. I opened an issue in the go-ovirt-client GitHub repo and was embarrassed when I saw your name auto-assigned there just as it was here, but that issue probably has more concrete, usable info as far as a failing test case goes. I'm not great with Go, otherwise I'd volunteer a pull request with a passing unit test off the cuff. But at the core it's just a matter of XML-encoding the field in question (though this issue may apply to other fields) before sending the request.

ghost commented 2 years ago

No worries at all @hotspoons, and sorry for the late reply; typically I reply within a week. The two issues are more than OK, since this affects both projects after all.

If you want to help, you could try to prove with mitmproxy that the script is indeed transported with incorrect encoding. If you can give us a data dump of the request that goes to the engine, that would help.

Here's what we have on the mitmproxy setup in our internal docs:


mitmproxy is a very useful tool for debugging requests that are going to the engine.

In order to debug requests to the oVirt engine you need to perform three steps:

  1. Set up a hosts file entry to point the engine domain to 127.0.0.1.
  2. Set up the Terraform provider to connect with insecure=true.
  3. Start mitmproxy, replacing the reverse target with your engine domain:

mitmproxy \
    --listen-host 127.0.0.1 \
    --listen-port 443 \
    --ssl-insecure \
    --mode reverse:https://ip-of-the-real-engine \
    --set keep_host_header=true
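
For step 1, an /etc/hosts entry such as "127.0.0.1 my-engine.example.com" (substitute your real engine FQDN) is enough. For step 2, a provider block along the following lines should work; treat it as a sketch rather than documented configuration, since the exact spelling of the TLS-skipping attribute (insecure vs. tls_insecure) depends on the provider version:

provider "ovirt" {
  url      = "https://my-engine.example.com/ovirt-engine/api"
  username = "admin@internal"
  password = var.engine_password

  # Accept mitmproxy's self-signed certificate; for debugging only.
  insecure = true
}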

hotspoons commented 2 years ago

Okay I used mitmproxy to capture the traffic, and this is what came out (certificates, keys and tokens omitted):

15842:4:type;4:http;7:version;2:17#9:websocket;0:~8:response;746:6:reason;11:Bad Request,11:status_code;3:400#13:timestamp_end;18:1659295819.6783395^15:timestamp_start;18:1659295819.6759646^8:trailers;0:~7:content;235:<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>For correct usage, see: https://127.0.0.1/ovirt-engine/apidoc#services/vms/methods/add</detail>
    <reason>Request syntactically incorrect.</reason>
</fault>
,7:headers;315:40:4:Date,29:Sun, 31 Jul 2022 19:30:19 GMT,]98:6:Server,85:Apache/2.4.37 (centos) OpenSSL/1.1.1k mod_auth_gssapi/1.6.1 mod_wsgi/4.6.4 Python/3.6,]49:12:Content-Type,29:application/xml;charset=UTF-8,]24:14:Content-Length,3:235,]58:14:Correlation-Id,36:bed8cd7c-7056-4446-a882-9574e55f6613,]22:10:Connection,5:close,]]12:http_version;8:HTTP/1.1,}7:request;10645:4:path;21:/ovirt-engine/api/vms,9:authority;0:,6:scheme;5:https,6:method;4:POST,4:port;3:443#4:host;24:vm-manager.siomporas.com;13:timestamp_end;18:1659295819.6286306^15:timestamp_start;18:1659295819.6261492^8:trailers;0:~7:content;10044:<vm><cluster id="c0769f3c-9c03-11ec-bc0d-00163e448789"></cluster><cpu><topology><cores>4</cores><sockets>1</sockets><threads>2</threads></topology></cpu><disk_attachments></disk_attachments><initialization><custom_script>"runcmd":
- "#!/bin/bash"
- "echo &#39;password&#39; | passwd --stdin root"
- ""
- "## NFS Configuration - set NFS server and path for dynamic storage for persistent
  volumes"
- "NFS_SERVER=vm-host.siomporas.com"
- "NFS_PATH=/working/kubernetes-data"
- "NFS_PROVISION_NAME=siomporas.com/nfs"
- "## IP Address range for load balancer"
- "START_IP=192.168.1.220"
- "END_IP=192.168.1.225"
- "BASE_ARCH=x86_64"
- "AARCH=amd64"
- "EL_VERSION=8"
- "CONTAINERD_VERSION=1.6.6-3.1.el8"
- "HELM_VERSION=3.9.0"
- "METALLB_VERSION=0.13.3"
- ""
- "#Setup configuration"
- "DOCKER_REPO=https://download.docker.com/linux/centos/docker-ce.repo"
- "CONTAINER_IO_PKG=https://download.docker.com/linux/centos/$EL_VERSION/$BASE_ARCH/stable/Packages/containerd.io-$CONTAINERD_VERSION.$BASE_ARCH.rpm"
- "KUBERNETES_REPO=https://packages.cloud.google.com/yum/repos/kubernetes-el7-$BASE_ARCH"
- "KUBERNETES_GPG='https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg'"
- "HELM_URL=https://get.helm.sh"
- "HELM_FILE=helm-v$HELM_VERSION-linux-$AARCH.tar.gz"
- ""
- "#Kubernetes utilities setup for persistent volumes, dashboard, and metal load balancer"
- "DASHBOARD_URL=https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended.yaml"
- "NFS_CLIENT_PROVISIONER_CTNR=quay.io/external_storage/nfs-client-provisioner:latest"
- "METALLB_NAMESPACE_URL=https://raw.githubusercontent.com/metallb/metallb/v$METALLB_VERSION/manifests/namespace.yaml"
- "METALLB_URL=https://raw.githubusercontent.com/metallb/metallb/v$METALLB_VERSION/manifests/metallb.yaml"
- "ROCKY_MIGRATE_URL=https://raw.githubusercontent.com/rocky-linux/rocky-tools/main/migrate2rocky/migrate2rocky.sh"
- ""
- "mkdir /opt/tmp"
- "cd /opt/tmp"
- "curl -o /opt/tmp/migrate2rocky.sh $ROCKY_MIGRATE_URL"
- "chmod +x /opt/tmp/migrate2rocky.sh"
- "/opt/tmp/migrate2rocky.sh -r"
- ""
- "################################################"
- "## Configure EL8 for networking and tools     ##"
- "################################################"
- "dnf -y upgrade"
- "setenforce 0"
- "sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux"
- "modprobe br_netfilter"
- ""
- "dnf install -y wget git lsof firewalld bash-completion tc"
- "sed -i 's/FirewallBackend=nftables/FirewallBackend=iptables/g' /etc/firewalld/firewalld.conf"
- "systemctl restart firewalld"
- ""
- "firewall-cmd --add-masquerade --permanent"
- "firewall-cmd --reload"
- ""
- "cat <<EOF > /etc/sysctl.d/k8s.conf"
- "net.bridge.bridge-nf-call-ip6tables = 1"
- "net.bridge.bridge-nf-call-iptables = 1"
- "EOF"
- ""
- "sysctl --system"
- "swapoff -a"
- ""
- ""
- "################################################"
- "## Install Docker and Kubernetes              ##"
- "################################################"
- "dnf config-manager --add-repo=$DOCKER_REPO"
- "dnf install -y $CONTAINER_IO_PKG"
- "dnf install docker-ce --nobest -y"
- "sed -i 's/disabled_plugins = \\[\"cri\"\\]//g' /etc/containerd/config.toml"
- "systemctl start docker"
- "systemctl enable docker"
- ""
- "cat <<EOF > /etc/yum.repos.d/kubernetes.repo"
- "[kubernetes]"
- "name=Kubernetes"
- "baseurl=$KUBERNETES_REPO"
- "enabled=1"
- "gpgcheck=1"
- "repo_gpgcheck=1"
- "gpgkey=$KUBERNETES_GPG"
- "exclude=kube*"
- "EOF"
- ""
- "setenforce 0"
- "dnf upgrade -y"
- "dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes"
- "systemctl enable kubelet"
- "systemctl start kubelet"
- ""
- "################################################"
- "## Setup firewall rules                       ##"
- "################################################"
- ""
- "firewall-cmd --zone=public --permanent --add-port={6443,2379,2380,10250,10251,10252}/tcp"
- "firewall-cmd --zone=public --permanent --add-rich-rule 'rule family=ipv4 source
  address=worker-IP-address/32 accept'"
- "firewall-cmd --zone=public --permanent --add-rich-rule 'rule family=ipv4 source
  address=172.17.0.0/16 accept'"
- "firewall-cmd --reload"
- ""
- "################################################"
- "## Initialize cluster                         ##"
- "################################################"
- ""
- "kubeadm init --pod-network-cidr 192.168.0.0/16"
- "mkdir -p $HOME/.kube"
- "yes | cp /etc/kubernetes/admin.conf $HOME/.kube/config"
- "chown $(id -u):$(id -g) $HOME/.kube/config"
- ""
- "kubectl taint nodes --all node-role.kubernetes.io/master-"
- "kubectl get nodes"
- ""
- "################################################"
- "## Initialize helm                            ##"
- "################################################"
- ""
- "cd /tmp"
- "wget $HELM_URL/$HELM_FILE"
- "tar -zxvf $HELM_FILE"
- "mv linux-amd64/helm /usr/local/bin/helm"
- ""
- "################################################"
- "## Setup cluster for admin dashboard          ##"
- "################################################"
- ""
- "kubectl apply -f $DASHBOARD_URL"
- ""
- "cat <<EOF | kubectl apply -f -"
- "apiVersion: v1"
- "kind: ServiceAccount"
- "metadata:"
- "  name: admin-user"
- "  namespace: kubernetes-dashboard"
- "EOF"
- ""
- "cat <<EOF | kubectl apply -f -"
- "apiVersion: rbac.authorization.k8s.io/v1"
- "kind: ClusterRoleBinding"
- "metadata:"
- "  name: admin-user"
- "roleRef:                         "
- "  apiGroup: rbac.authorization.k8s.io"
- "  kind: ClusterRole"
- "  name: cluster-admin"
- "subjects:"
- "- kind: ServiceAccount"
- "  name: admin-user"
- "  namespace: kubernetes-dashboard"
- "EOF"
- ""
- "################################################"
- "## How to access and connect to dashboard     ##"
- "################################################"
- "#    Start proxy:"
- "#        kubectl proxy&"
- "#    Get UI token:"
- "#        kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard
  get secret | grep admin-user | awk '{print $1}')"
- ""
- "#    Port forward SSH session so you can access dashboard on a remote server:"
- "#        ssh -L 9999:127.0.0.1:8001 -N -f -l root kubernetes-master.siomporas.com"
- "        "
- "#    Access dashboard, using token from above, from web browser with local port
  9999 forwarded:"
- "#        http://localhost:9999/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
  \  "
- ""
- "################################################"
- "## Configure auto-provisioned NFS storage     ##"
- "################################################"
- ""
- "cat <<EOF | kubectl apply -f -"
- "apiVersion: storage.k8s.io/v1"
- "kind: StorageClass"
- "metadata:"
- "  name: managed-nfs-storage"
- "  annotations:"
- "    storageclass.kubernetes.io/is-default-class: 'true'"
- "provisioner: $NFS_PROVISION_NAME"
- "parameters:"
- "  archiveOnDelete: 'false'"
- "EOF"
- ""
- "cat <<EOF | kubectl apply -f -"
- "kind: Deployment"
- "apiVersion: apps/v1"
- "metadata:"
- "  name: nfs-client-provisioner"
- "spec:"
- "  selector:"
- "    matchLabels:"
- "      app: nfs-client-provisioner"
- "  replicas: 1"
- "  strategy:"
- "    type: Recreate"
- "  template:"
- "    metadata:"
- "      labels:"
- "        app: nfs-client-provisioner"
- "    spec:"
- "      serviceAccountName: nfs-client-provisioner"
- "      containers:"
- "        - name: nfs-client-provisioner"
- "          image: $NFS_CLIENT_PROVISIONER_CTNR"
- "          volumeMounts:"
- "            - name: nfs-client-root"
- "              mountPath: /persistentvolumes"
- "          env:"
- "            - name: PROVISIONER_NAME"
- "              value: $NFS_PROVISION_NAME"
- "            - name: NFS_SERVER"
- "              value: $NFS_SERVER"
- "            - name: NFS_PATH"
- "              value: $NFS_PATH"
- "      volumes:"
- "        - name: nfs-client-root"
- "          nfs:"
- "            server: $NFS_SERVER"
- "            path: $NFS_PATH"
- "EOF"
- ""
- "################################################"
- "## Configure Metal Load Balancer              ##"
- "################################################"
- ""
- "kubectl get configmap -n kube-system kube-proxy -o yaml > /tmp/proxy.yaml"
- "sed -i 's/strictARP: false/strictARP: true/g' /tmp/proxy.yaml"
- "kubectl replace -f /tmp/proxy.yaml"
- "kubectl apply -f $METALLB_NAMESPACE_URL"
- "kubectl apply -f $METALLB_URL"
- "kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey='$(openssl
  rand -base64 128)'"
- ""
- "cat <<EOF | kubectl apply -f -"
- "apiVersion: v1"
- "kind: ConfigMap"
- "metadata:"
- "  namespace: metallb-system"
- "  name: config"
- "data:"
- "  config: |"
- "    address-pools:"
- "    - name: default"
- "      protocol: layer2"
- "      addresses:"
- "      - $START_IP-$END_IP"
- "EOF"
- ""
- ""
- "################################################"
- "## Reset everything, clear docker cache       ##"
- "################################################"
- ""
- "# kubeadm reset -f && rm -rf /etc/cni/net.d && rm -f $HOME/.kube/config && docker
  system prune -a -f"
"ssh_authorized_keys":
- "ssh-rsa *key ommitted*
  rich@rich-xp-new"
</custom_script><host_name>k8s-node1.siomporas.com</host_name></initialization><memory>4294967296</memory><memory_policy><max>6442450944</max></memory_policy><name>k8s-node1.siomporas.com</name><template id="aceb058e-5689-49d3-a9d6-4caae908e34c"></template></vm>,7:headers;320:19:4:Host,9:127.0.0.1,]29:10:User-Agent,11:GoSDK/4.4.3,]26:14:Content-Length,5:10044,]28:6:Accept,15:application/xml,]114:13:Authorization,93:Bearer *bearer token ommitted*,]35:12:Content-Type,15:application/xml,]14:7:Version,1:4,]22:10:Connection,5:close,]]12:http_version;8:HTTP/1.1,}17:timestamp_created;18:1659295819.6264923^7:comment;0:;8:metadata;0:}6:marked;0:;9:is_replay;0:~11:intercepted;5:false!11:server_conn;3679:4:via2;0:~11:cipher_list;0:]11:cipher_name;27:ECDHE-RSA-AES256-GCM-SHA384;11:alpn_offers;0:]16:certificate_list;3070:1667:-----BEGIN CERTIFICATE-----
*certificate ommitted*
-----END CERTIFICATE-----
,1391:-----BEGIN CERTIFICATE-----
*certificate ommitted*
-----END CERTIFICATE-----
,]3:tls;4:true!5:error;0:~5:state;1:0#3:via;0:~11:tls_version;7:TLSv1.2;15:tls_established;4:true!19:timestamp_tls_setup;18:1659295819.6181064^19:timestamp_tcp_setup;18:1659295819.6103933^15:timestamp_start;18:1659295819.6073828^13:timestamp_end;18:1659295819.6805716^14:source_address;25:13:192.168.1.202;5:58980#]3:sni;24:vm-manager.siomporas.com;10:ip_address;23:13:192.168.1.203;3:443#]2:id;36:ba37bb6d-4761-4e41-a8e2-72e5a9442879;4:alpn;0:,7:address;34:24:vm-manager.siomporas.com;3:443#]}11:client_conn;478:11:cipher_list;0:]11:alpn_offers;0:]16:certificate_list;0:]3:tls;4:true!5:error;0:~8:sockname;18:9:127.0.0.1;3:443#]5:state;1:0#11:tls_version;7:TLSv1.3;14:tls_extensions;0:]15:tls_established;4:true!19:timestamp_tls_setup;18:1659295819.6245701^15:timestamp_start;18:1659295819.6053247^13:timestamp_end;18:1659295819.6814215^3:sni;0:~8:mitmcert;0:~2:id;36:d3ab2dc7-47dc-4d20-8c3c-0822e04e63a7;11:cipher_name;22:TLS_AES_256_GCM_SHA384;4:alpn;0:,7:address;20:9:127.0.0.1;5:47328#]}5:error;0:~2:id;36:7e5c566a-9136-4ab9-8aa9-d06302322ae8;}

If I take the tftpl file I use to build the runcmd property and run it through an XML entity encoder, the request works (unless of course I have a reserved token in one of the parameters that is injected into the template). Hopefully this is enough information for you to look at this in more detail. My guess is that this is not the only place you'd potentially run into encoding issues with this Terraform provider or the upstream Go oVirt client library.
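
For anyone hitting the same wall, the manual workaround boils down to escaping the predefined XML entities on the rendered template before handing it to the provider. A sketch of what I mean (the local names are mine, and this is a workaround, not an officially supported approach):

locals {
  raw_script = templatefile("${path.module}/k8s_master.tftpl", {
    # template variables omitted
  })

  # Escape the predefined XML entities, ampersand first so the later
  # replacements are not themselves re-escaped.
  xml_safe_script = replace(
    replace(
      replace(
        replace(
          replace(local.raw_script, "&", "&amp;"),
          "<", "&lt;"
        ),
        ">", "&gt;"
      ),
      "\"", "&quot;"
    ),
    "'", "&apos;"
  )
}

The VM resource then uses initialization_custom_script = local.xml_safe_script instead of the raw template output.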

ghost commented 2 years ago

@hotspoons one last question: you did not encode the contents by hand, right? Because it looks like the script is properly encoded.

Huge thanks for the help!

hotspoons commented 2 years ago

No, it wasn't encoded; you can tell because the request body containing the initialization script in the failing request has normal shell characters like &, <, ' and > throughout, instead of XML entities like &amp;, &lt;, &apos; and &gt;. For comparison, here are two files. The first is the raw UTF-8 text from the post above with keys and certs removed:

reqres.txt

The second is the request when I XML-encode the shell script (which is a Terraform template); with that version I am able to deploy Kubernetes on initialization and everybody is happy:

reqres_encoded_shell_template.txt

I added you @janosdebugs to my private home lab repo so you can see what is necessary in the original Terraform template that ultimately generates the second request: https://github.com/hotspoons/home-lab/blob/main/compute/k8s_master.tftpl

ghost commented 2 years ago

The weird part is that there is some encoding there:

cat &lt;&lt;EOF &gt; /etc/sysctl.d/k8s.conf

However, other characters are not encoded. I'll look at go-ovirt to see what's going on there and update this issue.

ghost commented 2 years ago

Never mind, the encoding above comes from your initialization script!