bsperduto closed this issue 4 years ago.
> Unfortunately the OKD installer does not currently appear to have oVirt as an available platform

It's not present in the survey, but it does accept `platform: ovirt` in install-config.yaml. There are a few fixes required though:
Went ahead and took a stab at creating an install-config.yaml. I'm getting a Terraform error like the one below. I'm not convinced my config file is correct, though; do you have a working example you can share?
Thanks
```
INFO Creating infrastructure resources...
ERROR
ERROR Error: Tag not matched: expectbut got ERROR
ERROR
ERROR on ../../tmp/openshift-install-760331462/template/main.tf line 11, in data "ovirt_templates" "osImage":
ERROR 11: data "ovirt_templates" "osImage" {
ERROR
ERROR
ERROR
ERROR Error: Tag not matched: expectbut got ERROR
ERROR
ERROR on ../../tmp/openshift-install-760331462/template/main.tf line 18, in data "ovirt_clusters" "clusters":
ERROR 18: data "ovirt_clusters" "clusters" {
ERROR
ERROR
ERROR Failed to read tfstate: open /tmp/openshift-install-760331462/terraform.tfstate: no such file or directory
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform
```
Check your ovirt-config.yaml: `ovirt_url` has to be an API endpoint. In my case, with the WebUI at https://foo.example.com:8443/ovirt-engine, `ovirt_url` was https://foo.example.com:8443/ovirt-engine/api.
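A trivial sanity check for this: the URL must end in `/api`, otherwise the oVirt terraform provider gets the WebUI's HTML back instead of XML (the `Tag not matched` error above). A minimal sketch, using the example URL from this comment:

```shell
# Check whether ovirt_url points at the REST endpoint; if not, suggest the fix.
ovirt_url="https://foo.example.com:8443/ovirt-engine"   # WebUI URL, i.e. the wrong one
case "$ovirt_url" in
  */api) echo "looks like an API endpoint" ;;
  *)     echo "append /api: ${ovirt_url}/api" ;;
esac
```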
* terraform-provider-ovirt's fork needs [this commit](https://github.com/vrutkovs/terraform-provider-ovirt/commit/029b02f9d39775a68511dc0f6ac5d226a1d6f826) - I'll make sure the installer is updated too
@vrutkovs let's not use a fork for the ovirt provider, mind opening a PR for that commit upstream?
> Check your ovirt-config.yaml: `ovirt_url` has to be an API endpoint. In my case, with the WebUI at https://foo.example.com:8443/ovirt-engine, `ovirt_url` was https://foo.example.com:8443/ovirt-engine/api.
Great, I was able to progress quite a bit. It successfully created the bootstrap and masters and was able to boot them. It appeared that keepalived never came up on the bootstrap node, though, or it never began broadcasting on the IP as expected. The master nodes kept searching for the MCO at that IP but couldn't reach it. Do you have a command to get logs from keepalived?
Use `oc adm must-gather` to collect the necessary info.
When I attempt to run `oc adm must-gather` I'm getting a "no route to host" error, as it's trying to connect to the keepalived instance that isn't running.
Below is the status output of the kubelet running on the bootstrap node, this seems to be the most relevant information
```
Feb 04 01:24:34 localhost hyperkube[4210]: I0204 01:24:34.478718 4210 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Feb 04 01:24:34 localhost hyperkube[4210]: I0204 01:24:34.481150 4210 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node localhost
Feb 04 01:24:34 localhost hyperkube[4210]: I0204 01:24:34.481189 4210 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node localhost
Feb 04 01:24:34 localhost hyperkube[4210]: I0204 01:24:34.481199 4210 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node localhost
Feb 04 01:24:34 localhost hyperkube[4210]: E0204 01:24:34.621141 4210 remote_runtime.go:200] CreateContainer in sandbox "b77d96611136e1c23d57d31d459f50d845eb54cfd3c8bd0da0b32e263fc193b7" from runtime service failed: rpc error: code = Unknown desc = container create failed: time="2020-02-04T01:24:34Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\"runtimecfg\\": executable file not found in $PATH\""
Feb 04 01:24:34 localhost hyperkube[4210]: container_linux.go:346: starting container process caused "exec: \"runtimecfg\": executable file not found in $PATH"
Feb 04 01:24:34 localhost hyperkube[4210]: E0204 01:24:34.621261 4210 kuberuntime_manager.go:803] init container start failed: CreateContainerError: container create failed: time="2020-02-04T01:24:34Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\"runtimecfg\\": executable file not found in $PATH\""
Feb 04 01:24:34 localhost hyperkube[4210]: container_linux.go:346: starting container process caused "exec: \"runtimecfg\": executable file not found in $PATH"
Feb 04 01:24:34 localhost hyperkube[4210]: E0204 01:24:34.621309 4210 pod_workers.go:191] Error syncing pod b9aca84bcb23f61afa3a448a5f4225f0 ("coredns-localhost_openshift-ovirt-infra(b9aca84bcb23f61afa3a448a5f4225f0)"), skipping: failed to "StartContainer" for "render-config" with CreateContainerError: "container create failed: time=\"2020-02-04T01:24:34Z\" level=error msg=\"container_linux.go:346: starting container process caused \\"exec: \\\\"runtimecfg\\\\": executable file not found in $PATH\\"\"\ncontainer_linux.go:346: starting container process caused \"exec: \\"runtimecfg\\": executable file not found in $PATH\"\n"
```
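For what it's worth, since the API isn't up yet at this stage, `must-gather` won't work; a hedged sketch of poking at the bootstrap node directly over SSH (the hostname is a placeholder, `core` is the standard FCOS user):

```shell
ssh core@bootstrap.example.com
# then, on the bootstrap node:
journalctl -b -u bootkube.service --no-pager | tail -n 50   # bootstrap progress
sudo crictl ps -a                                           # find the keepalived/coredns containers
sudo crictl logs <container-id>                             # dump a specific container's log
```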
Interesting, at some point we replaced baremetal-runtimecfg with a dummy image - but now it should be mirrored from OCP. Which OKD release is that? Could you give it a try on the latest 4.4 from https://origin-release.svc.ci.openshift.org/ ?
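If it helps, pulling the matching installer binary out of a CI release can be sketched like this (the pullspec is an assumption based on the release page above; substitute whatever release you pick):

```shell
RELEASE=registry.svc.ci.openshift.org/origin/release:4.4.0-0.okd-2020-02-05-224417
oc adm release extract --command=openshift-install --to=. "$RELEASE"
./openshift-install version
```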
This was using the alpha2 release on github. I'll try a CI build tonight.
Was able to get through the initial pivot on the bootstrap and the master nodes tonight but it's getting stuck on etcd. This appears to be the same as #52. I do see etcd-signer running on the bootstrap node but it's apparently not signing the certs.
```
2020-02-05T02:30:42.726944855+00:00 stderr F + kube-client-agent request --kubeconfig=/etc/kubernetes/kubeconfig --orgname=system:etcd-servers --assetsdir=/etc/ssl/etcd --dnsnames=localhost,etcd.kube-system.svc,etcd.kube-system.svc.cluster.local,etcd.openshift-etcd.svc,etcd.openshift-etcd.svc.cluster.local --commonname=system:etcd-server:10.10.10.151 --ipaddrs=10.10.10.151,127.0.0.1
2020-02-05T02:30:53.219874108+00:00 stderr F ERROR: logging before flag.Parse: E0205 02:30:53.219549 7 agent.go:145] unable to retrieve approved CSR: the server could not find the requested resource (get certificatesigningrequests.certificates.k8s.io system:etcd-server:10.10.10.151). Retrying.
2020-02-05T02:30:56.221787614+00:00 stderr F ERROR: logging before flag.Parse: E0205 02:30:56.221719 7 agent.go:145] unable to retrieve approved CSR: the server could not find the requested resource (get certificatesigningrequests.certificates.k8s.io system:etcd-server:10.10.10.151). Retrying.
2020-02-05T02:30:59.222030319+00:00 stderr F ERROR: logging before flag.Parse: E0205 02:30:59.221949 7 agent.go:145] unable to retrieve approved CSR: the server could not find the requested resource (get certificatesigningrequests.certificates.k8s.io system:etcd-server:10.10.10.151). Retrying.
2020-02-05T02:31:02.221230756+00:00 stderr F ERROR: logging before flag.Parse: E0205 02:31:02.221141 7 agent.go:145] unable to retrieve approved CSR: the server could not find the requested resource (get certificatesigningrequests.certificates.k8s.io system:etcd-server:10.10.10.151). Retrying.
2020-02-05T02:31:03.221419613+00:00 stderr F ERROR: logging before flag.Parse: E0205 02:31:03.221336 7 agent.go:145] unable to retrieve approved CSR: the server could not find the requested resource (get certificatesigningrequests.certificates.k8s.io system:etcd-server:10.10.10.151). Retrying.
2020-02-05T02:31:03.221419613+00:00 stderr F Error: error requesting certificate: error obtaining signed certificate from signer: timed out waiting for the condition
```
@bsperduto FWIW I had a successful oVirt IPI build from the CI build `4.4.0-0.okd-2020-02-05-224417` (https://origin-release.svc.ci.openshift.org/releasestream/4.4.0-0.okd/release/4.4.0-0.okd-2020-02-05-224417).
I had some issues with being able to set the disk/memory/CPU, and in the end I deployed from a modified template (from a previous failed install) that had more disk, CPU and RAM allocated by default; otherwise the masters ran out of disk space for the various components to run. It wasn't immediately obvious to me how to do the override, but I eventually found a note in the dev READMEs saying to do: `export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="ocp-rhcos"`
As far as DNS goes, I only configured 3 records, as DNS is handled magically inside the cluster:
Network wise, remember to turn the filter off on the ovirt network otherwise the VIP spoofing from keepalived won't work.
I also was finding that, because I have a fairly slow internet connection, my installer was timing out, but I was able to just let things tick along and run `openshift-install wait-for install-complete`, and that did the trick. If I had a bit more patience I would have mirrored the repos instead.
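The re-attach after a timeout can be sketched as follows (`install-dir` is an assumed assets directory; the installer just resumes watching the cluster, it doesn't redo anything):

```shell
./openshift-install wait-for bootstrap-complete --dir=install-dir --log-level=debug
./openshift-install wait-for install-complete --dir=install-dir
```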
The install-config.yaml I used was:

```yaml
apiVersion: v1
baseDomain: example.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 2
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: test1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  ovirt:
    api_vip: 10.50.106.20
    dns_vip: 10.50.106.21
    ingress_vip: 10.50.106.22
    ovirt_cluster_id: 33fe2fa8-eb60-11e9-bb60-00163e186f47
    ovirt_network_name: lab.example.com
    ovirt_storage_domain_id: e4323f39-f50b-4462-8df1-fff0dd587a9f
publish: External
pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: ssh-rsa blahblahblahsomefakekeyhere my@machine
```
I also ended up with an ~/.ovirt/ovirt-config.yaml containing:

```yaml
---
# The hostname or IP address of the oVirt engine
ovirt_url: https://ovirt.example.com/ovirt-engine/api
# The name of the user for accessing the oVirt engine
ovirt_username: admin@internal
# The password associated with the user
ovirt_password: some-awesome-password
ovirt_insecure: true
```
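A quick way to sanity-check those values before running the installer (placeholders from the config above; a working URL/credential pair should return an `<api>` XML document from the REST endpoint):

```shell
curl -k -u 'admin@internal:some-awesome-password' \
  'https://ovirt.example.com/ovirt-engine/api'
```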
Again, sharing because I found it hard to find any good examples for oVirt and had to piece it together from the source.
Hope this helps!
Edit: tidied up and added the oVirt network setting.
Since https://origin-release.svc.ci.openshift.org/releasestream/4.4.0-0.okd/release/4.4.0-0.okd-2020-03-13-191636 oVirt IPI should work (previously workers didn't join the cluster).
Does anyone have a cluster to verify that?
Ah, awesome. I've been running builds over the last few days with whatever the current build was at the time and seen mixed results. I'll kick a build off in a few mins and let you know.
Just an observation (not sure if this is intentional for oVirt): the installer is still attempting to pull an older Fedora CoreOS build, though it looks like the installer expects Fedora CoreOS 31.20200310.20:

```
INFO Obtaining RHCOS image file from 'https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/31.20200113.3.1/x86_64/fedora-coreos-31.20200113.3.1-openstack.x86_64.qcow2.xz?sha256=8fd0f6da46285427565749754e74e4648c8516090e03185f526464ca07bf7f63'
```
Is there currently a way to specify CPU, RAM and disk for masters/compute with the ovirt provider? It seemed that the settings I thought were there weren't being passed through. Currently I am manually changing the RAM available to avoid an OOM with the default memory setting of 8GB on the masters, and have had to forge a template from the one that is created in order to have additional disk space (8GB default) on each machine.
> it looks like the installer expects Fedora CoreOS 31.20200310.20

The installer starts with a specific stable FCOS build (31.20200113.3.1) and then updates all machines to the latest ostree commit in the `machine-os-content` of the release. This is why the release page shows Fedora CoreOS 31.20200310.20, but the installer starts from an older version.
> Is there currently a way to specify CPU, Ram and disk for masters/compute with the ovirt provider?

@rgolangh is this already supported?
Ah thanks okay that makes sense.
So the install hasn't reached 'install finished' yet; it looks like there is an issue with spinning up the worker nodes. They are defined as Machines from the MachineSet, but they aren't being spawned on the compute side (oVirt 4.3.8).
The install itself looks to be 'ok'; the issue is that the oVirt machine credentials are being created in the kube-system namespace, not in the openshift-machine-api namespace:
```
I0314 00:06:18.953364 1 actuator.go:333] failed getting credentials for namespace openshift-machine-api, error getting credentials secret "ovirt-credentials" in namespace "openshift-machine-api": Secret "ovirt-credentials" not found
E0314 00:06:18.953508 1 controller.go:279] Failed to check if machine "uk1-96v4d-worker-0-46rvd" exists: failed to create connection to oVirt API
{"level":"error","ts":1584144378.9535766,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/uk1-96v4d-worker-0-46rvd","error":"failed to create connection to oVirt API","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
```
```
../oc --kubeconfig auth/kubeconfig get secret --all-namespaces | grep ovirt
kube-system             ovirt-credentials          Opaque                                5   172m
openshift-ovirt-infra   builder-dockercfg-p6zwm    kubernetes.io/dockercfg               1   120m
openshift-ovirt-infra   builder-token-dhclm        kubernetes.io/service-account-token   4   121m
openshift-ovirt-infra   builder-token-ljtxs        kubernetes.io/service-account-token   4   120m
openshift-ovirt-infra   default-dockercfg-mpbzp    kubernetes.io/dockercfg               1   120m
openshift-ovirt-infra   default-token-2xvp4        kubernetes.io/service-account-token   4   121m
openshift-ovirt-infra   default-token-2zqq9        kubernetes.io/service-account-token   4   171m
openshift-ovirt-infra   deployer-dockercfg-4mjg7   kubernetes.io/dockercfg               1   120m
openshift-ovirt-infra   deployer-token-qxjwf       kubernetes.io/service-account-token   4   121m
openshift-ovirt-infra   deployer-token-zx9jg       kubernetes.io/service-account-token   4   120m
```
As soon as I recreated the secret (in the correct namespace - openshift-machine-api - this time), the cluster started progressing again. Going to leave this running overnight and see where it gets to, but I expect it'll be successful.
Install version:

```
../openshift-install 4.4.0-0.okd-2020-03-13-191636
built from commit f0d3afed3c4655a6514fdfc54bc40348f0aac80b
release image registry.svc.ci.openshift.org/origin/release@sha256:8d33b9e48493042f6867bde243b9f0475ff7ba7ca14aca70670df14d62c13819
```
The config file I am using is essentially the one I posted earlier but with real hostnames/ips.
Hope this helps
> The install itself looks to be 'ok' the issue is that the ovirt machine credentials are being created in the kube-system namespace, not in the openshift-machine-api namespace

Oh, interesting. We're using machine-config-operator and installer forks for OKD, so perhaps these are out of sync now
Just to confirm: after updating the namespace the secret existed in, the cluster reached install-complete.
I had a bit of time today to mess around and these are the only things I found to be of interest:
- the cloud-credential-operator appears to be unable to access the oVirt credential. My tactical fix probably worked around it by putting the credential in the one place it needs to be for machinesets to grow/shrink, but the proper way seems to be to provide the credential into the namespace for that operator to then dish out into the right places.
- There is no native storage for oVirt, which is quite painful for me. EmberCSI is available from the Operator Hub, which gives some nice options. I run FreeNAS as my backend, and the chap who wrote the older flexvolume implementation for it has also written one for CSI: https://github.com/democratic-csi. This is amazing: deployed it earlier and it just works.
- cluster-samples still only seems to have the enterprise samples available, but I am sure I saw a ticket to fix that already.
- It's a bit annoying that we can't yet configure CPU/memory/disk for machinesets within the IPI. I note that I had to bump the resources available to the masters from the default to 8 cores/16GB RAM/100GB disk, and built all machines from a template with the same resources.
- I had an issue with my oVirt hosted-engine (its logs device filled up) after a new build; everything was good up until just before I scaled up the worker pool. After I restored the hosted-engine to life (cleared the log dir and restarted) I had some workers in a state where the CSRs were not being approved (I expected that they would just get approved and everything would be fine), so I ended up running `../oc --kubeconfig auth/kubeconfig get csr -o name | xargs ../oc --kubeconfig auth/kubeconfig adm certificate approve` to solve that, but otherwise scaling has been fine, too.

Overall this is seeming to be a very workable build; the credential issue is really the only blocker for a 'good' install, the other items are things that you can work around if needed.
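A slightly safer variant of that bulk-approve, per the OpenShift docs, only touches CSRs still in Pending state (i.e. with no `.status` yet) rather than re-approving everything; add `--kubeconfig auth/kubeconfig` as in the command above if needed:

```shell
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs oc adm certificate approve
```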
At what stage of the install did you copy the secret over? I attempted it earlier today but was unsuccessful, and may need to do it earlier in the process.
I was probably 1.5 hours into the install at this point (my connection is slow); it was after the point that the cluster created worker machines but didn't spawn them. If you take a look at the logs for the openshift-machine-api operator, it'll give you a clue as to whether it is locating the credentials or not (https://github.com/openshift/okd/issues/61#issuecomment-598980286).
Just make sure your secret is named ovirt-credentials. I literally just ran:

```shell
./oc --kubeconfig auth/kubeconfig get secret ovirt-credentials -n kube-system -o yaml > ovirt-secret.yaml
```

then modified the yaml to update the namespace, and:

```shell
./oc --kubeconfig auth/kubeconfig create -f ovirt-secret.yaml
```
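The manual edit step can be sketched as a `sed` one-liner; the heredoc below is a stand-in for the real `oc get secret ... -o yaml` output, and the expression assumes the manifest has a single `namespace:` line:

```shell
# Write a sample secret manifest (in reality: oc get secret ... -o yaml > ovirt-secret.yaml)
cat > ovirt-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: ovirt-credentials
  namespace: kube-system
type: Opaque
EOF
# Rewrite the namespace field in place, then inspect the result
sed -i 's/namespace: kube-system/namespace: openshift-machine-api/' ovirt-secret.yaml
grep 'namespace:' ovirt-secret.yaml
```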
If you do an `oc get pods --all-namespaces`, the only pods that should be pending are the ones trying to start on worker nodes, such as the router pods.
> I had a bit of time today to mess around and these are the only things I found to be of interest:
>
> * the cloud-credential-operator appears to be unable to access the ovirt credential, my tactical fix probably worked around it by putting the credential in the 1 place it needs to be for machinesets to grow/shrink, but the proper way seems to be to provide the credential into the namespace for that operator to then dish out into the right places
cloud-credentials-operator works with CredentialsRequest objects. The oVirt machine controller creates one, and the credentials controller creates the secret under its namespace. If you have logs from both components, that would be good.
> * There is no native storage for ovirt. This is quite painful for me. EmberCSI is available from the operator hub, which gives some nice options. I run FreeNAS as my backend and the chap who wrote the older flexvolume implementation for it has also written it for CSI.. https://github.com/democratic-csi this is amazing, deployed it earlier and it just works.
A CSI driver, with an operator to deploy it, is almost done: https://github.com/openshift/ovirt-csi-driver
You can pick up the CSI driver and deploy it manually - look under the deploy folder. I haven't had the chance to straighten out the READMEs yet.
> * cluster-samples still only seems to have the enterprise samples available, but I am sure I saw a ticket to fix that already
> * Its a bit annoying that we can't yet configure cpu/memory/disk for machinesets within the IPI. I note that I had to bump the resources available to the masters from the default to 8core/16GB ram/100GB disk and built all machines from a template with the same resources.
> * I had an issue with my ovirt hosted-engine (logs device filled up) after a new build and everything was good, but just before I scaled up the worker pool. After I restored the hosted-engine to life (cleared the log dir and restarted) I had some workers in a state where the CSR's were not being approved (I expected that they would just get approved and everything would be fine), so I ended up running `../oc --kubeconfig auth/kubeconfig get csr -o name | xargs ../oc --kubeconfig auth/kubeconfig adm certificate approve` to solve that, but otherwise scaling has been fine, too.
Probably a matter of timing. I think there is a window of 10 minutes to approve new hosts. Worst case, delete the machine and it will be recreated. (next time :))
> Overall this is seeming to be a very workable build, the credential issue is really the only blocker for a 'good' install, the other items are things that you can work around if needed.
Happy this is working. An issue worth noting for hosted engine installs (and perhaps you have something to add) - https://bugzilla.redhat.com/show_bug.cgi?id=1813725
I think it's worth filing separate issues for each encountered problem - especially the ovirt-credentials one.
> I was probably 1.5 hours in to the install at this point (my connection is slow)
At this point we don't push images to Quay, but this can be worked around:

* Mirror the images to a local registry - https://docs.okd.io/latest/installing/install_config/installing-restricted-networks-preparations.html (that's about 4GB now)
* Block registry.svc.openshift.org in the DNS so that the mirror would be used first
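The mirroring step can be sketched roughly like this (registry hostname and tag are placeholders; the restricted-network doc linked above has the full procedure, including the pull-secret handling):

```shell
LOCAL_REGISTRY=mirror.example.com:5000/okd     # hypothetical local registry
RELEASE_TAG=4.4.0-0.okd-2020-03-13-191636
oc adm release mirror \
  --from="registry.svc.ci.openshift.org/origin/release:${RELEASE_TAG}" \
  --to="${LOCAL_REGISTRY}" \
  --to-release-image="${LOCAL_REGISTRY}:${RELEASE_TAG}"
```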
> cluster-samples still only seems to have the enterprise samples available, but I am sure I saw a ticket to fix that already

https://github.com/openshift/okd/issues/34
> Its a bit annoying that we can't yet configure cpu/memory/disk for machinesets within the IPI

Make a new customized template and set the `OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE` env var to use it.
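In practice that looks something like this (`fcos-big` is a hypothetical template you've already cloned and resized in the oVirt UI with more CPU/RAM/disk):

```shell
export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="fcos-big"
./openshift-install create cluster --dir=install-dir
```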
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1813741 to fix the machine configuration
> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1813741 to fix the machine configuration
Is it worth also including network configuration here too? I know from experience I had customers who wanted to place, say, router nodes in the DMZ zone and nodes for the actual workload inside their app network, while the control plane lived in another zone.
Okay, so I've had a bit of time to progress:

1) I haven't been able to work out why the cloud credentials operator isn't dishing out credentials. I'm not too sure why; I can't see any reason why it wouldn't be able to work, but it was only a short look. Perhaps if the credential was initially created in the kube-system namespace there is a permission missing that would allow the controller to access the openshift-machine-api namespace where it expects that secret? I don't know, just guessing.
```yaml
---
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  creationTimestamp: "2020-03-13T21:18:56Z"
  finalizers:
  - cloudcredential.openshift.io/deprovision
  generation: 1
  labels:
    controller-tools.k8s.io: "1.0"
  name: openshift-machine-api-ovirt
  namespace: openshift-cloud-credential-operator
  resourceVersion: "1927"
  selfLink: /apis/cloudcredential.openshift.io/v1/namespaces/openshift-cloud-credential-operator/credentialsrequests/openshift-machine-api-ovirt
  uid: 6c3fe339-d6d0-49fe-bfc6-d883712a7476
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: OvirtProviderSpec
  secretRef:
    name: ovirt-credentials
    namespace: openshift-machine-api
status:
  conditions:
  - lastProbeTime: "2020-03-13T21:20:42Z"
    lastTransitionTime: "2020-03-13T21:20:42Z"
    message: cloud creds are insufficient to satisfy CredentialsRequest
    reason: CloudCredsInsufficient
    status: "True"
    type: InsufficientCloudCreds
  lastSyncGeneration: 0
  provisioned: false
```
2) I overrode the image stream locations for PHP + MySQL based on the https://github.com/openshift/library/blob/master/community/mysql/imagestreams/mysql-centos7.json image streams.

3) I also deployed the ovirt-csi-driver (this is an amazing discovery, thank you!). I had an issue with the ovirt-credentials secret not being provisioned by the credentials operator (the request was correct), but once I put the secret in place it worked a treat.

4) Once storage was in place I was able to deploy the registry. Worth noting here that because there is no storage out of the box, you have to update the registry operator management state from removed to unmanaged. I followed the steps here: https://docs.openshift.com/container-platform/4.3/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html

5) Once the registry was deployed I could finally run a successful build, complete with persistent storage, and have my test workload successfully running.
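For reference, once a storage class exists, the registry-storage doc linked above boils down to roughly this patch; treat it as a sketch and check the doc for your version (an empty claim name lets the operator create the PVC itself):

```shell
oc patch configs.imageregistry.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'
```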
@rgolangh wrt nodes coming up and missing the CSR signing, I agree it is probably just a timing issue. The issue certainly hasn't appeared since, and I have done a couple of grow/shrinks of the worker nodes.
@vrutkovs Do you want me to raise the credentials one somewhere? TBH I was planning on taking the hit with the image sizes for a management cluster, and then running a registry (like Harbor, since Quay core doesn't seem to be available).
Totally unrelated to OKD and off to the side - I only mention it because I said it worked fine earlier: with the FreeNAS CSI driver, everything worked up until I tried to consume the PVC with a pod. At that point the event stream was complaining the csidriver wasn't registered (it was, but the socket didn't exist in the right place).
Cheers all
> @vrutkovs Do you want me to raise the credentials one somewhere?

Let's file a new OKD issue on that. It seems to be the only blocking bug, but I'd like to get another confirmation (could it be an oVirt misconfiguration?).
It's possible that it could be, but I am just using the default internal admin account which has no restrictions. I'm happy to jump on a bluejeans or something if you want to validate.
So I just ran through a new install against 4.4.0-0.okd-2020-03-16-194045. I deployed the CSI driver with:

```shell
oc --kubeconfig auth/kubeconfig create -f ovirt-csi-driver/deploy/csi-driver/
```

I did also need to put the ovirt-credentials secret in this namespace while the cloud credential operator is being naughty. kube-apiserver-operator reports:

```
No recognized cloud provider platform found in infrastructures.config.openshift.io/cluster.status.platform
```
I also built with a proxy - this is also working :tada:
I am catching some traffic heading off to registry.redhat.io, presumably this is the controller trying to pull imagestreams
Yes, this is the (misconfigured) samples operator - see #34
Worth noting traffic to api.openshift.com (presumably telemetry and insights)
This can be disabled by setting a "fake" pull secret. Filed https://github.com/openshift/okd/issues/107 to have this mentioned in the official docs.
40 mins, down from 2 hours.
A bit odd; maybe @rgolangh has thoughts on why post-bootstrap barely makes it in 40 mins? In any case, let's file a new bug to track this.
> kube-apiserver-operator reports that ovirt is an unrecognised platform

That's pretty much expected, since kubelet doesn't have an oVirt cloud controller. These are legacy anyway.
> It would be a nicer behavior if the registry started with emptyDir: {} storage

We can't make that decision for you, as emptyDir registry storage cannot be migrated to a different storage provider afterwards.
It's odd that ovirt-csi-driver could not create a PVC for the registry - I guess it doesn't support RWX volumes?
So I'm going to close this, as oVirt (more or less) works, the blocker bug has a workaround, and it's worth having separate tickets for each problem.
Thanks for testing this!
Just a quick update: I did attempt to test the oVirt IPI of OKD using the directions at https://github.com/openshift/installer/blob/master/docs/user/ovirt/install_ipi.md. Unfortunately the OKD installer does not currently appear to have oVirt as an available platform. Up to that point, those directions do work in my testing. I have not attempted a UPI-type install yet.