oVirt / ovirt-openshift-extensions

Implementation of flexvolume driver and provisioner for oVirt
Apache License 2.0

problems on openshift 4.1 #143

Closed. alclonky closed this issue 2 years ago.

alclonky commented 5 years ago

Description: disks aren't attached to the nodes and volumes are not mounted.

Versions:

Logs:

Creating PVC:

I0802 09:39:56.725081       1 leaderelection.go:156] attempting to acquire leader lease...
I0802 09:39:56.740754       1 leaderelection.go:178] successfully acquired lease to provision for pvc ovirt-driver/4g-test
I0802 09:39:56.744607       1 provision.go:75] About to provision a disk name: pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4 domain: FABAVMHOST_LUN_220 size: 4294967296 thin provisioned: true fi
I0802 09:39:56.745136       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"ovirt-driver", Name:"4g-test", UID:"7ced7784-b509-11e9-b9c9-001a4a746fe4", APIV
I0802 09:40:26.950312       1 leaderelection.go:204] stopped trying to renew lease to provision for pvc ovirt-driver/4g-test, timeout reached
I0802 09:40:26.968859       1 leaderelection.go:156] attempting to acquire leader lease...
I0802 09:40:26.997763       1 leaderelection.go:178] successfully acquired lease to provision for pvc ovirt-driver/4g-test
I0802 09:40:27.001175       1 provision.go:75] About to provision a disk name: pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4 domain: FABAVMHOST_LUN_220 size: 4294967296 thin provisioned: true fi
I0802 09:40:27.001398       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"ovirt-driver", Name:"4g-test", UID:"7ced7784-b509-11e9-b9c9-001a4a746fe4", APIV
I0802 09:40:30.470714       1 controller.go:1079] volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test" created
I0802 09:40:30.480379       1 controller.go:1096] volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test" saved
I0802 09:40:30.480403       1 controller.go:1132] volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" provisioned for claim "ovirt-driver/4g-test"
I0802 09:40:30.480480       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"ovirt-driver", Name:"4g-test", UID:"7ced7784-b509-11e9-b9c9-001a4a746fe4", APIV
I0802 09:40:31.047772       1 leaderelection.go:198] stopped trying to renew lease to provision for pvc ovirt-driver/4g-test, task succeeded
I0802 09:41:00.880668       1 controller.go:1079] volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test" created
I0802 09:41:00.886627       1 controller.go:1100] failed to save volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test": persistentvolumes "pvc-7ced7784-b509-11e
I0802 09:41:10.897568       1 controller.go:1100] failed to save volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test": persistentvolumes "pvc-7ced7784-b509-11e
I0802 09:41:20.904321       1 controller.go:1100] failed to save volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test": persistentvolumes "pvc-7ced7784-b509-11e
I0802 09:41:30.911010       1 controller.go:1100] failed to save volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test": persistentvolumes "pvc-7ced7784-b509-11e
I0802 09:41:40.921865       1 controller.go:1100] failed to save volume "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" for claim "ovirt-driver/4g-test": persistentvolumes "pvc-7ced7784-b509-11e
E0802 09:41:50.922150       1 controller.go:1110] Error creating provisioned PV object for claim ovirt-driver/4g-test: persistentvolumes "pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4" already e
I0802 09:41:50.922208       1 provision.go:148] About to delete disk pvc-7ced7784-b509-11e9-b9c9-001a4a746fe4 id e071dfe8-9496-4851-bf5d-938fc40ce94c
I0802 09:41:50.922677       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"ovirt-driver", Name:"4g-test", UID:"7ced7784-b509-11e9-b9c9-001a4a746fe4", APIV-7ced7784-b509-11e9-b9c9-001a4a746fe4" already exists. Deleting the volume.

Creating Pod:

Unable to mount volumes for pod "testpodwithflex_ovirt-driver(0031cbd8-b50a-11e9-b9c9-001a4a746fe4)": timeout expired waiting for volumes to attach or mount for pod "ovirt-driver"/"testpodwithflex". list of unmounted volumes=[pv0002]. list of unattached volumes=[pv0002 default-token-hjf2n]

ovirt: disk is created but not attached to node

rgolangh commented 5 years ago

Can you share the kube-controller-manager log and ovirt-engine.log from that time?
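On OpenShift 4.x the controller-manager runs as a static pod in an operator-managed namespace; the logs can be pulled with something like this (pod and container names vary per cluster):

```
# Find the kube-controller-manager pod for each master, then dump its log.
oc -n openshift-kube-controller-manager get pods
oc -n openshift-kube-controller-manager logs <pod-name> -c kube-controller-manager
```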

cobexer commented 5 years ago

These are newer logs than the ones above. We don't have direct access to the ovirt-engine.log; I requested it but haven't received it yet.

kube-controller-manager:

I0805 07:54:05.159389       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195744", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:07.169787       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195758", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:09.178868       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195766", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:10.163252       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195766", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:11.188016       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195782", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:13.201101       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195793", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:15.205532       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195805", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:17.215139       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195816", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:19.225384       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195825", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:21.234273       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195841", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:23.164388       1 resource_quota_controller.go:440] syncing resource quota controller with updated resources from discovery: added: [/v1, Resource=configmaps /v1, Resource=endpoints /v1, Resource=events /v1, Resource=limitranges /v1, Resource=persistentvolumeclaims /v1, Resource=pods /v1, Resource=podtemplates /v1, Resource=replicationcontrollers /v1, Resource=resourcequotas /v1, Resource=secrets /v1, Resource=serviceaccounts /v1, Resource=services apps.openshift.io/v1, Resource=deploymentconfigs apps/v1, Resource=controllerrevisions apps/v1, Resource=daemonsets apps/v1, Resource=deployments apps/v1, Resource=replicasets apps/v1, Resource=statefulsets authorization.openshift.io/v1, Resource=rolebindingrestrictions autoscaling.openshift.io/v1beta1, Resource=machineautoscalers autoscaling/v1, Resource=horizontalpodautoscalers batch/v1, Resource=jobs batch/v1beta1, Resource=cronjobs build.openshift.io/v1, Resource=buildconfigs build.openshift.io/v1, Resource=builds cloudcredential.openshift.io/v1, Resource=credentialsrequests coordination.k8s.io/v1beta1, Resource=leases events.k8s.io/v1beta1, Resource=events extensions/v1beta1, Resource=daemonsets extensions/v1beta1, Resource=deployments extensions/v1beta1, Resource=ingresses extensions/v1beta1, Resource=networkpolicies extensions/v1beta1, Resource=replicasets healthchecking.openshift.io/v1alpha1, Resource=machinehealthchecks image.openshift.io/v1, Resource=imagestreams k8s.cni.cncf.io/v1, Resource=network-attachment-definitions machine.openshift.io/v1beta1, Resource=machines machine.openshift.io/v1beta1, Resource=machinesets machineconfiguration.openshift.io/v1, Resource=mcoconfigs monitoring.coreos.com/v1, Resource=alertmanagers monitoring.coreos.com/v1, Resource=prometheuses monitoring.coreos.com/v1, Resource=prometheusrules monitoring.coreos.com/v1, Resource=servicemonitors network.openshift.io/v1, Resource=egressnetworkpolicies networking.k8s.io/v1, Resource=networkpolicies operator.openshift.io/v1, Resource=ingresscontrollers operators.coreos.com/v1, Resource=catalogsourceconfigs operators.coreos.com/v1, Resource=operatorgroups operators.coreos.com/v1, Resource=operatorsources operators.coreos.com/v1alpha1, Resource=catalogsources operators.coreos.com/v1alpha1, Resource=clusterserviceversions operators.coreos.com/v1alpha1, Resource=installplans operators.coreos.com/v1alpha1, Resource=subscriptions policy/v1beta1, Resource=poddisruptionbudgets rbac.authorization.k8s.io/v1, Resource=rolebindings rbac.authorization.k8s.io/v1, Resource=roles route.openshift.io/v1, Resource=routes template.openshift.io/v1, Resource=templateinstances template.openshift.io/v1, Resource=templates tuned.openshift.io/v1, Resource=tuneds], removed: []
E0805 07:54:23.164829       1 resource_quota_controller.go:445] failed to sync resource monitors: [couldn't start monitor for resource "machine.openshift.io/v1beta1, Resource=machinesets": unable to monitor quota for resource "machine.openshift.io/v1beta1, Resource=machinesets", couldn't start monitor for resource "monitoring.coreos.com/v1, Resource=servicemonitors": unable to monitor quota for resource "monitoring.coreos.com/v1, Resource=servicemonitors", couldn't start monitor for resource "monitoring.coreos.com/v1, Resource=prometheusrules": unable to monitor quota for resource "monitoring.coreos.com/v1, Resource=prometheusrules", couldn't start monitor for resource "operators.coreos.com/v1alpha1, Resource=installplans": unable to monitor quota for resource "operators.coreos.com/v1alpha1, Resource=installplans", couldn't start monitor for resource "cloudcredential.openshift.io/v1, Resource=credentialsrequests": unable to monitor quota for resource "cloudcredential.openshift.io/v1, Resource=credentialsrequests", couldn't start monitor for resource "operators.coreos.com/v1alpha1, Resource=subscriptions": unable to monitor quota for resource "operators.coreos.com/v1alpha1, Resource=subscriptions", couldn't start monitor for resource "operators.coreos.com/v1alpha1, Resource=catalogsources": unable to monitor quota for resource "operators.coreos.com/v1alpha1, Resource=catalogsources", couldn't start monitor for resource "autoscaling.openshift.io/v1beta1, Resource=machineautoscalers": unable to monitor quota for resource "autoscaling.openshift.io/v1beta1, Resource=machineautoscalers", couldn't start monitor for resource "monitoring.coreos.com/v1, Resource=alertmanagers": unable to monitor quota for resource "monitoring.coreos.com/v1, Resource=alertmanagers", couldn't start monitor for resource "k8s.cni.cncf.io/v1, Resource=network-attachment-definitions": unable to monitor quota for resource "k8s.cni.cncf.io/v1, Resource=network-attachment-definitions", couldn't start monitor for resource "tuned.openshift.io/v1, Resource=tuneds": unable to monitor quota for resource "tuned.openshift.io/v1, Resource=tuneds", couldn't start monitor for resource "monitoring.coreos.com/v1, Resource=prometheuses": unable to monitor quota for resource "monitoring.coreos.com/v1, Resource=prometheuses", couldn't start monitor for resource "operator.openshift.io/v1, Resource=ingresscontrollers": unable to monitor quota for resource "operator.openshift.io/v1, Resource=ingresscontrollers", couldn't start monitor for resource "operators.coreos.com/v1, Resource=catalogsourceconfigs": unable to monitor quota for resource "operators.coreos.com/v1, Resource=catalogsourceconfigs", couldn't start monitor for resource "healthchecking.openshift.io/v1alpha1, Resource=machinehealthchecks": unable to monitor quota for resource "healthchecking.openshift.io/v1alpha1, Resource=machinehealthchecks", couldn't start monitor for resource "operators.coreos.com/v1, Resource=operatorsources": unable to monitor quota for resource "operators.coreos.com/v1, Resource=operatorsources", couldn't start monitor for resource "machine.openshift.io/v1beta1, Resource=machines": unable to monitor quota for resource "machine.openshift.io/v1beta1, Resource=machines", couldn't start monitor for resource "machineconfiguration.openshift.io/v1, Resource=mcoconfigs": unable to monitor quota for resource "machineconfiguration.openshift.io/v1, Resource=mcoconfigs", couldn't start monitor for resource "operators.coreos.com/v1, Resource=operatorgroups": unable to monitor quota for resource "operators.coreos.com/v1, Resource=operatorgroups", couldn't start monitor for resource "operators.coreos.com/v1alpha1, Resource=clusterserviceversions": unable to monitor quota for resource "operators.coreos.com/v1alpha1, Resource=clusterserviceversions"]
I0805 07:54:23.244227       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195851", FieldPath:""}): type: 'Normal' reason: 'ExternalProvisioning' waiting for a volume to be created, either by external provisioner "ovirt-volume-provisioner" or manually created by system administrator
I0805 07:54:23.860896       1 pv_controller.go:824] volume "pvc-32584bd8-b756-11e9-834b-001a4a746fe4" entered phase "Bound"
I0805 07:54:23.860931       1 pv_controller.go:963] volume "pvc-32584bd8-b756-11e9-834b-001a4a746fe4" bound to claim "default/1g-ovirt-cow-disk"
I0805 07:54:23.871954       1 pv_controller.go:768] claim "default/1g-ovirt-cow-disk" entered phase "Bound"
W0805 07:55:52.826786       1 plugins.go:842] FindExpandablePluginBySpec(pvc-32584bd8-b756-11e9-834b-001a4a746fe4) -> returning noopExpandableVolumePluginInstance

ovirt-volume-provisioner:

I0805 07:54:05.160567       1 leaderelection.go:178] successfully acquired lease to provision for pvc default/1g-ovirt-cow-disk
I0805 07:54:05.163601       1 provision.go:75] About to provision a disk name: pvc-32584bd8-b756-11e9-834b-001a4a746fe4 domain: LUN_220 size: 1073741824 thin provisioned: true file system: ext4
I0805 07:54:05.163973       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195739", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/1g-ovirt-cow-disk"
I0805 07:54:23.846197       1 controller.go:1079] volume "pvc-32584bd8-b756-11e9-834b-001a4a746fe4" for claim "default/1g-ovirt-cow-disk" created
I0805 07:54:23.856760       1 controller.go:1096] volume "pvc-32584bd8-b756-11e9-834b-001a4a746fe4" for claim "default/1g-ovirt-cow-disk" saved
I0805 07:54:23.856782       1 controller.go:1132] volume "pvc-32584bd8-b756-11e9-834b-001a4a746fe4" provisioned for claim "default/1g-ovirt-cow-disk"
I0805 07:54:23.856980       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"1g-ovirt-cow-disk", UID:"32584bd8-b756-11e9-834b-001a4a746fe4", APIVersion:"v1", ResourceVersion:"2195739", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-32584bd8-b756-11e9-834b-001a4a746fe4
I0805 07:54:25.254364       1 leaderelection.go:198] stopped trying to renew lease to provision for pvc default/1g-ovirt-cow-disk, task succeeded

I will attach the ovirt-engine.log as soon as I get my hands on it of course =)

cobexer commented 5 years ago

Here is the ovirt-engine.log from the time of the disk creation: engine-filtered.log

alclonky commented 5 years ago

For us it looks like the disk mount is never called; we cannot see anything related in the oVirt logs.
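One way to confirm whether the kubelet ever invoked the flexvolume driver at all (a sketch; assumes SSH access to the node the pod was scheduled on):

```
# On the node running the pod: search the kubelet journal for flexvolume
# activity around the time of the mount timeout.
journalctl -u kubelet --since "2019-08-02 09:40:00" | grep -iE 'flexvolume|ovirt'
```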

cobexer commented 5 years ago

We tried to apply the changes suggested in https://github.com/oVirt/ovirt-openshift-extensions/issues/127, but could not yet successfully modify the kube-controller-manager pod: OpenShift 4.1 rolls the changes back. And since the cluster has no persistent storage, we don't have the logging stack deployed yet.
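Since the cluster operator reconciles the static pod manifests, direct edits are reverted; changes would have to go through the operator resource instead. For example, raising the log verbosity might look like this (a sketch; it assumes the 4.1 operator already honors spec.logLevel, and whether the #127 changes can be expressed this way is an open question):

```
# Bump kube-controller-manager verbosity via the operator CR instead of
# editing the static pod, which the operator reverts on its next sync.
oc patch kubecontrollermanager cluster --type=merge -p '{"spec":{"logLevel":"Debug"}}'
```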

I also found this during startup in the kube-controller-manager pod (which, it seems, I can't modify):

W0808 07:17:24.293406       1 probe.go:271] Flexvolume plugin directory at /etc/kubernetes/kubelet-plugins/volume/exec does not exist. Recreating.
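For context, the kubelet expects a flexvolume driver binary at <plugin-dir>/<vendor>~<driver>/<driver>. A quick sanity check on the node might look like this (a sketch; the ovirt~ovirt-flexvolume-driver path is assumed from this project's driver naming):

```
# Check that the driver binary is present where the probe above is looking,
# and that it answers the mandatory flexvolume "init" call.
PLUGIN_DIR=/etc/kubernetes/kubelet-plugins/volume/exec
DRIVER="$PLUGIN_DIR/ovirt~ovirt-flexvolume-driver/ovirt-flexvolume-driver"   # assumed vendor~driver layout
ls -l "$DRIVER"
"$DRIVER" init   # a working driver replies with {"status":"Success", ...}
```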
sandrobonazzola commented 2 years ago

This project is no longer maintained, closing.