openshift / cluster-operator

Can't create cluster with AMI and installer for 3.0.34 #336

Open jhernand opened 6 years ago

jhernand commented 6 years ago

We are trying to deploy a cluster using version 3.10.34 of openshift-ansible and a corresponding AMI, but the AWS machine controller reports the following error:

ERROR: logging before flag.Parse: W0910 13:14:44.469526       1 controller.go:136] Unable to create machine jhernand11-5fjvv-master-l69wb: cannot create EC2 instance: InvalidBlockDeviceMapping: Volume of size 100GB is smaller than  snapshot 'snap-06909941fa52515d0', expect size >= 200GB
    status code: 400, request id: 9090a620-f5d8-4284-8e58-cf9af963df97
time="2018-09-10T13:14:44Z" level=error msg="error creating machine: cannot create EC2 instance: InvalidBlockDeviceMapping: Volume of size 100GB is smaller than  snapshot 'snap-06909941fa52515d0', expect size >= 200GB\n\tstatus code: 400, request id: 9090a620-f5d8-4284-8e58-cf9af963df97" controller=awsMachine machine=unified-hybrid-cloud/jhernand11-5fjvv-master-l69wb

Note that, since there are no stable releases yet, we are using our own tagged version of the project, commit a456201ca02808627f74c03b8a2d15c47f1a80a4. On top of that we have added a patch to use version 3.10.34 of the installer:

diff --git a/Makefile b/Makefile
index 3a9879c5..c461c722 100644
--- a/Makefile
+++ b/Makefile
@@ -378,7 +378,7 @@ cluster-operator-ansible-images: build/cluster-operator-ansible/Dockerfile build
        $(call build-cluster-operator-ansible-image,$(OA_ANSIBLE_URL),"release-3.9",$(CLUSTER_OPERATOR_ANSIBLE_IMAGE_NAME),"v3.9",$(CLUSTER_API_DEPLOYMENT_PLAYBOOK))

        # build v3.10 on openshift-ansible:master
-       $(call build-cluster-operator-ansible-image,$(OA_ANSIBLE_URL),"openshift-ansible-3.10.0-0.32.0",$(CLUSTER_OPERATOR_ANSIBLE_IMAGE_NAME),"v3.10",$(CLUSTER_API_DEPLOYMENT_PLAYBOOK))
+       $(call build-cluster-operator-ansible-image,$(OA_ANSIBLE_URL),"openshift-ansible-3.10.34-1",$(CLUSTER_OPERATOR_ANSIBLE_IMAGE_NAME),"v3.10",$(CLUSTER_API_DEPLOYMENT_PLAYBOOK))

        # build master/canary
        $(call build-cluster-operator-ansible-image,$(OA_ANSIBLE_URL),$(OA_ANSIBLE_BRANCH),$(CLUSTER_OPERATOR_ANSIBLE_IMAGE_NAME),$(VERSION),$(CLUSTER_API_DEPLOYMENT_PLAYBOOK))

Is this a known issue? Any suggestion on how to address it?

jhernand commented 6 years ago

CC: @kbsingh @zgalor @nimrodshn

jhernand commented 6 years ago

Apparently the issue is that the cluster operator is assuming volumes of 100 GiB:

https://github.com/openshift/cluster-operator/blob/02e99800deaf4b9c13a79c3f4ba949cca0fc0d66/pkg/clusterapi/aws/actuator.go#L286-L303

And the newer AMI uses volumes of 200 GiB.

Would it be possible for the cluster operator to take the volume size from the description of the AMI?
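
A rough sketch of what that could look like, not the actual actuator code. The `image` and `blockDevice` types are hypothetical stand-ins for the parts of the AMI description (as returned by an EC2 DescribeImages call) that matter here, and the function name, the `/dev/sda1` device name, and the 100 GiB fallback constant are assumptions for illustration. The idea is to request the larger of the current default and the size recorded on the AMI's root device mapping, so the volume is never smaller than the snapshot:

```go
package main

import "fmt"

// blockDevice is a hypothetical stand-in for one block device mapping
// in an AMI description: the device name and the EBS volume size.
type blockDevice struct {
	DeviceName    string
	VolumeSizeGiB int64 // 0 when the mapping reports no EBS size
}

// image is a hypothetical stand-in for the AMI description.
type image struct {
	RootDeviceName string
	BlockDevices   []blockDevice
}

// defaultRootVolumeGiB mirrors the 100 GiB size the operator currently assumes.
const defaultRootVolumeGiB int64 = 100

// rootVolumeSizeGiB returns the size to request for the root volume:
// the fallback, or the size on the AMI's root device mapping if that is larger.
func rootVolumeSizeGiB(img image, fallback int64) int64 {
	size := fallback
	for _, bd := range img.BlockDevices {
		if bd.DeviceName == img.RootDeviceName && bd.VolumeSizeGiB > size {
			size = bd.VolumeSizeGiB
		}
	}
	return size
}

func main() {
	// Assumed example: an AMI whose root snapshot needs 200 GiB.
	ami := image{
		RootDeviceName: "/dev/sda1",
		BlockDevices: []blockDevice{
			{DeviceName: "/dev/sda1", VolumeSizeGiB: 200},
		},
	}
	fmt.Println(rootVolumeSizeGiB(ami, defaultRootVolumeGiB))
}
```

Taking the maximum rather than the AMI value alone would keep the current behavior for older AMIs whose snapshots fit in 100 GiB.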