Open cben opened 6 years ago
Can you check out the logs for your aws-machine-controller pod in the openshift-cluster-operator namespace, on the "root" cluster-operator cluster? (This is where the masters are currently created.)
@dgoodwin @cben
nshneor@dhcp-2-169 ~/workspace/go/src/github.com/openshift/cluster-operator (master) $ oc project openshift-cluster-operator
Now using project "openshift-cluster-operator" on server "https://127.0.0.1:8443".
nshneor@dhcp-2-169 ~/workspace/go/src/github.com/openshift/cluster-operator (master) $ oc get pods
NAME READY STATUS RESTARTS AGE
aws-machine-controller-1-gjv4b 1/1 Running 0 26m
cluster-api-controller-manager-7dddc65c96-4z7px 1/1 Running 0 30m
cluster-operator-apiserver-1-6pcgz 2/2 Running 0 26m
cluster-operator-controller-manager-1-tz9cb 1/1 Running 0 26m
playbook-mock-6bf5c6f9d6-mnhms 1/1 Running 0 26m
nshneor@dhcp-2-169 ~/workspace/go/src/github.com/openshift/cluster-operator (master) $ oc log aws-machine-controller-1-gjv4b
W0711 13:48:13.786632 14837 cmd.go:358] log is DEPRECATED and will be removed in a future version. Use logs instead.
ERROR: logging before flag.Parse: W0711 10:21:31.525882 1 controller.go:64] environment variable NODE_NAME is not set, this controller will not protect against deleting its own machine
ERROR: logging before flag.Parse: E0711 10:21:31.830068 1 reflector.go:205] github.com/openshift/cluster-operator/vendor/sigs.k8s.io/cluster-api/pkg/controller/sharedinformers/zz_generated.api.register.go:57: Failed to list *v1alpha1.MachineSet: the server could not find the requested resource (get machinesets.cluster.k8s.io)
ERROR: logging before flag.Parse: E0711 10:21:31.830457 1 reflector.go:205] github.com/openshift/cluster-operator/vendor/sigs.k8s.io/cluster-api/pkg/controller/sharedinformers/zz_generated.api.register.go:56: Failed to list *v1alpha1.MachineDeployment: the server could not find the requested resource (get machinedeployments.cluster.k8s.io)
ERROR: logging before flag.Parse: E0711 10:21:31.831315 1 reflector.go:205] github.com/openshift/cluster-operator/vendor/sigs.k8s.io/cluster-api/pkg/controller/sharedinformers/zz_generated.api.register.go:55: Failed to list *v1alpha1.Machine: the server could not find the requested resource (get machines.cluster.k8s.io)
...
...
ERROR: logging before flag.Parse: I0711 10:39:20.655436 1 controller.go:91] Running reconcile Machine for nshneor-zxq8g-master-b8b5n
time="2018-07-11T10:39:20Z" level=debug msg="checking if machine exists" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
time="2018-07-11T10:39:27Z" level=debug msg="instance does not exist" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
ERROR: logging before flag.Parse: I0711 10:39:27.125829 1 controller.go:134] reconciling machine object nshneor-zxq8g-master-b8b5n triggers idempotent create.
time="2018-07-11T10:39:27Z" level=info msg="creating machine" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
time="2018-07-11T10:39:27Z" level=debug msg="Obtaining EC2 client for region \"us-east-1\"" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
time="2018-07-11T10:39:27Z" level=debug msg="Describing AMI ami-0dd8ad483cef75c18" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
time="2018-07-11T10:39:27Z" level=error msg="error creating machine: Unexpected number of images returned: 0" controller=awsMachine machine=myproject/nshneor-zxq8g-master-b8b5n
This indicates it cannot find the AMI configured in your cluster version. If you're just using our direct development playbooks, you'll see we loaded up a cluster version pointing to an AMI that is only available in our rh-dev account. I'm wondering if you guys are using a different AWS account?
If so, you need to create your own cluster version. See `oc get clusterversions -o yaml` for what they look like, or this link for the template we create them from: https://github.com/openshift/cluster-operator/blob/master/contrib/examples/cluster-versions-template.yaml
I don't think the AWS key is your issue at this point, pretty sure it's something else.
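To confirm that diagnosis from another account, one option is to ask EC2 for the AMI and see whether anything comes back. The sketch below is only an illustration: `ami_is_visible` is a hypothetical helper, not part of cluster-operator, and it inspects a DescribeImages-style response dict so it can be tried without AWS access. The commented-out boto3 call assumes boto3 is installed and credentials are configured; depending on the case, EC2 may raise an error instead of returning an empty list.

```python
def ami_is_visible(ami_id, response):
    """Return True if exactly one image with this ID is in a DescribeImages response.

    `response` is a dict shaped like the EC2 DescribeImages result,
    i.e. {"Images": [{"ImageId": "ami-..."}, ...]}.
    """
    images = [img for img in response.get("Images", [])
              if img.get("ImageId") == ami_id]
    return len(images) == 1


# Assumed usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   response = ec2.describe_images(ImageIds=["ami-0dd8ad483cef75c18"])
#   print(ami_is_visible("ami-0dd8ad483cef75c18", response))
```

An empty `Images` list corresponds to the "Unexpected number of images returned: 0" error in the log above.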
Yep, we're using a different AWS account. Thanks, will look into it!
You kind of need a "golden" image. We've been building our own (@abutcher has) for development, which is what you see in our clusterversion by default.
I just cc'd you on an email, trying to track down where or what you could use on another account.
I've been building our AMIs with https://github.com/openshift/openshift-ansible/blob/master/playbooks/aws/openshift-cluster/build_ami.yml.
@abutcher Thanks! The READMEs under https://github.com/openshift/openshift-ansible/tree/master/playbooks/aws are pretty great, except for one Catch 22:
"A base AMI is required for AMI building. Please ensure `openshift_aws_base_ami` is defined."
I can't find any explanation in the openshift-ansible repo of what is expected from a "base AMI" or where to find one :confused: (I'm really new to AWS and have zero idea what I'm doing :smile:)
Gonna try a CentOS image...
Tried a CentOS 7 AMI from the CentOS wiki as base_ami, got:
Instance creation failed => AuthFailure: Not authorized for images: [ami-4bf3d731]
My base AMI is `openshift_aws_base_ami: ami-b81dbfc5`, which is a CentOS 7 AMI on the marketplace.
Thanks!! I also had to click Subscribe and accept terms on Marketplace and then I was able to use it.
[x] contacting the CentOS list to suggest improvements to https://wiki.centos.org/Cloud/AWS
[ ] I'll send an openshift-ansible docs PR about "base image".
[ ] Setting `openshift_aws_build_ami_ssh_user: centos` led to permission errors ("Destination /etc/pki/rpm-gpg not writable"...). Worked with become, become_user, and become_method. Checking whether I need to modify the playbooks or just document existing options... (`g_sudo` looks promising?)
3.10 doesn't build yet with CentOS because of the missing `origin-docker-excluder-3.10` package: https://github.com/openshift/openshift-ansible/issues/7794. Checking workarounds suggested there...
[ ] Document in this repo that `deploy-devel-playbook.yml` uses `contrib/examples/cluster-versions-template.yaml`, defaulting to a non-public AMI. Link to how to build your own image. Document or add a param to choose your own AMI.
[STATUS: I'm taking a break from this, but I still intend to get the whole process tested & documented eventually]
Hi. We (cc @nimrodshn) are trying out cluster-operator according to README, in fake=false mode. We got `MachineSet` and `Machine` objects being created but we don't get any AWS instances. `Machine` status remains at:
Looking at pod logs it seems AWS credentials didn't make it into openshift-ansible:
The pod has AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY set:
The secrets do exist:
How can we troubleshoot it further?
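One low-tech way to narrow this down is to filter the controller pod's log for entries logged at `level=error`. The sketch below is illustrative and not part of cluster-operator: it assumes logrus-style `key=value` text lines like the ones the awsMachine controller emits, and the helper names `parse_fields` and `error_messages` are made up for this example. glog-style lines (the ones prefixed with "ERROR: logging before flag.Parse") carry no `level=` field and are skipped.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields of one logrus-style text line as a dict."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        fields[key] = raw
    return fields


def error_messages(lines):
    """Yield (machine, msg) for every line logged at level=error."""
    for line in lines:
        fields = parse_fields(line)
        if fields.get("level") == "error":
            yield fields.get("machine"), fields.get("msg")
```

Feeding it the saved output of `oc logs` for the aws-machine-controller pod would surface only the error-level entries, along with the machine each one refers to.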