
A simple OpenShift Origin Cluster On AWS

Cookbook Instructions

This repo contains a CloudFormation template and Ansible scripts for setting up a simple 3-node (1 master, 2 app nodes) OpenShift Origin cluster on AWS. This started as a short exercise so I could have a cluster at my disposal for trying things that minishift on my laptop would not be sufficient for. In the process, I learned a few things that I thought I'd capture.

To give credit where it's due, I started by reading this excellent blog post that describes how to set up a 3.6 cluster. A lot has changed between 3.6 and 3.9, so getting things working required changes to the CloudFormation template as well as to the inventory file. More on that later.

Another resource that is very informative is this GitHub repo, which has a more sophisticated (and better automated) openshift-on-aws setup. It relies on Ansible to set up the AWS environment as well as the OpenShift deployment.

Finally, for quick reference, here's a link to the OpenShift Origin documentation on configuring for AWS.

In order to run these scripts, you will need an AWS account, AWS access keys, and an EC2 key pair. To generate access keys, follow the instructions here. Access keys are used for programmatic access (e.g. when you run aws cloudformation). To generate an EC2 key pair, follow the instructions here. You'll associate the key pair with your instances so you can SSH in.

You'll need to set the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
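
For example, in a bash shell the variables can be exported like this (the values are placeholders for your own credentials):

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>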

You'll also need to install the AWS CLI and run "aws configure" to set yourself up. The doc for that is here.
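
If you'd rather script that step, the same values can be written non-interactively with aws configure set (the region shown is simply the one this template assumes):

aws configure set aws_access_key_id <your access key id>
aws configure set aws_secret_access_key <your secret access key>
aws configure set default.region us-east-1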

Next, git clone this repo, cd into the directory, and run:

git clone https://github.com/openshift/openshift-ansible.git
(cd openshift-ansible && git checkout release-3.9)

Having done all that, you will then need to upload the CloudFormation template to an S3 bucket,

aws s3 cp CloudFormationTemplateOpenShift.yaml s3://<your s3 bucket and filename>
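
If the bucket doesn't exist yet, one way to create it first is with the CLI (the bucket name is a placeholder):

aws s3 mb s3://<your s3 bucket>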

and then type the following incantation (substituting your information where you see angle brackets). If you want to deploy Gluster CNS, change the ParameterValue of the DeployGluster parameter to true. This will create three additional nodes, as well as the necessary volumes for the CNS cluster.

aws cloudformation create-stack --region us-east-1 \
                                --stack-name <your stack name> \
                                --template-url https://s3.amazonaws.com/<your s3 bucket and filename> \
                                --parameters ParameterKey=AvailabilityZone,ParameterValue=us-east-1e \
                                ParameterKey=DeployGluster,ParameterValue=false \
                                ParameterKey=KeyName,ParameterValue=<name of your EC2 key pair> \
                                --capabilities CAPABILITY_IAM
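
Stack creation takes a few minutes. One way to wait for it to finish and then inspect what was created (the stack name is whatever you passed above):

aws cloudformation wait stack-create-complete --region us-east-1 --stack-name <your stack name>
aws cloudformation describe-stacks --region us-east-1 --stack-name <your stack name>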

As it stands, things are hard-coded to run in the us-east-1e availability zone. The AMI ID is also hard-coded (it is the official CentOS 7.4 AMI).

Once the stack is set up, first edit the hosts file, replacing the DNS names of the machines with those of the ones you just created. If you created the hosts for Gluster CNS in the CloudFormation template, then uncomment the glusterfs references in the hosts file.
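
For illustration only, the edit amounts to swapping the new public DNS names into the existing host groups; the group names below are placeholders rather than a quote from this repo's hosts file:

[masters]
ec2-<master public dns>.compute-1.amazonaws.com

[nodes]
ec2-<node 1 public dns>.compute-1.amazonaws.com
ec2-<node 2 public dns>.compute-1.amazonaws.com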

An alternative to editing the hosts file is to get the Ansible dynamic inventory script, ec2.py, as well as the associated ec2.ini file, and run the following command:

eval cat hosts $(./ec2.py | ./substitute_hostnames.js) > inventory
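
If you need to fetch those two files, they historically lived in the ansible/ansible repository under contrib/inventory; the branch and path below are an assumption and may have moved since:

curl -O https://raw.githubusercontent.com/ansible/ansible/stable-2.9/contrib/inventory/ec2.py
curl -O https://raw.githubusercontent.com/ansible/ansible/stable-2.9/contrib/inventory/ec2.ini
chmod +x ec2.py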

Then run the following commands in sequence (if you generated the dynamic inventory, use ./inventory instead of ./hosts as the inventory file):

ansible-playbook prepare.yml -i ./hosts --key-file <your keypair>.pem

ansible-playbook -i hosts openshift-ansible/playbooks/prerequisites.yml --key-file <your keypair>.pem

ansible-playbook -i hosts openshift-ansible/playbooks/deploy_cluster.yml --key-file <your keypair>.pem

Once that is done, SSH into the master and create the admin user (htpasswd will prompt you for a password):

sudo htpasswd /etc/origin/master/htpasswd admin
oc adm policy add-cluster-role-to-user cluster-admin admin
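
As a quick sanity check from your own machine (the URL and user are just the ones from this setup), you can log in with the oc client and list the nodes:

oc login https://<master public dns>:8443 -u admin
oc get nodes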

And you should be in business: go to your master node public DNS at port 8443 and start being shifty!

A Couple of Handy Tidbits

Things Learned Along the Way

Opt For The Containerized Install

I initially tried the RPM install, which is the default, but it appears that Origin builds container images more frequently than RPMs - the RPMs that got pulled down were labeled as alpha and still had bugs that prevented the install from working.
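
For reference, switching to the containerized install is a single inventory variable; the line below is the standard openshift-ansible toggle, shown as a sketch rather than a quote from this repo's hosts file:

# run the OpenShift components as containers instead of installing RPMs
containerized=true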

Changes to security groups

In the CloudFormation template from the original blog post, the authors set up two security groups - one for the master and one for the nodes - which appeared, looking at the rules, to allow ingress from any node in the VPC. This led to a silent failure when the OpenShift SDN started up - silent in the sense that there was no error in the log, but nodes wouldn't come up because the SDN wasn't set up. The fix was to create a self-referencing security group that allows access from any node in that group, and to associate it with the master and all the nodes in addition to the separate master and node security groups. See the "clusterchatter" security group in the CloudFormation template.
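
Outside of CloudFormation, the same kind of self-referencing rule can be added with the AWS CLI; the group ID below is a placeholder for whatever group your stack created:

# allow all traffic between members of the same security group
aws ec2 authorize-security-group-ingress --group-id <clusterchatter sg id> \
                                         --protocol=-1 \
                                         --source-group <clusterchatter sg id>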

Labeling Nodes

I didn't see this documented anywhere, but apparently as of OpenShift 3.7 one has to label (tag) all the AWS nodes (excerpt from the template):

        - Key: kubernetes.io/cluster/openshift-on-aws
          Value: openshift-on-aws

The "openshift-on-aws" part can be whatever you want, but I think it has to match. If you don't do this, the deploy_cluster.yml playbook will fail complaining about the lack of labeling.

Setting osm_etcd_image

I think this is a bug, but what I found was that the install failed when trying to check for the existence of the etcd image unless I had the following in the inventory file:

# This is needed because by default the installer pulls the etcd image from
# registry.access.redhat.com even if the deployment type is origin:
# https://github.com/openshift/openshift-ansible/issues/7808
#
osm_etcd_image=registry.redhat.io/rhel7/etcd

Stopping and Restarting AWS Nodes in this setup

Likely obvious, but if you set up your cluster and then stop the instances (e.g. to save some money when you're not using them), they will get new public IPs and hostnames when they restart, so the certificates will no longer be valid and the OpenShift master config will have a bunch of references to the old public IP.

There is no supported (or even recommended) way to fix this. Changing the master config and redeploying certificates (i.e. running the redeploy_certificates playbook) will make it possible for the oc command line to work, but getting the console working after that change has been elusive.

Using an elastic IP for the master should work, but I haven't done it yet.
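
For reference, allocating an Elastic IP and attaching it to the master would look something like this (untested here, and the IDs are placeholders):

aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id <master instance id> --allocation-id <allocation id from the previous command>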