This repository contains scripts to configure a DC/OS cluster on Google Compute Engine.
A bootstrap node is required to run the scripts and to bootstrap the DC/OS cluster.
PLEASE READ THE ENTIRE DOCUMENT. YOU MUST MAKE CHANGES FOR THE SCRIPTS TO WORK IN YOUR GCE ENVIRONMENT.
YOU MUST CREATE A PROJECT using the Google Cloud Console. The author created a project called trek-trackr.
You can create the bootstrap node using the Google Cloud Console. The author used an n1-standard-1 instance running CentOS 7 with a 10 GB persistent disk in zone europe-west1-c. In the Identity and API access section, the bootstrap node must have "Allow full access to all Cloud APIs" selected. In the SSH Keys section, also enable "Block project-wide SSH keys". Then create the instance.
After creating the bootstrap instance, run the following from its shell:
sudo yum update google-cloud-sdk
sudo yum update
sudo yum install epel-release
sudo yum install python-pip
sudo pip install -U pip
sudo pip install 'apache-libcloud==1.2.1'
sudo pip install 'docker-py==1.9.0'
sudo yum install git-1.8.3.1 ansible-2.1.1.0
You need to create an RSA public/private key pair to allow passwordless SSH logins to the nodes of the DC/OS cluster. Ansible needs this to create the cluster nodes and install DC/OS on them.
Run the following to generate the keys:
ssh-keygen -t rsa -f ~/.ssh/id_rsa -C ajazam
PLEASE REPLACE ajazam with your username. Do not enter a passphrase when prompted; press Enter to leave it empty.
Make a backup copy of id_rsa.
Open the RSA public key:
vi ~/.ssh/id_rsa.pub
It contains a line like:
ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam
Prefix the above line with your username followed by a colon, and replace ajazam at the end with your username:
ajazam:ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam
Save id_rsa.pub. (In the example above, ajazam stands for your username.)
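If you prefer not to edit the key by hand, the same metadata line can be produced with a small shell function. This is a minimal sketch: gce_key_line is a hypothetical helper, not part of dcos-gce, and it prints the line instead of editing id_rsa.pub in place. It assumes the comment at the end of the key is already your username, as it is when you pass your username to ssh-keygen -C.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dcos-gce): print a GCE sshKeys metadata
# line of the form "<username>:<contents of the public key file>".
gce_key_line() {
  local user="$1" pubkey_file="$2"
  # The public key file holds a single line: "ssh-rsa <key> <comment>".
  # $(cat ...) drops the trailing newline; printf adds exactly one back.
  printf '%s:%s\n' "$user" "$(cat "$pubkey_file")"
}
```

For example, the following writes the line to a separate file (gce_sshkeys.txt is a hypothetical name), which you would then pass as sshKeys=$HOME/gce_sshkeys.txt to the gcloud add-metadata command instead of id_rsa.pub:
gce_key_line "$USER" ~/.ssh/id_rsa.pub > "$HOME/gce_sshkeys.txt"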
Restrict the permissions on the private key, then add the RSA public key to your project:
chmod 400 ~/.ssh/id_rsa
gcloud compute project-info add-metadata --metadata-from-file sshKeys=~/.ssh/id_rsa.pub
Disable SELinux so that Docker works.
Make the following change to /etc/selinux/config:
SELINUX=disabled
Reboot the host.
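The SELinux change can also be scripted with sed instead of an editor. This is a minimal sketch: disable_selinux_in is a hypothetical helper that takes the config path as an argument, so it can be tried on a copy of the file first. It assumes GNU sed, which is the default on CentOS 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in the given SELinux config file.
disable_selinux_in() {
  local cfg="$1"
  # Rewrite the SELINUX=... line in place. SELINUXTYPE=... is left alone
  # because the pattern requires "=" directly after "SELINUX".
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Applied to the real file, the equivalent one-liner (followed by the reboot) is:
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config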
To install Docker, add the Docker yum repository:
sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
Install the Docker package:
sudo yum install docker-engine-1.11.2
In /usr/lib/systemd/system/docker.service, change the ExecStart line to:
ExecStart=/usr/bin/docker daemon --storage-driver=overlay
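An alternative to editing the packaged unit under /usr/lib, which a later docker-engine update may overwrite, is a systemd drop-in file under /etc/systemd/system/docker.service.d. This is a sketch of that swapped-in approach, not what the dcos-gce author did; write_docker_dropin and the file name override.conf are hypothetical choices. The empty ExecStart= line matters: without it systemd treats the new ExecStart as a second command, which is only allowed for Type=oneshot services.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that overrides docker's ExecStart.
# Intended target on the bootstrap node: /etc/systemd/system/docker.service.d
write_docker_dropin() {
  local dir="$1"
  mkdir -p "$dir"
  # "ExecStart=" with no value clears the command from the packaged unit,
  # then the next line sets the daemon command with the overlay driver.
  cat > "$dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --storage-driver=overlay
EOF
}
```

Run it as root with the directory /etc/systemd/system/docker.service.d, then reload systemd as in the next step.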
Reload systemd:
sudo systemctl daemon-reload
Start Docker:
sudo systemctl start docker.service
Verify that Docker works:
sudo docker run hello-world
Download the dcos-gce scripts:
git clone https://github.com/dcos-labs/dcos-gce
Change into the repository directory:
cd dcos-gce
Edit group_vars/all to suit your environment. At a minimum, review project, subnet, login_name, bootstrap_public_ip and zone; all parameters are described below.
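To double-check these values before running any playbook, you can print them. This sketch assumes group_vars/all is a YAML file with one "key: value" pair per line; show_required_vars is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the settings that must be reviewed,
# assuming one "key: value" line per setting in the vars file.
show_required_vars() {
  local vars_file="${1:-group_vars/all}"
  grep -E '^(project|subnet|login_name|bootstrap_public_ip|zone):' "$vars_file"
}
```

Run it from the dcos-gce directory with no arguments to check group_vars/all.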
Insert the following into ~/.ansible.cfg to disable SSH host key checking:
[defaults]
host_key_checking = False
[paramiko_connection]
record_host_keys = False
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null
Ensure that the IP address of master0 in ./hosts is the next consecutive IP address after bootstrap_public_ip. With the defaults, bootstrap_public_ip is 10.132.0.2 and master0 is 10.132.0.3.
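Working out the next consecutive IP address is integer arithmetic: treat the four octets as one 32-bit number and add one. The sketch below does this in bash; next_ip is a hypothetical helper and it does not check subnet boundaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IPv4 address that follows the given one.
next_ip() {
  local a b c d n
  IFS=. read -r a b c d <<< "$1"
  # Pack the four octets into one integer, add one, then unpack them.
  n=$(( (a << 24) + (b << 16) + (c << 8) + d + 1 ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}
```

With the defaults in this document, next_ip 10.132.0.2 prints 10.132.0.3, the master0 address in the supplied hosts file; a carry is handled too, so next_ip 10.132.0.255 prints 10.132.1.0.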
To create and configure the master nodes, run:
ansible-playbook -i hosts install.yml
To create and configure the private nodes, run:
ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0001 end_id=0002 agent_type=private"
start_id=0001 and end_id=0002 specify the range of ids that are appended to the hostname "agent" to create unique agent names. If start_id is not specified, it defaults to 0001. If end_id is not specified, it also defaults to 0001.
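The mapping from an id range to hostnames can be sketched as follows. agent_names is a hypothetical helper, and the four-digit zero padding is an assumption based on the names agent0001 and agent9999 used in this document; the playbook's own formatting may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the agent hostnames for a start_id/end_id range.
# Both ids default to 1; "10#" forces base 10 so "0008" is not read as octal.
agent_names() {
  local start=$((10#${1:-1})) end=$((10#${2:-1})) i
  for (( i = start; i <= end; i++ )); do
    printf 'agent%04d\n' "$i"   # zero-pad to four digits, e.g. agent0001
  done
}
```

For example, agent_names 0001 0002 prints agent0001 and agent0002, the two private agents created by the command above.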
When you pass start_id or end_id on the command line, drop the leading zeros from any agent id higher than 7, or Ansible will throw a format error. For example:
ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0006 end_id=10 agent_type=private"
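A small helper can apply this rule for you. normalize_agent_id is hypothetical; it strips the zeros only for ids above 7, as the rule states, and forces base 10 so that bash itself does not reject 0008 as an invalid octal number. The document does not say why Ansible rejects these values; a leading zero being read as an octal prefix is a plausible guess, not a confirmed cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop leading zeros from agent ids above 7;
# ids 1 to 7 are passed through unchanged.
normalize_agent_id() {
  local n=$((10#$1))   # interpret the id as decimal, e.g. "0010" -> 10
  if (( n > 7 )); then
    printf '%d\n' "$n"
  else
    printf '%s\n' "$1"
  fi
}
```

For example:
ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=$(normalize_agent_id 0006) end_id=$(normalize_agent_id 0010) agent_type=private"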
The value of agent_type is either private or public. If agent_type is not specified, it defaults to private.
To create public nodes, run:
ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0003 end_id=0004 agent_type=public"
The file ./hosts is an Ansible inventory file. Text wrapped in [] is a group name, and the entries after the group name are the hosts in that group. The [masters] group contains the names and IP addresses of the master nodes. In the supplied file the host name is master0 and its IP address is 10.132.0.3. YOU MUST CHANGE the IP address of master0 to match your network. You can add more masters, e.g. master1, master2 and so on; each node must have a unique IP address.
The [agents] group has a single entry that covers the names of all the agents the DC/OS cluster can have: agent0000 to agent9999, a total of 10,000 agents. This limit is artificial and can easily be changed by editing the entry.
The [bootstrap] group has the name of the bootstrap node.
The file ./group_vars/all contains parameters that control the behaviour of the installation scripts. The parameters are split into two groups. Group 1 parameters (project, subnet, login_name, bootstrap_public_ip and zone) must be reviewed and changed to reflect your environment. Group 2 parameters (the rest) can optionally be changed to alter the behaviour of the scripts.
project
Your project id. Default: trek-trackr
subnet
Your network. Default: default
login_name
The login name used for accessing each GCE instance. Default: ajazam
bootstrap_public_ip
The bootstrap node's public IP address. Default: 10.132.0.2
zone
You may change this to your preferred zone. Default: europe-west1-c
master_boot_disk_size
The size of the master node boot disk. Default: 10 GB
master_machine_type
The GCE instance type used for the master nodes. Default: n1-standard-2
master_boot_disk_type
The master boot disk type. Default: pd-standard
agent_boot_disk_size
The size of the agent boot disk. Default: 10 GB
agent_machine_type
The GCE instance type used for the agent nodes. Default: n1-standard-2
agent_boot_disk_type
The agent boot disk type. Default: pd-standard
agent_instance_type
Controls whether agents are preemptible. The value "MIGRATE" makes them non-preemptible; the value '"TERMINATE" --preemptible' makes them preemptible. Default: "MIGRATE"
agent_type
Specifies whether an agent is "public" or "private". Default: "private"
start_id
The number appended to the text "agent" to form the hostname of the first agent, e.g. agent0001. Agents with ids between start_id and end_id are created as well. Default: 0001
end_id
The number appended to the text "agent" to form the hostname of the last agent, e.g. agent0001. Agents with ids between start_id and end_id are created as well. Default: 0001
gcloudbin
The location of the gcloud binary. Default: /usr/local/bin/gcloud
image
The disk image used on the master and agent nodes. Default: /centos-cloud/centos-7-v20161027
bootstrap_public_port
The port on the bootstrap node from which each master and agent node fetches the DC/OS installer. Default: 8080
cluster_name
The name of the DC/OS cluster. Default: cluster_name
scopes
Do not change this. It is required by the Google Cloud SDK.
dcos_installer_filename
The filename of the DC/OS installer. Default: dcos_generate_config.sh
dcos_installer_download_path
The URL from which the DC/OS installer is downloaded. Default: https://downloads.dcos.io/dcos/stable/{{ dcos_installer_filename }}, where {{ dcos_installer_filename }} is described above.
home_directory
The home directory of the login user. Default: /home/{{ login_name }}, where {{ login_name }} is described above.
downloads_from_bootstrap
The number of concurrent downloads of the DC/OS installer from the bootstrap node to the master and agent nodes. You may need to experiment with this value to get the best performance, which depends on the machine type used for the bootstrap node. Default: 2
dcos_bootstrap_container
The name of the DC/OS bootstrap container that runs on the bootstrap node. Default: dcosinstaller
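The unusual quoting of agent_instance_type makes sense if the value is pasted verbatim into a gcloud command line after --maintenance-policy; that is an assumption about how the playbook uses it. The sketch below only prints the resulting command line and executes nothing; gcloud_agent_cmd is a hypothetical helper. --maintenance-policy and --preemptible are real gcloud compute instances create flags, and preemptible instances cannot live-migrate, which is why the preemptible value combines --preemptible with TERMINATE.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: splice agent_instance_type into a gcloud command line,
# the way a template substituting {{ agent_instance_type }} would. Prints only.
gcloud_agent_cmd() {
  local name="$1" agent_instance_type="$2"
  printf 'gcloud compute instances create %s --maintenance-policy %s\n' \
    "$name" "$agent_instance_type"
}
```

For example, gcloud_agent_cmd agent0002 '"TERMINATE" --preemptible' prints a command that creates agent0002 as a preemptible instance.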