However, there is something weird with this PR: GitHub says it's made up of 800 file changes. That doesn't sound right to me.
This PR is now fixed after rebasing with the latest changes in your fuw-cicd branch.
Regarding the one for the bastion, may I suggest changing the database loading part so we can use the output of convert_production_db_to_latest_ver.sh directly?
I've updated the bastion playbook to search for the latest backup file in the gigadb/app/tools/files-url-updater/sql directory and use this file for restoring the gigadb database on the RDS instance.
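For reference, a minimal sketch of what those lookup tasks could look like; the module options are standard Ansible, but the pattern, path and variable names are assumptions rather than the actual playbook content:

- name: Get files in files-url-updater sql folder
  find:
    paths: ../../../../gigadb/app/tools/files-url-updater/sql
    patterns: "*.backup"
  register: backup_files
  delegate_to: localhost

- name: Select the most recent backup file
  set_fact:
    latest_backup: "{{ (backup_files.files | sort(attribute='mtime') | last).path }}"
  delegate_to: localhost

# latest_backup would then be passed to the pg_restore task that restores
# the gigadb database on the RDS instance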
I've rebased this PR with the latest changes in the fuw-cicd branch and moved the PostgreSQL client tools installation from bastion-aws-instance.tf into bastion_playbook.yml. The private_subnets have also been commented out in terraform.tf.
My terraform inventory list looks like this:
> ../../inventories/terraform-inventory.sh --list | jq
{
"all": {
"hosts": [
"16.162.161.219",
"16.162.189.6"
],
"vars": {
"ec2_bastion_public_ip": "16.162.161.219",
"ec2_private_ip": "10.99.0.241",
"ec2_public_ip": "16.162.189.6",
"rds_instance_address": "rds-server-staging.cfkc0cbc20ii.ap-east-1.rds.amazonaws.com"
}
},
"module_ec2_bastion_bastion": [
"16.162.161.219"
],
"module_ec2_bastion_bastion.0": [
"16.162.161.219"
],
"module_ec2_dockerhost_docker_host": [
"16.162.189.6"
],
"module_ec2_dockerhost_docker_host.0": [
"16.162.189.6"
],
"module_ec2_dockerhost_docker_host_eip": [
"16.162.189.6"
],
"module_ec2_dockerhost_docker_host_eip.0": [
"16.162.189.6"
],
"module_ec2_dockerhost_docker_host_eip_assoc": [
"16.162.189.6"
],
"module_ec2_dockerhost_docker_host_eip_assoc.0": [
"16.162.189.6"
],
"name_bastion_server_staging": [
"16.162.161.219"
],
"name_gigadb_server_staging_rija": [
"16.162.189.6"
],
"system_t3_micro-centos8": [
"16.162.161.219",
"16.162.189.6"
],
"type_aws_eip": [
"16.162.189.6"
],
"type_aws_eip_association": [
"16.162.189.6"
],
"type_aws_instance": [
"16.162.161.219",
"16.162.189.6"
]
}
Hi @pli888, how should the playbooks be run? I've tried 4 different ways:
I use ansible-playbook -i ../../inventories dockerhost_playbook.yml, but I think this won't work in your case because dockerhost_playbook.yml won't pick up your EC2 server, due to the AWS username at the end of the name tag value.
ansible-playbook -i ../../inventories dockerhost_playbook.yml should work if you change the hosts line in dockerhost_playbook.yml to:
hosts: name_gigadb_server_staging*:name_gigadb_server_live*
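In context, the top of the play would then look something like this (only the hosts pattern comes from the suggestion above; the rest of the play is elided):

- hosts: name_gigadb_server_staging*:name_gigadb_server_live*
  # the wildcard matches inventory groups such as name_gigadb_server_staging_rija
  # produced by terraform-inventory, regardless of the username suffix
  tasks: []   # existing dockerhost provisioning tasks elided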
Thanks @pli888, that worked and dockerhost_playbook.yml performed OK. I've then applied the same change to bastion_playbook.yml and it seems to have done the trick; however, it fails in the play "Test pg_isready can connect to RDS instance" with this error:
TASK [Test pg_isready can connect to RDS instance] ***********************************************************************************************************************
fatal: [16.162.189.6]: FAILED! => {"changed": false, "msg": "no command given", "rc": 256}
When running in verbose mode, the output is: https://gist.github.com/rija/565170272e5e40bf904c3270320012aa
PLAY RECAP ***************************************************************************************************************************************************************
16.162.189.6 : ok=5 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Looks like PostgreSQL failed to install on the bastion:
[...]$ /usr/pgsql-11/bin/pg_isready -h xxxxxxxxxxi.yyyyyy.rds.amazonaws.com
-bash: /usr/pgsql-11/bin/pg_isready: No such file or directory
I think there are various problems with the PostgreSQL client install play. To start with, we should use dnf instead of yum everywhere. Also, dnf needs sudo, so the become: yes option needs to be present for all calls to the dnf module too.
$ git diff
...
tasks:
- name: Disable postgresql module in AppStream
command: dnf -qy module disable postgresql
- become: true
+ become: yes
- rpm_key:
state: present
key: https://download.postgresql.org/pub/repos/yum/RPM-GPG-KEY-PGDG
- name: Install PostgreSQL repo
- yum:
+ become: yes
+ dnf:
name: https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
state: present
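Putting those two suggestions together, the client install tasks might end up roughly like this (a sketch, not the exact contents of the playbook; the repo URLs are the ones already used above, while the postgresql11 package name is an assumption):

  tasks:
    - name: Disable postgresql module in AppStream
      command: dnf -qy module disable postgresql
      become: yes

    - name: Import the PGDG repository signing key
      become: yes
      rpm_key:
        state: present
        key: https://download.postgresql.org/pub/repos/yum/RPM-GPG-KEY-PGDG

    - name: Install PostgreSQL repo
      become: yes
      dnf:
        name: https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
        state: present

    - name: Install PostgreSQL 11 client packages
      become: yes
      dnf:
        name: postgresql11   # client tools land in /usr/pgsql-11/bin
        state: present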
Even after that, it still isn't working.
By ssh-ing into the bastion, I can manually install the tools by issuing: dnf install -y postgresql-server. But it installs the wrong version (10), from the appstream repository instead of the PostgreSQL repo, and it doesn't work when used in the Ansible playbook (it says there's nothing to do, even when I know the tools are not installed).
If I search for the PostgreSQL package of any version in the specific repo with dnf --repo pgdg11 search postgresql, the result doesn't seem to contain installable packages.
Interestingly, when I replace the call to the dnf module in the task for installing PostgreSQL with:
- name: Install PostgreSQL client packages
command: "dnf -y install postgresql-server"
become: yes
I got this output:
"stdout": "Last metadata expiration check: 1:21:57 ago on Thu 14 Oct 2021 04:24:43 PM UTC.\nPackage postgresql11-server-11.13-1PGDG.rhel8.x86_64 is already installed.\nDependencies resolved.\nNothing to do.\nComplete!",
"stdout_lines": [
"Last metadata expiration check: 1:21:57 ago on Thu 14 Oct 2021 04:24:43 PM UTC.",
"Package postgresql11-server-11.13-1PGDG.rhel8.x86_64 is already installed.",
"Dependencies resolved.",
"Nothing to do.",
"Complete!"
]
but I can't see the installed packages on the bastion, and the next task that calls pg_isready fails because it cannot find the command either.
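A quick way to check what actually got installed and where (a diagnostic sketch, not a command from the playbook):

# list any PGDG PostgreSQL 11 packages known to rpm, then the files the client package owns
rpm -qa 'postgresql11*'
rpm -ql postgresql11 | grep '/bin/'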
@rija Unfortunately, I'm not able to replicate the pg_isready error. I've double-checked that the dnf -qy module disable postgresql Ansible step removes postgresql from AppStream. Before this step is executed, you will see this on the bastion server:
[centos@ip-10-99-0-74 ~]$ dnf list "postgresql*"
Last metadata expiration check: 0:43:27 ago on Fri 15 Oct 2021 03:24:12 AM UTC.
Available Packages
postgresql.x86_64 10.17-1.module_el8.4.0+823+f0dbe136 appstream
postgresql-contrib.x86_64 10.17-1.module_el8.4.0+823+f0dbe136 appstream
postgresql-docs.x86_64 10.17-1.module_el8.4.0+823+f0dbe136 appstream
postgresql-jdbc.noarch 42.2.24-1.rhel8 pgdg-common
postgresql-jdbc-javadoc.noarch 42.2.24-1.rhel8 pgdg-common
Then, after the postgres AppStream module is disabled, the output is this:
[centos@ip-10-99-0-74 ~]$ dnf list "postgresql*"
Last metadata expiration check: 0:43:05 ago on Fri 15 Oct 2021 03:24:12 AM UTC.
Available Packages
postgresql-jdbc.noarch 42.2.24-1.rhel8 pgdg-common
postgresql-jdbc-javadoc.noarch 42.2.24-1.rhel8 pgdg-common
postgresql-odbc.x86_64 10.03.0000-2.el8 appstream
postgresql-odbc-tests.x86_64 10.03.0000-2.el8 appstream
postgresql-unit10.x86_64 7.2-1.rhel8 pgdg10
This should allow PostgreSQL 11 to be installed from the PostgreSQL repository by Ansible, so that I see its client tools installed on my bastion server:
[centos@ip-10-99-0-74 ~]$ cd /usr/pgsql-11/
[centos@ip-10-99-0-74 pgsql-11]$ ls
bin lib share
[centos@ip-10-99-0-74 pgsql-11]$ cd bin
[centos@ip-10-99-0-74 bin]$ pwd
/usr/pgsql-11/bin
[centos@ip-10-99-0-74 bin]$ ls
clusterdb dropdb pg_basebackup pg_dump pg_receivewal pg_test_fsync pg_waldump vacuumdb
createdb dropuser pgbench pg_dumpall pg_restore pg_test_timing psql
createuser pg_archivecleanup pg_config pg_isready pg_rewind pg_upgrade reindexdb
This then allows the pg_isready Ansible step to work:
$ ansible-playbook -i ../../inventories bastion_playbook.yml
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Restore PostgreSQL database on RDS instance using pg_restore] ************************************************************
TASK [Gathering Facts] *********************************************************************************************************
ok: [18.162.41.222]
TASK [Disable postgresql module in AppStream] **********************************************************************************
[WARNING]: Consider using the dnf module rather than running 'dnf'. If you need to use command because dnf is insufficient you
can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [18.162.41.222]
TASK [rpm_key] *****************************************************************************************************************
ok: [18.162.41.222]
TASK [Install PostgreSQL repo] *************************************************************************************************
ok: [18.162.41.222]
TASK [Install PostgreSQL 11 client packages] ***********************************************************************************
changed: [18.162.41.222]
TASK [Test pg_isready can connect to RDS instance] *****************************************************************************
changed: [18.162.41.222]
TASK [debug] *******************************************************************************************************************
ok: [18.162.41.222] => {
"msg": "rds-server-staging.********20ii.ap-east-1.rds.amazonaws.com:5432 - accepting connections"
}
TASK [Get files in files-url-updater sql folder] *******************************************************************************
fatal: [18.162.41.222]: FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
PLAY RECAP *********************************************************************************************************************
18.162.41.222 : ok=7 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
I work around the error I get with the Get files in files-url-updater sql folder task by executing:
$ ansible-playbook --ask-become-pass -i ../../inventories bastion_playbook.yml
I've made the changes you suggested to use dnf instead of yum, so there are a few new commits in this PR now.
Hi @pli888,
Thanks for looking into it. I've started from scratch after pulling the latest changes.
It still fails at the pg_isready task:
TASK [Install PostgreSQL repo] *******************************************************************************************************************************************
changed: [18.163.46.93]
TASK [Install PostgreSQL 11 client packages] *****************************************************************************************************************************
changed: [18.163.46.93]
TASK [Test pg_isready can connect to RDS instance] ***********************************************************************************************************************
fatal: [18.163.46.93]: FAILED! => {"changed": false, "msg": "no command given", "rc": 256}
I've run all the commands you've shown in your comment and I get the same output as you (which is progress, as yesterday the /usr/pgsql-11/ directory didn't even exist). So I need to figure out why that command fails.
The verbose output is not much more helpful at first glance, although the _raw_params field being null suggests the command string never made it to the module. I wonder what the verbose output would say if it were successful:
fatal: [18.163.46.93]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"_raw_params": null,
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"msg": "no command given",
"rc": 256
}
Manually running the command on bastion works:
[centos@ip-10-99-0-76 ~]$ /usr/pgsql-11/bin/pg_isready -h rds-server-staging.cfkc0cbc20ii.ap-east-1.rds.amazonaws.com
rds-server-staging.cfkc0cbc20ii.ap-east-1.rds.amazonaws.com:5432 - accepting connections
So it seems it is an Ansible thing. My version is:
> ansible-playbook --version
ansible-playbook 2.9.9
config file = None
configured module search path = ['/Users/rijamenage/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/2.9.9/libexec/lib/python3.8/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.8.3 (default, May 27 2020, 20:54:22) [Clang 11.0.3 (clang-1103.0.32.59)]
Upgrading Ansible fixed the issue with pg_isready for me:
> ansible-playbook --version
ansible-playbook [core 2.11.6]
config file = None
configured module search path = ['/Users/rijamenage/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible
ansible collection location = /Users/rijamenage/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible-playbook
python version = 3.9.7 (default, Oct 13 2021, 06:44:56) [Clang 12.0.0 (clang-1200.0.32.29)]
jinja version = 3.0.2
libyaml = True
The problem I'm running into now is with looking for the output of files-url-updater. It never asks for a password (btw, there's actually no reason it needs sudo for the action in question) and instead fails immediately:
TASK [Get files in files-url-updater sql folder] ****************************************************************************************************************************
task path: /Users/Shared/pli888-gigadb-website/ops/infrastructure/envs/staging/bastion_playbook.yml:35
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: rijamenage
<localhost> EXEC /bin/sh -c 'echo ~rijamenage && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/rijamenage/.ansible/tmp `"&& mkdir "` echo /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052 `" && echo ansible-tmp-1634297269.269126-85779-103997721710052="` echo /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052 `" ) && sleep 0'
Using module file /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible/modules/find.py
<localhost> PUT /Users/rijamenage/.ansible/tmp/ansible-local-85677_n4udhde/tmpomoc5i5z TO /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052/AnsiballZ_find.py
<localhost> EXEC /bin/sh -c 'chmod u+x /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052/ /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052/AnsiballZ_find.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-bztqrexbgfmvezafvxbdzyayszuqvdme ; /usr/local/Cellar/ansible/4.7.0/libexec/bin/python3.9 /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052/AnsiballZ_find.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /Users/rijamenage/.ansible/tmp/ansible-tmp-1634297269.269126-85779-103997721710052/ > /dev/null 2>&1 && sleep 0'
fatal: [18.163.46.93 -> localhost]: FAILED! => {
"changed": false,
"module_stderr": "sudo: a password is required\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 1
}
I think the issue is that these two tasks are local tasks in the midst of a remote playbook. I've tried many variations of delegate_to, local_action and become, to no avail.
I could get those two tasks to succeed (and the rest of the bastion playbook) by running the entire playbook with sudo, which doesn't feel right.
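For the record, the general shape of the variation I'd expect to need here, shown only as a sketch of the pattern and not as a claim that it solves this particular failure:

- name: Get files in files-url-updater sql folder
  delegate_to: localhost
  become: false        # the local find has no need for sudo on the control machine
  find:
    paths: ../../../../gigadb/app/tools/files-url-updater/sql
    patterns: "*.backup"
  register: backup_files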
An alternative approach would be to get rid of these two local tasks and prompt the user in tf_init.sh for a backup file path; the backup file location would be written to ansible.properties, and we would then add the corresponding variable to the ops/infrastructure/inventories/hosts file in the [all:vars] section.
An added benefit of that approach is that the user has more flexibility to choose a backup file (e.g. because the latest source backup is corrupted).
> An alternative approach would be to get rid of these two local tasks and prompt the user in tf_init.sh for a backup file path, which will have the backup file location written to ansible.properties and then we add the corresponding variable in ops/infrastructure/inventories/hosts file in the [all:vars] section.
The above has been implemented in the 1471b9c commit for this PR. So now, on running tf_init.sh, you will see the following prompt (with my response):
You need to specify a backup file created by the files-url-updater tool: gigadbv3_20210929_v9.3.25.backup
Finally, I've added the AWS username to the end of the name tag of the bastion server. This helps to distinguish multiple bastion servers and I'm sure it's not allowed for EC2 servers to have identical name tags.
Hi @pli888,
I've got a new error that happened during terraform apply:
module.ec2_dockerhost.aws_eip_association.docker_host_eip_assoc: Still creating... [10s elapsed]
module.ec2_dockerhost.aws_eip_association.docker_host_eip_assoc: Creation complete after 12s [id=eipassoc-0fa7d25e87ed71876]
Error: Error creating DB Instance: DBInstanceAlreadyExists: DB instance already exists
status code: 400, request id: 266df3e2-5cd6-4ceb-a3ba-550d6f74fb91
on .terraform/modules/rds.db/modules/db_instance/main.tf line 20, in resource "aws_db_instance" "this":
20: resource "aws_db_instance" "this" {
The error is thrown by the public Terraform module for aws-rds. I've checked, and I have no existing RDS instance already deployed.
Could it be a tag clash? I noticed you have an RDS instance named rds-server-staging, and that same identifier will be generated for me in rds-instance.tf:
module "db" {
source = "terraform-aws-modules/rds/aws"
# Only lowercase alphanumeric characters and hyphens allowed in "identifier"
identifier = "rds-server-${var.deployment_target}"
The fix is probably to add the IAM username as a suffix there.
Hi @pli888, the following patch made it work for me:
diff --git a/ops/infrastructure/getIAMUserNameToJSON.sh b/ops/infrastructure/getIAMUserNameToJSON.sh
index 48fa64f4d..df8c3dbc9 100755
--- a/ops/infrastructure/getIAMUserNameToJSON.sh
+++ b/ops/infrastructure/getIAMUserNameToJSON.sh
@@ -6,5 +6,5 @@
# the output has to be valid JSON
set -e
-userName=$(aws sts get-caller-identity --output text --query Arn | cut -d"/" -f2)
+userName=$(aws sts get-caller-identity --output text --query Arn | cut -d"/" -f2 | tr '[:upper:]' '[:lower:]')
jq -n --arg userName "$userName" '{"userName":$userName}'
\ No newline at end of file
diff --git a/ops/infrastructure/modules/rds-instance/rds-instance.tf b/ops/infrastructure/modules/rds-instance/rds-instance.tf
index e62f11e60..6b2a101f7 100644
--- a/ops/infrastructure/modules/rds-instance/rds-instance.tf
+++ b/ops/infrastructure/modules/rds-instance/rds-instance.tf
@@ -21,7 +21,7 @@ module "db" {
source = "terraform-aws-modules/rds/aws"
# Only lowercase alphanumeric characters and hyphens allowed in "identifier"
- identifier = "rds-server-${var.deployment_target}"
+ identifier = "rds-server-${var.deployment_target}-${var.owner}"
create_db_option_group = false
create_db_parameter_group = false
@pli888,
In tf_init.sh, it's better to write backup_file's value into .init_env_vars, and then have ansible_init.sh write backup_file's value to ansible.properties like all the other variables, instead of appending it to ansible.properties from tf_init.sh.
Otherwise, when we run tf_init.sh multiple times (e.g. because we are debugging something), the variable gets appended multiple times to ansible.properties, and that makes Ansible crash because it doesn't like duplicate values in ansible.properties.
Additionally, by using .init_env_vars, we avoid being prompted for the backup file location every time we call tf_init.sh, if that location hasn't changed.
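In other words, something along these lines (a rough sketch; the real scripts will differ in detail):

# tf_init.sh (sketch): remember the answer in .init_env_vars so re-runs
# neither re-prompt nor duplicate the variable
if ! grep -q '^backup_file=' .init_env_vars 2>/dev/null; then
    read -p "You need to specify a backup file created by the files-url-updater tool: " backup_file
    echo "backup_file=$backup_file" >> .init_env_vars
fi

# ansible_init.sh (sketch): write the value into ansible.properties once,
# like the other variables (assuming ansible_init.sh regenerates that file on each run)
source .init_env_vars
echo "backup_file=$backup_file" >> ansible.properties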
@rija I've made the changes you suggested now. The RDS server now has a name tag which ends with the user's IAM username. The backup_file variable is now first written into .init_env_vars by tf_init.sh and then by ansible_init.sh into the ansible.properties file.
@rija I noticed the change in getIAMUserNameToJSON.sh:
-userName=$(aws sts get-caller-identity --output text --query Arn | cut -d"/" -f2)
+userName=$(aws sts get-caller-identity --output text --query Arn | cut -d"/" -f2 | tr '[:upper:]' '[:lower:]')
I was wondering, when you execute terraform destroy, do you get errors with deleting your AWS resources?
@pli888, yes. I think I figured out the reason: it's because the AWS policies check the owner tag in a case-sensitive way. I've updated the policies to use StringEqualsIgnoreCase where StringEquals was used to check against ${aws:username}, and I can now destroy all my resources.
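For illustration, the kind of condition that change amounts to; the action, resource and tag key below are placeholders rather than the actual contents of the policies:

{
  "Effect": "Allow",
  "Action": "ec2:TerminateInstances",
  "Resource": "*",
  "Condition": {
    "StringEqualsIgnoreCase": {
      "aws:ResourceTag/Owner": "${aws:username}"
    }
  }
}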
Hi @pli888, a minor issue I just found in tf_init.sh:
93 echo "backup_file=../../../../gigadb/app/tools/files-url-updater/sql/$backup_file" >> .init_env_vars
It's better to ask the user for the full path instead of just the file name; otherwise, when we run tf_init.sh multiple times, $backup_file's value will have the same prefix prepended to the already saved full path, which leads to an increasingly long and incorrect value.
@rija Ok, I've made a note to change the code to prompt the user for the full path to the backup file when I continue work on my RDS restore-from-snapshot-and-backup branch. I will also update policy-ec2.md and policy-rds.md.
@pli888 sounds good to me
@kencho51 Let me know when this PR is working for you and I'll then merge it to fuw-cicd
Pull request for issues: #733, #735, #786
This is a pull request for the following functionalities:
Changes to Terraform

The root Terraform file at ops/terraform.tf has been updated with a custom VPC containing public, private and database subnets using the AWS VPC Terraform module. All AWS resources defined as modules in this terraform.tf reside in this custom VPC. For example, the existing docker_host module is located on a public subnet within the VPC. In addition, terraform.tf now contains two new modules:

rds module

The rds module defines an AWS RDS instance which provides a PostgreSQL RDBMS located on a VPC database subnet. It uses the AWS Terraform RDS module to configure a PostgreSQL RDBMS version 9.6 running on a t3.micro RDS instance. Its AWS security group allows only internal VPC clients to connect to it.
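For illustration, a minimal sketch of what such a module block can look like; the identifier and the two create_db_* flags come from snippets earlier in this thread, while the engine, version and instance class inputs are assumptions about the terraform-aws-modules/rds/aws module rather than a copy of rds-instance.tf:

module "db" {
  source = "terraform-aws-modules/rds/aws"

  # Only lowercase alphanumeric characters and hyphens allowed in "identifier"
  identifier = "rds-server-${var.deployment_target}"

  engine         = "postgres"
  engine_version = "9.6"
  instance_class = "db.t3.micro"

  create_db_option_group    = false
  create_db_parameter_group = false

  # allocated storage, master credentials, subnet group and security groups
  # would also be set here in the real rds-instance.tf
}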
ec2_bastion module

The ec2_bastion module defines an EC2 instance running Centos 8 which provides administrative access to the RDS instance. It is located in a public subnet on the custom VPC and its security group allows connections to it from all public IP addresses. It is expected that bastion server users will destroy this bastion instance when database administrative tasks have been done. To enable the bastion server to perform its RDBMS tasks, it is provisioned with PostgreSQL 11 client tools from within the bastion-aws-instance.tf.

Changes to Ansible
When scripts/ansible_init.sh is executed to prepare an environment directory, this script updates the Gitlab gigadb_db_host variable with the host domain of the RDS instance.

The original playbook.yml has been renamed to dockerhost_playbook.yml to signify that it executes against the docker_host EC2 instance. The major change in this playbook is that the references to postgres-preinstall and postgres-postinstall have been deleted, since these roles have been removed due to the use of the RDS service.

There is a new bastion_playbook.yml which executes against the bastion EC2 server. Firstly, the playbook checks it can connect to the RDS instance and then restores a gigadb database on it using the sql/production_like.pgdmp file that is generated as a product of running ./up.sh.

Changes to documentation

SETUP_CI_CD_PIPELINE.md has been updated with RDS-specific information.

Procedure for deploying GigaDB application with RDS service
Prerequisites

sql/production_like.pgdmp file that is created by ./up.sh
eip-ape1-staging-Peter-gigadb

Steps

build_staging step on Gitlab CI/CD pipeline
sd_gigadb step on Gitlab CI/CD pipeline

If you browse the GigaDB website on your staging server, you should see that the static web pages are displayed but there are error messages when viewing the dataset pages, probably due to the dropcontraints and dropindexes database migration steps executed by gigadb-deploy-jobs.yml. To fix this problem, we restore a gigadb database using production data on the RDS instance:

To get terraform to destroy bastion server: