rija / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Improve pipeline speed and fix root certificate expiration issue #198

Closed · rija closed this issue 3 years ago

rija commented 3 years ago

Fix fallout of Let's Encrypt root and intermediate certificate expiration

Let's Encrypt's DST Root CA X3 certificate expired on 30 September 2021. Normally, TLS libraries should have switched automatically to the existing valid replacement. However, the version of OpenSSL (1.1.0l) in Debian 9 (Stretch) had a bug preventing the switch to the new certificate. This caused Composer to fail to connect. The fix is to migrate our PHP containers' base image from Debian 9 (Stretch) to Debian 10 (Buster). The latter has version 1.1.1d of OpenSSL, which doesn't exhibit that buggy behaviour. However, moving to a new major version of the base image is not trivial: a few things broke and needed to be fixed. All breakages were fixed except for css_check, which I've commented out for now until we figure out why it doesn't work. Client libraries for PostgreSQL 9.6 are no longer available in Debian 10, so I pulled version 11 of the PostgreSQL client libraries. There doesn't appear to be any problem connecting to a PostgreSQL 9.6 server with that version.

The composer error can be replicated with:

> docker-compose exec application curl https://asset-packagist.org
curl: (60) SSL certificate problem: certificate has expired

Another problem was with Ansible playbooks. The following error was thrown when running a playbook that uses the yum module:

 Request failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)>

The temporary fix would have been to disable certificate validation as an option to the Ansible yum module call, but that would reduce the security of our systems. The proper fix is to upgrade CentOS, used as the OS on the EC2 instances, from CentOS 7 to CentOS 8.4. That required a number of changes of its own.
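For illustration, the rejected workaround would have looked something like this hypothetical task (the task and package names are made up; only the validate_certs option is the point):

```yaml
# Hypothetical task, for illustration only: disabling certificate validation
# on the yum module works around the SSL error, but weakens security,
# which is why we upgraded the OS instead.
- name: Install a package despite the expired root certificate (not applied)
  yum:
    name: somepackage        # hypothetical package name
    state: present
    validate_certs: no       # the insecure option we decided against
```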

Improve caching in GitLab jobs and Docker build to speed up GitLab jobs

We have caching already for our custom images. Several other techniques are now applied (or restored):

Local cache of base images from Docker Hub

The base images used in our containers are pulled from Docker Hub, but because each GitLab job is isolated and creates its own instance of docker-dind, the base images are never available locally when a docker build command is triggered, requiring them to be pulled for every job.

What we do is create a new preliminary GitLab stage (.pre), with a job that pulls, once, all the base images we use in the project. We then save them as a TAR archive and use the GitLab artifacts functionality to make those files available to all subsequent stages. In the jobs that need to build container images, we front the jobs' steps with a few lines to load the TAR archive as local Docker images, so the build process doesn't need to pull them remotely.
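A minimal sketch of that approach, with illustrative job and image names (the real configuration in .gitlab-ci.yml differs in detail):

```yaml
pull_base_images:
  stage: .pre                       # built-in stage that runs before all other stages
  script:
    # pull each base image once and archive it as a TAR file
    - docker pull php:7.4-fpm-buster
    - docker save --output base-images.tar php:7.4-fpm-buster
  artifacts:
    paths:
      - base-images.tar             # exposed to later jobs as a GitLab artifact

build_images:
  stage: build
  script:
    # load the archived base images so docker build doesn't pull them remotely
    - docker load --input base-images.tar
    - docker-compose -f docker-compose-build.yml build
```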

Authenticated login to Docker Hub

Until now, all pulls from Docker Hub were anonymous, but Docker Hub rate-limits the number of pulls per period of time, and the limits are different for anonymous users, logged-in free users and paying users. By logging in with our Docker Hub account (whose credentials have to be set in GitLab variables) we increase our pull capacity.
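A minimal sketch of the authenticated pull, assuming hypothetical names for the credential variables stored in GitLab CI/CD settings:

```yaml
before_script:
  # log in so pulls count against our account's higher rate limit;
  # DOCKER_HUB_USER and DOCKER_HUB_PASSWORD are illustrative variable names
  - echo "$DOCKER_HUB_PASSWORD" | docker login --username "$DOCKER_HUB_USER" --password-stdin
```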

Caching of composer libraries

Since we have locked versions with composer.lock, the vendor libraries can be cached between jobs. To do so, we use the GitLab cache functionality, which makes a list of paths (in our case Composer files and directories) available across jobs, stages and pipelines of the same project.

We had that configuration before, but it disappeared, so we restore it.
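A minimal sketch of that cache configuration, keyed on composer.lock (the paths are illustrative; the actual entry in .gitlab-ci.yml may list more directories):

```yaml
cache:
  key:
    files:
      - composer.lock        # a new cache is created only when locked versions change
  paths:
    - vendor/                # Composer-installed libraries reused across jobs
```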

Fix base images to precise versions

Until now, we tended to use latest or x.y (major version) as the image tag when specifying base images in our Dockerfiles. The problem is that those container images can be updated whenever a minor version is released, causing our cached images to be invalidated and triggering a fresh pull and a rebuild of our custom images. Additionally, we have no certainty about which version these loosely-tagged base images are at, as upgrades are not audited on our side.

Instead, we use precise tags for the base images (x.y.z) to remove any chance of a base image changing underneath us. We will manage upgrades of our infrastructure ourselves, with our own auditing.

Comment out docker pull and build instructions related to FUW

The container services associated with the File Upload Wizard are not deployed on the production environment, so there's no need to pull and build their images.

Fix tag for custom images

The production images built in the build job are tagged with the environment they are for. Unfortunately, when they were pulled before the build, the wrong tag (latest instead of staging or live) was used, causing docker build to think there was no existing image and to build entirely new production images again. An associated bug was the absence of the environment-specific tag in docker-compose-build.yml, where we define which image to use as cache.
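A minimal sketch of the corrected entry in docker-compose-build.yml, with hypothetical registry, service, path and variable names; the point is that the cache_from tag must match the environment-specific tag pulled before the build:

```yaml
services:
  web:
    image: registry.example.com/gigadb/production_web:${DEPLOYMENT_ENV}   # staging or live
    build:
      context: .
      dockerfile: ops/Production-Web-Dockerfile                           # path is illustrative
      cache_from:
        - registry.example.com/gigadb/production_web:${DEPLOYMENT_ENV}    # same tag as the pre-pulled image
```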

Move transient docker instructions to the end of the Dockerfile

In the Production-Web-Dockerfile, the RUN apk ... block was triggered on every build, adding extra compile time, because the layer before it is by definition constantly invalidated (it is the block creating the site config, which never persists). By moving the RUN apk ... block before the site config block, we enable the compilation stage to be cached.

Tests in CI

For the last few weeks we've made a lot of changes to the infrastructure without running the test suites in CI. When I reactivated the test job, it failed, mostly because of those changes, so the test job had to be adapted. The main change is that we now use the up.sh project setup script on the CI test job as well. I've also added an exclusion block for functional tests in protected/tests/phpunit.xml to exclude flaky tests and those related to the File Upload Wizard.

The other change is that Composer-installed binaries like phpunit, behat and phpcov need to be fully referenced, because on CI we don't have the bin/ symlink in the project directory.
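A minimal sketch of the adapted test job, with a hypothetical job name and invocation; it only illustrates the two changes described above (running up.sh and referencing Composer binaries through vendor/bin/):

```yaml
gigadb_tests:
  stage: test
  script:
    # bring the stack up with the same setup script used for local development
    - ./up.sh
    # Composer binaries must be called through vendor/bin/, since the bin/
    # symlink is not created on CI (exec command is illustrative)
    - docker-compose exec -T application ./vendor/bin/phpunit --configuration protected/tests/phpunit.xml
```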

Finally, the main gigadb test suite CI job is now reinstated.

Bug fix

Fix the pg_dump command in convert_production_db_to_latest_ver.sh to use the version used by @pli888

TODO:

pli888 commented 3 years ago

Line 72 in aws-instance.tf should be:

System = "t3_micro-centos8",
rija commented 3 years ago

t3_micro-centos8

Thanks @pli888. I've made that change.