Problem
Right now we face an issue with fetching envs from SSM:
We have to fetch envs from SSM in the post-deploy step, because otherwise EB overwrites the env with the ones we set on the console
In convict, we have declared the params as required strings
However, this means that whenever we change an env, EB performs a re-build. Since a re-build does not execute the deploy scripts, the envs get overridden with the ones from the EB console, which are missing the SSM params.
Closes IS-412
Solution
This solution introduces a few changes:
We move ALL of our envs to SSM except SSM_PREFIX, which is required to determine whether the environment is staging or prod
We move the fetching of envs from the previous post-deploy step to a pre-deploy step
We need to keep a list of our envs in the new script for us to iterate over and fetch from SSM
We need to read from a file other than .env, which gets overridden by EB. For this, we use the EFS volume that already exists for the GGS solution.
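The pre-deploy fetch described above could look roughly like the sketch below. The env names, the prefix-based param naming, and the file paths are illustrative assumptions, not the actual script:

```shell
#!/bin/bash
# Sketch of the pre-deploy fetch; env names and paths are hypothetical.
ENV_KEYS=(DB_URI GITHUB_ORG_NAME SESSION_SECRET)  # list maintained in the script
LOCAL_ENV_FILE="/tmp/.env.local"

fetch_param() {
  # SSM_PREFIX (staging vs prod) is the only env still set on EB.
  aws ssm get-parameter \
    --name "${SSM_PREFIX}_$1" \
    --with-decryption \
    --query 'Parameter.Value' \
    --output text 2>/dev/null
}

: > "$LOCAL_ENV_FILE"  # start from a clean local file
for key in "${ENV_KEYS[@]}"; do
  if value=$(fetch_param "$key"); then
    echo "${key}=${value}" >> "$LOCAL_ENV_FILE"
  else
    # Param missing on SSM: skip and continue, matching current behaviour;
    # convict catches the missing env when the app starts.
    echo "Skipping ${key}: not found on SSM" >&2
  fi
done
```

The instance writes to a local file first; copying it onto EFS happens separately under a lock, as described below.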
Why store on EFS?
When multiple instances boot up, the pre-deploy step executes on each of them. This is fine for a normal deployment. However, consider the edge case where we want to update envs on SSM: to trigger a rebuild, we need to re-deploy from GH Actions.
However, this takes long due to our rolling deploy. Instead, for urgent cases, we can expedite things by running this script directly and doing a "Restart App Instances" on EB.
EFS enables this, as you only need to run the script once instead of once on each instance. However, if multiple instances boot up at the same time, there will be concurrency issues from overwriting the .env file, leading to inconsistent states. To prevent this, each instance writes to a local file and then transfers it to EFS. A simple locking mechanism in the script locks the folder we copy into, again preventing a clash between the two instances.
Note that if a param is present in the script but not on SSM, the current behaviour is to skip it and proceed to the next param. While exiting the script and failing the deploy might be good practice (failing early), this PR's aim is first to achieve parity with our current behaviour of having convict as the checking layer (though that happens at runtime, after the deploy stage).
How does locking work?
Here's what happens:
The file descriptor 200 is associated with the file /efs/isomer/.isomer.lock.
If /efs/isomer/.isomer.lock doesn't exist, it's created. If it already exists, it's just opened.
flock then tries to acquire a lock on the file descriptor 200, which is linked to the .isomer.lock file.
If flock can't acquire the lock (because some other process has it), then the script exits due to || exit 1.
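Put together, the steps above amount to only a few lines of bash. This is a sketch, not the actual script: the lock path matches the one described, but the copy step and the fallback to /tmp (so the sketch also runs where the EFS mount is absent) are illustrative assumptions:

```shell
#!/bin/bash
# The real lock file lives at /efs/isomer/.isomer.lock; fall back to /tmp
# when the EFS mount is absent so the sketch still runs.
LOCK_DIR="/efs/isomer"
[ -d "$LOCK_DIR" ] || LOCK_DIR="/tmp"
LOCK_FILE="$LOCK_DIR/.isomer.lock"

# Associate fd 200 with the lock file: it is created if it doesn't
# exist, and just opened if it does.
exec 200>"$LOCK_FILE"

# Try to take an exclusive lock on fd 200 without blocking; if another
# instance already holds it, exit immediately.
flock -n 200 || exit 1

# Critical section: copy the locally built env file into the locked
# folder (placeholder paths).
cp /tmp/.env.local "$LOCK_DIR/.env" 2>/dev/null || true

# fd 200 is closed when the script exits, which releases the lock.
```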
Benefits
Centralised management of envs for both prod + staging
No need to manually transfer envs during node env upgrades
Possibly reduces the time taken to change envs -> all you need to do is "Restart App Instances"
Important/private envs are encrypted
Cons
We have to maintain the list of envs in the script
We can't differentiate between required and optional envs in SSM. But this is less of a con, as convict is our intended safety-check layer for this
Breaking Changes
[X] Yes - this PR contains breaking changes
This moves our envs to SSM
[ ] No - this PR is backwards compatible
Tests
Ensure staging instances start up fine
Ensure changes in env on SSM and re-deploy works/reflects fine (see runbook)
Ensure changes in env on SSM and re-starting app instances works (see runbook)
Tests run as part of the checks for this PR:
Missing env on SSM -> pre-deploy script passes -> convict still catches this -> app fails to start
Following from the above, create the missing env on SSM and attempt a manual restart (as per runbook)
Modify the param on SSM and perform re-deploy from GitHub Actions -> ensure new values are captured
Deploy Notes
Ensure ALL envs are moved to SSM with appropriate prefix.
New dependencies:
dotenv: Loads env from a specified file path