m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Migration playbooks (post-STFC shutdown) and the "grand playbook" #1182

Open alanbchristie opened 9 months ago

alanbchristie commented 9 months ago

Related to #1148, this issue describes the components that were not part of the initial migration, which was carried out so the stack could continue to operate while its original cluster was out of action. It covers the migration of the Production stack and all the resources required by a self-contained production stack, not the migration of the entire stack development arena.

The components that are missing in #1148 and required to have a fully operational stack are: -

This issue does not cover the deployment of AWX, an ansible playbook server used by the CI/CD process to automate the deployment of new application containers.

If updates need to be supported we will need: -

A Grand Playbook is something that is feasible (and could be developed). The prerequisites are: -

  1. A target cluster exists with all the "3rd party" services pre-installed. These would include a compatible: -
    1. Storage class
    2. NGINX ingress controller
    3. Certificate Manager
  2. The "grand playbook" would run in two distinct phases: an "installation phase" and then a "recovery phase".
  3. The "grand playbook" would need simultaneous admin access (from one control machine) to the source and destination clusters.
  4. The "grand playbook" would "extract" all of the original playbook variables from objects present in the source cluster. This would require the inspection of numerous "well known" Kubernetes objects, including a Secret (to obtain usernames and passwords), ConfigMap (for additional configuration information), Pod (for environment variables) and Ingress (for hostnames, paths etc.).
  5. With the data extracted it could then deploy a fresh (empty) "installation" of the source and then move to the "recovery" phase and do what was necessary to copy the relevant databases and file-system content to the destination (scaling down Pods and restarting them etc).
  6. After the "installation" and "recovery" all that would be required would be a redirection of the domains to the new cluster.
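As a rough illustration of the "extract" step (item 4), the sketch below pulls candidate playbook variables out of Kubernetes object manifests that have already been loaded as Python dictionaries. It is not the actual playbook logic: the function name `extract_variables`, the shape of the returned dictionary and the idea of feeding it pre-fetched JSON are all assumptions made for the example. The only cluster facts relied on are standard ones: Secret `data` values are base64-encoded, Pod environment variables live under `spec.containers[].env`, and Ingress hosts and paths live under `spec.rules[]`.

```python
import base64
from typing import Any


def extract_variables(objects: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect candidate playbook variables from Kubernetes manifests.

    `objects` is a list of manifests as plain dictionaries (hypothetical
    input; how they are fetched from the source cluster is out of scope).
    """
    found: dict[str, Any] = {
        "secrets": {},  # decoded Secret values (usernames, passwords)
        "config": {},   # ConfigMap key/value pairs
        "env": {},      # literal Pod environment variables
        "hosts": [],    # Ingress hostnames
        "paths": [],    # Ingress paths
    }
    for obj in objects:
        kind = obj.get("kind")
        if kind == "Secret":
            # Secret data values are base64-encoded by Kubernetes.
            for key, value in (obj.get("data") or {}).items():
                found["secrets"][key] = base64.b64decode(value).decode("utf-8")
        elif kind == "ConfigMap":
            found["config"].update(obj.get("data") or {})
        elif kind == "Pod":
            for container in obj.get("spec", {}).get("containers", []):
                for env in container.get("env") or []:
                    # Entries using valueFrom reference other objects and
                    # carry no literal value, so they are skipped here.
                    if "value" in env:
                        found["env"][env["name"]] = env["value"]
        elif kind == "Ingress":
            for rule in obj.get("spec", {}).get("rules", []):
                if "host" in rule:
                    found["hosts"].append(rule["host"])
                for path in rule.get("http", {}).get("paths", []):
                    if "path" in path:
                        found["paths"].append(path["path"])
    return found
```

One possible way to obtain the input is to run `kubectl get secret,configmap,pod,ingress -n <namespace> -o json` against the source cluster and pass the `items` list from the result; the namespace is a placeholder, and mapping the extracted values back onto the original playbook variable names would still need to be done per component.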

Before the "grand playbook" could safely operate we would probably need to wait until the following conditions were met: -

The "grand playbook" will not be able to:-

Replication of a live cluster will take considerable time. The graph database will take 8 to 12 hours to become live and, depending on the content of the Fragalysis media volume, it will take at least 30 minutes just to copy that data.

phraenquex commented 8 months ago

Also: a demonstrator database+media subset. 5 open targets would do the job. Use future ASAP targets, uploaded into v2.

alanbchristie commented 8 months ago

The relocation was generally successful and comprehensive documentation on the relocation of the production stack can be found on ReadTheDocs at: -

The relocation currently suffers from an inability to generate certificates for wildcard domains (see #1191).