internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0
5.08k stars 1.31k forks

Upgrade All Trusty Nodes to Ubuntu Latest #2036

Open mekarpeles opened 5 years ago

mekarpeles commented 5 years ago

Related to #703 (see aspirational #680)

View Architecture & Provisioning docs on the Wiki

Remaining Trusty Machines

Current Production Architecture

Today, our production service architecture consists of ~11 VMs. Diagram: https://archive.org/download/openlibrary-documentation/openlibrary-production-architecture.png (see: https://github.com/internetarchive/openlibrary/wiki/Production-Service-Architecture)

Current Provisioning Setup

Our current production setup process (as of 2019) for provisioning these 11 VMs is largely manual: it relies on scp-ing directories between machines by hand, as well as on a separate repository called olsystem, which contains the production configs, cron jobs, and infrastructure required to run the official openlibrary.org service.

Each of our 11 VMs is provisioned more or less identically:

/opt/
/opt/petabox
/opt/openlibrary
/opt/openlibrary/venv  -- python virtualenv
/opt/openlibrary/maxmind-geoip/  -- .dat file for anonymizing IPs
/opt/openlibrary/deploys  -- history of all deploys, hash-binned by service
/opt/openlibrary/deploys/openlibrary  -- history of openlibrary deploys
/opt/openlibrary/deploys/olsystem  -- history of olsystem deploys
/opt/openlibrary/deploys/base  -- deprecated??
/opt/openlibrary/deploys/openlibrary/openlibrary  -- active openlibrary deploy
/opt/openlibrary/deploys/openlibrary/openlibrary  -- active olsystem deploy
/opt/openlibrary/olsystem/  -- symlink to active olsystem: /opt/deploys/openlibrary/olsystem
/opt/openlibrary/openlibrary -- symlink to active openlibrary: /opt/deploys/openlibrary/olsystem
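A quick way to sanity-check a freshly provisioned VM against the layout above might look like the following. This is only a sketch: `check_layout` is a hypothetical helper (it does not exist in openlibrary or olsystem), and the path list is copied from the listing rather than from any authoritative provisioning script.

```shell
#!/bin/bash
# Sketch only: check_layout is a hypothetical helper, not part of olsystem.
# It reports which of the paths from the listing above are missing under a
# given root (default: /opt) and returns non-zero if any are missing.

check_layout() {
  local root="${1:-/opt}"
  local missing=0
  local rel
  for rel in \
    petabox \
    openlibrary/venv \
    openlibrary/maxmind-geoip \
    openlibrary/deploys/openlibrary \
    openlibrary/deploys/olsystem \
    openlibrary/olsystem \
    openlibrary/openlibrary
  do
    # -e follows symlinks, so a dangling olsystem or openlibrary symlink
    # is reported as missing too.
    if [ ! -e "$root/$rel" ]; then
      echo "MISSING $root/$rel"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Running `check_layout /opt` on a VM would print one `MISSING ...` line per absent path and exit non-zero if anything is absent.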

Minimum Proposal

At minimum, re-provisioning a VM requires copying /opt over from another server. To do that you'll have to:

on ol-mem2:
sudo tar cpSlf /var/tmp/ol.tar --same-owner -C /opt openlibrary
scp /var/tmp/ol.tar ol-mem4:/var/tmp/ol.tar

on ol-mem4:
tar xpBsf /var/tmp/ol.tar --same-owner -C /opt

(Due to keys and needing to be root to get all of it, I don't think there's an easy way to just scp or rsync.)
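The tar-based copy above can be rehearsed locally before touching a production host. The sketch below wraps the pack and unpack halves in two hypothetical functions (`pack_tree` and `unpack_tree` are illustrative names, not existing tooling). It uses only `-c`/`-x`, `-p`, `-f`, and `-C`; the extra flags and `--same-owner` from the original commands are left out here, since preserving ownership only matters when running as root.

```shell
#!/bin/bash
# Sketch only: pack_tree / unpack_tree are hypothetical helper names.

# pack_tree PARENT NAME ARCHIVE
# Archive the directory NAME found under PARENT, preserving permissions (-p).
# -C makes the paths inside the archive relative to PARENT, as in the
# original "-C /opt openlibrary" invocation.
pack_tree() {
  tar -cpf "$3" -C "$1" "$2"
}

# unpack_tree ARCHIVE DEST_PARENT
# Extract ARCHIVE under DEST_PARENT, preserving permissions (-p).
unpack_tree() {
  tar -xpf "$1" -C "$2"
}

# Between the two steps, the archive would be moved across hosts, e.g.:
#   scp /var/tmp/ol.tar ol-mem4:/var/tmp/ol.tar
```

For example, `pack_tree /opt openlibrary /var/tmp/ol.tar` on the source host followed by `unpack_tree /var/tmp/ol.tar /opt` on the target would mirror the manual steps, minus the root-only ownership handling.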

Ideal Proposal

An aspirational goal of this epic is to migrate Open Library VM provisioning to use a standard Ansible playbook (and possibly docker containers, a la our development environment) to support this re-provisioning.

Part of this effort includes decreasing production's dependence on the olsystem repository, a la #680. Both developer and production systems should use similar docker recipes and differ only according to their ansible playbooks.

Plan

The plan is to start with ol-mem0, ol-mem1, and ol-mem2, as they require little infrastructure beyond the following steps:

1) set up 3 new memcached servers: ol-mem3, ol-mem4, ol-mem5
2) provision the VMs with the default ansible playbook: set up firewall rules + install docker
3) use a VM-specific ansible playbook to install and set up docker w/ memcached (with upstart)
4) update the /opt/openlibrary/olsystem/etc/openlibrary.yml and infobase.yml configs to reference the correct new memcached servers
5) update /etc (e.g. memcached) to symlink to the correct system configs in /opt/openlibrary/olsystem/etc/
6) update /opt/openlibrary/olsystem/fabfile.py (supervisord) to change how memcached servers should be restarted (and to not deploy to ol-mem* during deploy)
7) remove the old memcached servers from the pool (one at a time)

mekarpeles commented 5 years ago

This task needs a checklist:

tfmorris commented 5 years ago

Since Ubuntu 18.04 Bionic Beaver has been out for over a year, would it make more sense to skip Xenial?

tfmorris commented 4 years ago

Our Xenial Docker-based development environment is producing dire warnings about Node.js, and the bundled version of pip doesn't work. I think we should upgrade our dev environment to Bionic ASAP in preparation for a production move to Bionic.

cclauss commented 3 years ago

% cat ./ubuntu_versions.sh

#!/bin/bash

# Which Ubuntu release are we running on?  Do not fail if /etc/os-release does not exist.
# cat /etc/os-release | grep VERSION= || true  # VERSION="20.04.1 LTS (Focal Fossa)"

SERVERS="ol-backup0 ol-covers0 ol-db1 ol-db2 ol-dev0 ol-dev1 ol-home ol-home0 ol-mem0 ol-mem1 ol-mem2 ol-solr0 ol-solr1 ol-web1 ol-web2 ol-www0"
parallel --quote ssh {} "hostname --short ; cat /etc/os-release | grep VERSION= ; docker --version ; docker compose version || true" ::: $SERVERS
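
To turn the output of a script like the one above into a yes/no answer per host, the release can be read from the standard `VERSION_ID` field of `/etc/os-release`. The following is a sketch under stated assumptions: `os_version_id` and `is_trusty` are hypothetical helper names, and "Trusty" is identified by `VERSION_ID` being `14.04`.

```shell
#!/bin/bash
# Sketch only: os_version_id and is_trusty are hypothetical helpers.

# os_version_id [FILE]
# Print the VERSION_ID value from an os-release style file (default:
# /etc/os-release). Prints nothing and still succeeds if the file is
# missing, mirroring the "do not fail" intent of the script above.
os_version_id() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 0
  # Strip the optional double quotes around the value.
  sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}

# is_trusty [FILE]
# Succeed only if the file describes Ubuntu 14.04 (Trusty).
is_trusty() {
  [ "$(os_version_id "$1")" = "14.04" ]
}
```

On a host, `is_trusty && echo "still on Trusty"` would flag machines that remain to be upgraded.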

jimman2003 commented 3 years ago

Some of the PPAs might have deleted the xenial debs/packages, so the CI is failing. Should we bump this up?

mekarpeles commented 3 years ago

https://github.com/internetarchive/openlibrary/issues/2036#issuecomment-859350115

@dhruvmanila, @cclauss, or @BharatKalluri -- is this one you may have a few minutes to quickly investigate? If it seems like it may be a pain, @cdrini and I can prioritize for next week. @cdrini is currently PTO and I'm getting my 2nd covid shot tomorrow and will likely be out of commission for at least some of the weekend :grimacing:

cclauss commented 3 years ago

It would be important to look at upgrading both:

  1. https://github.com/internetarchive/openlibrary/blob/master/docker/Dockerfile.olbase#L1
  2. https://github.com/internetarchive/openlibrary/blob/master/.github/workflows/python_tests.yml#L13

jimman2003 commented 3 years ago

also: https://github.com/internetarchive/openlibrary/blob/95f1234f81c664d186e7c5413436d4ccb5b936b8/docker/Dockerfile.olsolr#L1

cclauss commented 2 years ago

On ol-mem0...

https://internetarchive.slack.com/archives/G019YBYM35M/p1602178331011000

https://internetarchive.slack.com/archives/G019YBYM35M/p1602179875012700?thread_ts=1602178331.011000&cid=G019YBYM35M

https://github.com/internetarchive/openlibrary/wiki/Production-Service-Architecture

mekarpeles commented 2 years ago

I think we're close: ol-db[1,2], ol-backup, ol-www0->1, and presumably ol-home disappears if we can ~re~move stats-solr (?) @cclauss ?? :|

cclauss commented 1 year ago

Using the script at https://github.com/internetarchive/openlibrary/issues/7676#issuecomment-1480274108

ol-home0% ./ubuntu_versions.sh