opendata / CKAN-Multisite-Plans

Simplifying the process of launching an open data repository. [RETIRED]
Creative Commons Zero v1.0 Universal

2.2 Automated deployment and initialization of new CKAN instances on a PaaS (e.g. AWS) #15

Open rossjones opened 9 years ago

rossjones commented 9 years ago
As a Cloud Admin I want to create (and manage) a CKAN instance (site) in my cloud 
farm so that it is live and available online at a URL

This is really a very high-level user story, and there are a lot of smaller ones 
corresponding to key desired activities, such as:

* Create and remove a farm “environment” (e.g. DB server, VPN etc)
* Create an instance within that environment (installed and ready but not live online)
* Activate an instance (make it live online)
    a. Set up any associated monitoring
* De-activate (take it offline but don’t destroy it)
* Purge (destroy it plus all data - perhaps with backup)
* Plugin install, activate, deactivate, uninstall (per instance?)

Implementation notes:
    * Support must be provided for one or more major PaaS such as AWS 
       or OpenStack
    * This process should be fully automated - so e.g. booting a new instance should 
       be one command on the command line or a click of a button
    * This functionality should be wrapped in a Python library so that it can be used
       to power a web application or similar (see later user stories; a rough sketch
       follows this list)
    * Relevant information arising from all these operations (e.g. details of the
       farm, details of instances) must be persisted
       a. Details are not fully determined and are left to the implementor (they will
              intersect with other, later user stories, e.g. re creating UIs). The suggestion
              is that config be either simple JSON or a basic DB
    * Bonus: nice UI for launching and monitoring (see next item)
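
To make the library requirement above concrete, here is a minimal, hypothetical sketch of what such a wrapper could look like. The function names and the JSON config path are placeholders rather than an existing API, it covers only a few of the operations listed, and it uses the simple-JSON persistence option suggested above:

    # Hypothetical sketch of the Python wrapper described above; none of these
    # names exist yet, and persistence is the simple-JSON option.
    import json
    from pathlib import Path

    CONFIG = Path("/etc/ckan-farm/instances.json")   # placeholder location

    def _load():
        return json.loads(CONFIG.read_text()) if CONFIG.exists() else {}

    def _save(state):
        CONFIG.parent.mkdir(parents=True, exist_ok=True)
        CONFIG.write_text(json.dumps(state, indent=2))

    def create_instance(name, url):
        """Install an instance: ready, but not yet live online."""
        state = _load()
        state[name] = {"url": url, "active": False, "plugins": []}
        _save(state)

    def activate_instance(name):
        """Make an instance live online (and hook up monitoring here)."""
        state = _load()
        state[name]["active"] = True
        _save(state)

    def purge_instance(name, backup=True):
        """Destroy an instance plus all its data, optionally backing up first."""
        state = _load()
        if backup:
            pass  # snapshot the database and storage directory before removal
        state.pop(name, None)
        _save(state)

The same functions could then be called from a CLI entry point or a web application, which is the point of keeping the logic in a library rather than in deployment scripts.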
jqnatividad commented 9 years ago

For AWS, Elastic Beanstalk supports this user story, and adds features like a managed database (RDS) and auto-scaling.

It also supports deploying from Docker containers, which might be the way to abstract away PaaS dependencies.
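
To give a sense of how small the automation step can be, here is a rough boto3 sketch of creating a Beanstalk environment with an attached RDS database. The application name, environment name and solution stack string are placeholders, not tested values:

    # Rough boto3 sketch: one API call creates a Beanstalk environment with a
    # managed RDS database attached. Names and the stack string are placeholders.
    import boto3

    eb = boto3.client("elasticbeanstalk")

    eb.create_environment(
        ApplicationName="ckan-multisite",        # assumed to exist already
        EnvironmentName="ckan-demo-instance",
        SolutionStackName="64bit Amazon Linux running Docker",  # illustrative stack name
        OptionSettings=[
            # Ask Beanstalk to provision a managed Postgres (RDS) for the instance.
            {"Namespace": "aws:rds:dbinstance",
             "OptionName": "DBEngine",
             "Value": "postgres"},
        ],
    )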

waldoj commented 9 years ago

I find Florian Mayer’s description of his Docker-based deployment on ckan-dev compelling; he writes:

we're deploying our CKAN using Docker Linux containers. In our Docker image build process we copy the storage folders and the database out of the (non-persistent) container into persistent directories on a BTRFS snapshotting file system. That simplifies a few things for us:

  • All read-only files (software, config, dependencies) are located within the Docker image, which also contains all installed extensions,
  • All read/write files, the installed / set up / populated database, plus uploaded attachments are located within the persistent folder, making migration a "build the image and copy the persist folder" job,
  • The snapshotting file system allows us to roll back the CKAN instance to a sane state, should bad things happen, instead of having to migrate/install.

In particular, I like the notion of keeping files that should be read-only as actual read-only files. That's better for security, it simplifies caching (read-only files are not going to change), and it simplifies backups.
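
As a hypothetical sketch of that split using the Docker SDK for Python (the image tag and host paths are assumptions, not Florian's actual setup), the writable data is mounted from the host while everything else stays inside the image:

    # Hypothetical sketch with the Docker SDK for Python: read-only code and config
    # stay inside the image, writable data is mounted from host directories that can
    # live on a snapshotting (e.g. BTRFS) file system. Image tag and paths are
    # placeholders.
    import docker

    client = docker.from_env()

    client.containers.run(
        "ckan/ckan:2.4",                       # placeholder image tag
        detach=True,
        name="ckan-demo",
        ports={"5000/tcp": 5000},
        volumes={
            "/srv/persist/postgres": {"bind": "/var/lib/postgresql/data", "mode": "rw"},
            "/srv/persist/storage":  {"bind": "/var/lib/ckan/storage",    "mode": "rw"},
        },
    )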

florianm commented 9 years ago

Hi @waldoj,

we actually moved away from Docker containers towards a dedicated AWS VM.

Our docker setup will probably be useful for a stable, only occasionally updated CKAN version. I guess we'll docker CKAN 2.4.

The AWS VM in contrast is perfect for tinkering with the latest master branches of various plugins, and we also snapshot our file system (btrfs ftw!) so we can recover from git mess-ups. Just as in the Docker setup, we separated out the valuables (the Postgres data dir and the storage dir) into a dedicated (also snapshotted) folder. I would probably not run things this way with software in charge of finances or emergency calls, but CKAN? Absolutely fine. We're never more than an SSH session, a git checkout or, worst case, a filesystem restore away from sanity.

Our main reason for moving to AWS was that the latest CKAN master with a few customised extensions fixes some critical bugs (resources disappearing was a big one) and gives us some required custom features. However, rolling that into a Docker image would take an order of magnitude more effort and be outdated too quickly.
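
For readers unfamiliar with the snapshot step, the roll-back safety net can be as small as the following sketch (the paths are examples and assume the persistent directory is a BTRFS subvolume):

    # Sketch: take a read-only BTRFS snapshot of the persistent data before a risky
    # upgrade, so a bad deploy is one restore away from sanity. Paths are examples
    # and assume /srv/persist is a BTRFS subvolume.
    import subprocess
    from datetime import datetime

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(
        ["btrfs", "subvolume", "snapshot", "-r",
         "/srv/persist", f"/srv/snapshots/persist-{stamp}"],
        check=True,
    )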

waldoj commented 9 years ago

Our docker setup will probably be useful for a stable, only occasionally updated CKAN version. I guess we'll docker CKAN 2.4. The AWS VM in contrast is perfect for tinkering with the latest master branches of various plugins, and we also snapshot our file system (btrfs ftw!) so we can recover from git mess-ups.

This was a really helpful distinction, @florianm—thank you for breaking it down like this!

wardi commented 9 years ago

ckan-multisite will be set up to run on any bare-metal server or VPS that allows you to run Docker. If you want to mount your databases and files on a snapshotting file system you're free to do that, because they're all stored in predictable locations on disk. Backups are easy: all the user data is in one place on the host filesystem (mounted as volumes by datacats) and the code is in another.
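
Because all the user data sits in one known place on the host, a backup could be as simple as archiving that directory; the path below is a placeholder until the on-disk layout is documented:

    # Sketch: archive the single host directory that holds all user data.
    # DATA_DIR is a placeholder until the on-disk layout is documented.
    import tarfile
    from datetime import date

    DATA_DIR = "/var/lib/ckan-multisite"       # hypothetical location
    archive = f"/backups/ckan-multisite-{date.today()}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(DATA_DIR, arcname="ckan-multisite-data")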

Creating and deploying instances will be from a web interface. Creating and removing a "farm environment" means installing or removing ckan-multisite, which will have simple instructions and few dependencies (docker, nginx, pip, virtualenv...).

I've created an issue for the install procedure (https://github.com/boxkite/ckan-multisite/issues/5) and another for documenting the file locations on the server (https://github.com/boxkite/ckan-multisite/issues/6).