rossjones opened this issue 9 years ago
True multi-tenancy is very nice to have but, speaking from experience, non-trivial, as it often entails architectural changes.
Perhaps we can just document some multi-tenant deployment patterns that members of the community are already using.
https://lists.okfn.org/pipermail/ckan-dev/2014-December/008454.html
Hi @rossjones - apologies for the late answer!
(update July 2015 - WIP: docker-based installation docs here)
At the Western Australian Department of Parks and Wildlife we're running three CKAN sites off one CKAN installation on a single Ubuntu 14.04 AWS EC2 t2.medium with currently 100 GB HD:
Having a shared installation with separate configs and databases makes maintaining the CKAN installation easier, and thanks to AWS's nightly snapshotting we can restore the installation to any of the last 30 days; we can also snapshot manually as required.
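If you'd rather script those manual AWS-level snapshots than click through the console, a minimal sketch with the AWS CLI (the volume ID is a placeholder, and this assumes the CLI is configured with suitable IAM permissions):
# hedged sketch: ad-hoc EBS snapshot of the data volume (volume ID is a placeholder)
aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "ckan /mnt/ckan manual snapshot $(date +%Y%m%d)"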
Compared to this easily accessible deployment, maintaining our previous Docker install was much more indirect and time-intensive - we had to rebuild the image with every change to CKAN. Once CKAN matures to a stable version we will consider going back to Docker.
The following sections illustrate our sharing/separation rather than aim to be comprehensive installation instructions. The setup is also drawn up here - apologies for the link bait :-)
The VM has another 100 GB btrfs volume mounted at /mnt/ckan - differing from the default /usr/lib/ckan location to indicate that the folder is mounted, not local. It would of course be possible, without side effects, to mount the external volume at /usr/lib/ckan, which would make it harder to recognise as an external volume but preserve the default path for the maintainers' convenience.
The installation is a CKAN source install as of November 2014 with our fork of the CKAN 2.3a master. The process went largely as expected, with only a few tricky problems around SolR, JTS, datapusher and open ports in firewalls. I'll detail those below while trying not to duplicate the installation docs.
Multi-tenancy itself caused no problems at all; it only required planning the file locations and paths depending on which components were shared and which were separated out. The trickiest bit of the standard source install was understanding folder permissions and keeping in mind which user would access which files (the webserver www-data, the database superuser postgres, the CKAN database user ckan_default, the virtualenv owner root?), plus the grey zone between the VM's ports on our side (CKAN ports 5000, 5001, 5002; Datapusher port 8800; local SolR port 8983) and the reverse proxy and firewall settings outside of my control and visibility.
We run the latest CKAN master (and the same for some extensions), as it provides some critical bug fixes the latest stable CKAN doesn't have; it also lets us send our fixes back as pull requests. Ideally we'd of course run the latest stable CKAN and extensions. We get away with running master, though, because sanity and order are never more than one git checkout (of the last working commit) away, or in the worst case a nightly snapshot restore.
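A rollback then looks roughly like this (the source checkout path matches the layout described below; the commit hash is a placeholder for whatever the last known-good commit was):
# roll the shared CKAN source back to the last known-good commit
cd /mnt/ckan/default/src/ckan
git log --oneline -10                  # find the last working commit
git checkout <last-working-commit>     # placeholder hash
service apache2 restart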
Contents of /mnt/ckan:
datapusher/ -- one datapusher installation, shared
default/ -- CKAN virtualenv with CKAN and extensions, shared
pgdata/ -- Postgres datadir, where all files of the actual db live
private/ -- CKAN-private's files (see below)
public/ -- CKAN-public's files
snapshots/ -- manually created file system snapshots of /mnt/ckan
test/ -- CKAN-test's files
Each of the per-site file directories (private/, public/, test/) contains the following (a sketch for bootstrapping this layout follows the listing):
dbdumps/ -- pg_dump files, rsynced from previous CKAN installation, or backup from local db
public/css -- custom css
resources/ -- filestore
storage/ -- datastore
templates/ -- custom templates
tmp/ -- ckan tmp dir (cache, sessions, who.log)
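Bootstrapping that layout for an additional site is then a one-liner. A sketch, with the site name and ownership as assumptions (adjust to whoever needs write access, e.g. the webserver user www-data for the filestore/datastore and tmp dirs):
# hypothetical new site "newsite": create the per-site folder layout
mkdir -p /mnt/ckan/newsite/{dbdumps,public/css,resources,storage,templates,tmp}
chown -R www-data:www-data /mnt/ckan/newsite/{resources,storage,tmp}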
Using btrfs for snapshotting, we can create a manual snapshot of the /mnt/ckan directory as follows:
# create a subvolume which is a read-only snapshot of everything in /mnt/ckan and make it accessible at /mnt/ckan/snapshots/20150116
btrfs subvolume snapshot -r /mnt/ckan /mnt/ckan/snapshots/20150116
# remove the snapshot
btrfs subvolume delete /mnt/ckan/snapshots/20141106
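Restoring from a manual snapshot is then just a copy out of the read-only subvolume (the snapshot date and file paths below are examples):
# list existing subvolumes / snapshots
btrfs subvolume list /mnt/ckan
# pull a single file back out of a read-only snapshot (example paths)
cp -a /mnt/ckan/snapshots/20150116/public/resources/SOME_RESOURCE /mnt/ckan/public/resources/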
Additionally, AWS provides us with a 30 day rolling nightly snapshot of the whole VM (including the manual btrfs snapshots, which might be older than 30 days).
Finally, the db can be dumped, and the db snapshots as well as contents of the filestore/datastore/template etc folders can be rsynced to a safe location off site.
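A cron-able sketch of that off-site routine (the remote user, host and backup paths are placeholders; the dbdumps directory must be writable by postgres, see the permission juggling further down):
# hypothetical off-site backup: dump the dbs, then rsync dumps + filestore/datastore
sudo -u postgres pg_dump -c -Fc -f /mnt/ckan/public/dbdumps/ckan_$(date +%Y%m%d).dump ckan_public
sudo -u postgres pg_dump -c -Fc -f /mnt/ckan/public/dbdumps/datastore_$(date +%Y%m%d).dump datastore_public
rsync -Pavr /mnt/ckan/public/ backupuser@offsite-host:/backups/ckan/public/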
Most logs live in /var/log/, all separated by site id.
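So following one site's activity is, for example (log names as per the Apache and supervisor configs further down):
# apache + celery logs for the public site
tail -f /var/log/apache2/ckan_public.*.log /var/log/celeryd-public.log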
One Postgres 9.3 cluster runs as a system service with one database for each CKAN instance, creatively called ckan_SITEID: ckan_private, ckan_public, ckan_test. In retrospect, ckan_test is double-booked by the testing suite - don't call your CKAN database ckan_test.
Having separate dbs within the same db cluster was unproblematic and simplified (cluster-based) access management.
A few relevant code snippets for your convenience:
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# service postgresql stop
* Stopping PostgreSQL 9.3 database server
...done.
# move the data folder from the default location to /mnt/ckan/pgdata
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# mkdir /mnt/ckan/pgdata
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# mv /var/lib/postgresql/9.3/main/ /mnt/ckan/pgdata/
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# cd /mnt/ckan/pgdata/
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ll
total 16
drwxr-xr-x 1 root root 8 Nov 14 15:14 ./
drwxr-xr-x 1 root root 80 Nov 14 15:06 ../
drwx------ 1 postgres postgres 280 Nov 14 15:14 main/
# folder main/ is there with all fingers, toes and permissions! amazing.
# symlink that folder right back.
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ln -s /mnt/ckan/pgdata/main /var/lib/postgresql/9.3/main
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ll /var/lib/postgresql/9.3/
total 8
drwxr-xr-x 2 postgres postgres 4096 Nov 14 15:16 ./
drwxr-xr-x 3 postgres postgres 4096 Nov 7 14:52 ../
lrwxrwxrwx 1 root root 21 Nov 14 15:16 main -> /mnt/ckan/pgdata/main/
# permissions are ALL WRONG, let's change that
(default)root@aws-ckan-001:/mnt/ckan/pgdata# chown postgres:postgres /var/lib/postgresql/9.3/main
(default)root@aws-ckan-001:/mnt/ckan/pgdata# service postgresql start
* Starting PostgreSQL 9.3 database server
...done.
Shown here: ckan_private and ckan_public, not shown: ckan_test
# Are existing dbs in utf-8?
sudo -u postgres psql -l
sudo -u postgres createuser -S -D -R -P ckan_default
sudo -u postgres createuser -S -D -R -P -l datastore_default
sudo -u postgres createdb -O ckan_default ckan_private -E utf-8
sudo -u postgres createdb -O ckan_default ckan_public -E utf-8
sudo -u postgres createdb -O ckan_default datastore_private -E utf-8
sudo -u postgres createdb -O ckan_default datastore_public -E utf-8
# IMPROVEMENT: use a postgis database as a template (a sketch follows below)
# ckanext-spatial
sudo -u postgres psql -d ckan_private -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql
sudo -u postgres psql -d ckan_private -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql
sudo -u postgres psql -d ckan_public -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql
sudo -u postgres psql -d ckan_public -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql
# change owner of postgis tables to ckan_default
sudo -u postgres psql -d ckan_private -c "ALTER TABLE spatial_ref_sys OWNER TO ckan_default;ALTER TABLE geometry_columns OWNER TO ckan_default;"
sudo -u postgres psql -d ckan_public -c "ALTER TABLE spatial_ref_sys OWNER TO ckan_default;ALTER TABLE geometry_columns OWNER TO ckan_default;"
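Regarding the IMPROVEMENT note above: the same PostGIS setup could be baked into a template database once and reused. A sketch, not what we actually ran:
# hedged sketch: create a reusable PostGIS template database
sudo -u postgres createdb template_postgis -E utf-8
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql
# new CKAN databases then inherit the spatial tables:
sudo -u postgres createdb -O ckan_default ckan_private -E utf-8 -T template_postgis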
Shown: the old server is a CKAN docker container I'm attached to, with the virtualenv activated and me being root. There's one CKAN site per docker container, hence the database is called datastore_default.
This requires that ssh-keygen has been run on both machines and the public ssh keys are present in the target machine's ~/.ssh/authorized_keys. My target machine here is my VM called aws-ckan-001.
root@d78ab2436e53:/var/lib/ckan/dbdumps# pg_dump -c -Fc -f datastore.dump datastore_default
root@d78ab2436e53:/var/lib/ckan/dbdumps# pg_dump -c -Fc -f ckan.dump ckan_default
# transfer files to target host (/mnt/ckan/public must be writable by florianm)
(default)root@7ccb97afb634:/var/lib/ckan# rsync -Pavvr . florianm@aws-ckan-001:/mnt/ckan/public
Use the rsynced .dump files from the previous section. Dump the current db as a backup first.
# Stop apache service
service apache2 stop
# Backup db
pg_dump -c -Fc -f /mnt/ckan/private/dbdumps/ckan_DATE.dump ckan_private
pg_dump -c -Fc -f /mnt/ckan/private/dbdumps/datastore_DATE.dump datastore_private
# Drop db
sudo -u postgres dropdb ckan_private
sudo -u postgres dropdb datastore_private
# Create db
sudo -u postgres createdb -O ckan_default ckan_private -E utf-8
sudo -u postgres createdb -O ckan_default datastore_private -E utf-8
# Restore db
sudo -u postgres pg_restore -d ckan_private /mnt/ckan/private/dbdumps/ckan.dump
sudo -u postgres pg_restore -d datastore_private /mnt/ckan/private/dbdumps/datastore.dump
# init, upgrade, refresh db (venv)
paster --plugin=ckan db init -c /etc/ckan/default/private.ini
paster --plugin=ckan db upgrade -c /etc/ckan/default/private.ini
paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/default/private.ini
paster --plugin=ckanext-harvest harvester initdb -c /etc/ckan/default/private.ini
paster --plugin=ckan search-index rebuild -c /etc/ckan/default/private.ini
# Restart web servers
service jetty restart
service apache2 restart
# Have a squiz at the logs
tail -f /var/log/apache2/ckan_private*.log
This step shows how a completely installed CKAN site (here: ckan_private) can be duplicated into another, newly set up CKAN site (here: ckan_test).
Note the juggling of permissions, so the postgres user can access the dump files.
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps# mkdir 2014-12-19
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps# cd 2014-12-19/
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# chown postgres .
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_dump -c -Fc -f datastore.dump datastore_private
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_dump -c -Fc -f ckan.dump ckan_private
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo service apache2 stop
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl status
beaver RUNNING pid 17236, uptime 16 days, 23:13:02
celery-private RUNNING pid 25005, uptime 14 days, 6:30:29
celery-public RUNNING pid 24817, uptime 14 days, 6:31:51
celery-test RUNNING pid 25024, uptime 14 days, 6:30:25
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl stop celery-test
celery-test: stopped
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl status
beaver RUNNING pid 17236, uptime 16 days, 23:13:23
celery-private RUNNING pid 25005, uptime 14 days, 6:30:50
celery-public RUNNING pid 24817, uptime 14 days, 6:32:12
celery-test STOPPED Dec 19 04:43 PM
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres dropdb ckan_test
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres dropdb datastore_test
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres createdb -O ckan_default ckan_test -E utf-8
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres createdb -O ckan_default datastore_test -E utf-8
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_restore -d ckan_test ckan.dump
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_restore -d datastore_test datastore.dump
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan db init -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan db upgrade -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckanext-harvest harvester initdb -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan search-index rebuild -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl start celery-test
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# service apache2 start
The SolR setup needed tough love, strong words, and brisk walks in fresh air. In our setup, one core serves all three instances, but as @rossjones said, one dedicated core per instance might be better. I'll go into more detail here in the hope of preventing further suffering.
We use the solr-spatial-field search backend because bounding polygons are useful for georeferencing marine datasets along our curved coastline.
Following the DigitalOcean Solr setup guide:
solr_url = http://127.0.0.1:8983/solr/ckan
# Download SolR (check for newer version, shown: 4.10.2)
(default)root@aws-ckan-001:/tmp# wget http://apache.mirror.uber.com.au/lucene/solr/4.10.2/solr-4.10.2.tgz
(default)root@aws-ckan-001:/tmp# tar -xvf solr-4.10.2.tgz
(default)root@aws-ckan-001:/tmp# cp -r solr-4.10.2/example /opt/solr
# Test base install on http://aws-ckan-001:8983/solr
(default)root@aws-ckan-001:/opt/solr# java -jar start.jar
Create /etc/default/jetty:
NO_START=0 # Start on boot
JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"
JAVA_HOME=/usr/java/default
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_LOGS=/opt/solr/logs
Create /opt/solr/etc/jetty-logging.xml:
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN" "http://jetty.mortbay.org/configure.dtd">
<!-- =============================================================== -->
<!-- Configure stderr and stdout to a Jetty rollover log file -->
<!-- this configuration file should be used in combination with -->
<!-- other configuration files. e.g. -->
<!-- java -jar start.jar etc/jetty-logging.xml etc/jetty.xml -->
<!-- =============================================================== -->
<Configure id="Server" class="org.mortbay.jetty.Server">
<New id="ServerLog" class="java.io.PrintStream">
<Arg>
<New class="org.mortbay.util.RolloverFileOutputStream">
<Arg><SystemProperty name="jetty.logs" default="."/>/yyyy_mm_dd.stderrout.log</Arg>
<Arg type="boolean">false</Arg>
<Arg type="int">90</Arg>
<Arg><Call class="java.util.TimeZone" name="getTimeZone"><Arg>GMT</Arg></Call></Arg>
<Get id="ServerLogName" name="datedFilename"/>
</New>
</Arg>
</New>
<Call class="org.mortbay.log.Log" name="info"><Arg>Redirecting stderr/stdout to <Ref id="ServerLogName"/></Arg></Call>
<Call class="java.lang.System" name="setErr"><Arg><Ref id="ServerLog"/></Arg></Call>
<Call class="java.lang.System" name="setOut"><Arg><Ref id="ServerLog"/></Arg></Call></Configure>
Create Solr user and jetty service:
# Create solr user
sudo useradd -d /opt/solr -s /sbin/false solr
sudo chown solr:solr -R /opt/solr
# Create jetty service
sudo wget -O /etc/init.d/jetty http://dev.eclipse.org/svnroot/rt/org.eclipse.jetty/jetty/trunk/jetty-distribution/src/main/resources/bin/jetty.sh
sudo chmod a+x /etc/init.d/jetty
sudo update-rc.d jetty defaults
# Test jetty service
service jetty start
mv /opt/solr/solr/collection1 /opt/solr/solr/ckan
sed -i "s/collection1/ckan/g" /opt/solr/solr/ckan/core.properties
rm -r /opt/solr/solr/data/*
# Link schema.xml
mv /opt/solr/solr/ckan/conf/schema.xml /opt/solr/solr/ckan/conf/schema.bak
ln -s /mnt/ckan/default/src/ckan/ckan/config/solr/schema.xml /opt/solr/solr/ckan/conf/schema.xml
Download jts-topo-suite JTS-1.13, unpack and copy the jars into /opt/solr/solr-webapp/webapp/WEB-INF/lib or /opt/solr/lib.
Missing JTS will cause trouble - Fix
Test JTS at http://aws-ckan-001:8983/solr/ckan/select/?fl=*,score&sort=score%20asc&q={!geofilt%20score=distance%20filter=true%20sfield=spatial_geom%20pt=42.56667,1.48333%20d=1}&fq=feature_code:PPL
With the SolR core "collection1" renamed to "ckan", and the solr admin GUI at %(ckan.site_url):8983/solr, the solr_url must include the core name and must not have a trailing slash.
solr_url = http://127.0.0.1:8983/solr/ckan
It is not necessary to open port 8983 in the firewall, as the requests to SolR never leave the local machine.
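A quick local smoke test of the core (a generic Solr query run on the VM itself, just to confirm CKAN can reach it on the loopback interface):
curl 'http://127.0.0.1:8983/solr/ckan/select?q=*:*&rows=0&wt=json'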
One redis instance serves as the shared message queue for ckanext-harvest and ckanext-archiver; separate supervisord configs run paster celeryd:
/etc/supervisor/conf.d/celery-SITE_ID.conf
; ===============================
; ckan celeryd supervisor example
; ===============================
[program:celery-SITE_ID]
; Full Path to executable, should be path to virtual environment,
; Full path to config file too.
command=/mnt/ckan/default/bin/paster --plugin=ckan celeryd --config=/etc/ckan/default/SITE_ID.ini
; user that owns virtual environment.
user=root
numprocs=1
stdout_logfile=/var/log/celeryd-SITE_ID.log
stderr_logfile=/var/log/celeryd-SITE_ID.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
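After dropping a new celery-SITE_ID.conf into /etc/supervisor/conf.d/, supervisor picks it up with the usual:
supervisorctl reread
supervisorctl update
supervisorctl status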
Each CKAN instance runs off the same CKAN and plugins, but of course with separate configs:
for site in private public test; do paster --plugin=ckan make-config ckan /etc/ckan/default/$site.ini; done
To facilitate maintaining one config per CKAN site, let's parameterise the site_id - this is where good folder structures pay off:
# private: port 5000; public: port 5001; test: port 5002
[app:main]
ckan.site_id = private
home = /mnt/ckan/%(ckan.site_id)s
use = egg:ckan
full_stack = true
#cache_dir = /tmp/%(ckan.site_id)s/
cache_dir = %(home)s/tmp/
beaker.session.key = ckan
## Database Settings
sqlalchemy.url = postgresql://ckan_default:PASSWORD@localhost/ckan_%(ckan.site_id)s
ckan.datastore.write_url = postgresql://ckan_default:PASSWORD@localhost/datastore_%(ckan.site_id)s
ckan.datastore.read_url = postgresql://datastore_default:PASSWORD@localhost/datastore_%(ckan.site_id)s
ckan.site_url = http://aws-ckan-001
## Search Settings
solr_url = http://127.0.0.1:8983/solr/ckan
ckanext.spatial.search_backend = solr-spatial-field
## Plugins Settings
p_base = stats resource_proxy text_preview recline_preview pdf_preview geojson_preview
p_spt = spatial_metadata spatial_query wms_preview cswserver spatial_harvest_metadata_api
p_dst = datastore datapusher
p_hrv = harvest ckan_harvester archiver qa
p_viz = viewhelpers dashboard_preview linechart barchart basicgrid navigablemap choroplethmap
p_grp = hierarchy_display hierarchy_form
p_api = apihelper
p_rdf = htsql metadata oaipmh_harvester oaipmh
ckan.plugins = %(p_base)s %(p_spt)s %(p_dst)s
#%(p_hrv)s %(p_viz)s %(p_grp)s %(p_api)s %(p_rdf)s
# ckanext-harvest
ckan.harvest.mq.type = redis
## Storage Settings
ofs.impl = pairtree
ckan.storage_dir = %(home)s
ckan.storage_path = %(home)s
ckan.max_resource_size = 2000
ckan.max_image_size = 200
extra_template_paths = %(home)s/templates
extra_public_paths = %(home)s/public
ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
ckan.datapusher.url = http://127.0.0.1:8800/
ckan.resource_proxy.max_file_size = 20 * 1024 * 1024
ckan.site_url is the same across all instances, and everything else is parameterised on ckan.site_id, so the configs can simply be copy-pasted between sites. Adding a new plugin and enabling it only in the test instance has had no negative effects on the other instances so far.
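A quick way to convince yourself the configs differ only where intended (apart from any test-only plugins):
# the per-site configs should differ essentially only in ckan.site_id
grep -H 'ckan.site_id' /etc/ckan/default/{private,public,test}.ini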
The VM runs an Apache 2.4 server (remember "Require all granted"). Add the extra ports to /etc/apache2/ports.conf:
Listen 5000
Listen 5001
Listen 5002
Listen 8800
Separate virtualhost configs in /etc/apache2/sites-available/ckan_SITE_ID.conf:
<VirtualHost 0.0.0.0:5000>
ServerName ckan
WSGIScriptAlias / /etc/ckan/default/SITE_ID.wsgi
# Pass authorization info on (needed for rest api).
WSGIPassAuthorization On
# Deploy as a daemon (avoids conflicts between CKAN instances).
WSGIDaemonProcess ckan_SITE_ID display-name=ckan_SITE_ID processes=2 threads=15
WSGIProcessGroup ckan_SITE_ID
ErrorLog /var/log/apache2/ckan_SITE_ID.error.log
CustomLog /var/log/apache2/ckan_SITE_ID.custom.log combined
<IfModule mod_rpaf.c>
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1
RPAF_ForbidIfNotProxy Off
</IfModule>
<Directory "/" >
Require all granted
</Directory>
</VirtualHost>
Separate wsgis at /etc/ckan/default/SITE_ID.wsgi:
import os
activate_this = os.path.join('/mnt/ckan/default/bin/activate_this.py')
execfile(activate_this, dict(__file__=activate_this))
from paste.deploy import loadapp
config_filepath = os.path.join('/etc/ckan/default/SITE_ID.ini')
from paste.script.util.logging_config import fileConfig
fileConfig(config_filepath)
application = loadapp('config:%s' % config_filepath)
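With the vhosts and wsgi files in place, each site is enabled and checked the usual Apache way (site names assumed to match the conf file names above):
a2ensite ckan_private ckan_public ckan_test
apache2ctl configtest
service apache2 reload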
The illustrated setup results in one AWS VM serving three CKAN instances on ports 5000, 5001 and 5002, plus one datapusher at port 8800 (remember to open that port in the firewall). SolR listens on localhost:8983/solr/ckan and is only ever queried locally, so that traffic never leaves the machine. Our networking crew reverse-proxies the CKAN ports so that two of the CKANs are accessible from our intranet only and one is publicly accessible.
So far we haven't had any dramas (apart from me inadvertently chowning the entire /var/ folder which broke a lot - don't do this at home), our penetration testing suite hammers the CKANs without findings, and the Google spider pings our external instance every 2 seconds. We find it very useful to have two completely separate production instances for sensitive, unreleased vs public datasets, plus at least one testing instance.
@florianm, this is really useful information. No doubt many future developers will be grateful to find this at the top of their Google results. :) We've just tweeted about it, which I hope will help to spread the word. I would have thought that multi-tenancy would require significant modifications to CKAN, but you did it! :)
Thanks for the mention @waldoj ! I've just updated the documentation above a bit more and added a diagram.
This is fantastic
I've put up a simple diagram and description of the way ckan-multisite will share the same ckan code, config and extensions for each ckan instance created: https://github.com/boxkite/ckan-multisite/
We're not building anything to share user accounts at the moment, but that would be a really nice auth plugin for ckan. Maybe there's an existing one we could use that's based on ldap or windows domain auth.