opendata / CKAN-Multisite-Plans

Simplifying the process of launching an open data repository. [RETIRED]

1.3 Multi-tenant CKAN [Very Optional] #10

Open rossjones opened 9 years ago

rossjones commented 9 years ago
As a Cloud Admin (or general Sysadmin of CKAN) I want to run multiple CKAN 
instances off the same codebase so that I can manage it more easily

_Note: this is related to, but somewhat parallel to, a direct approach to booting 
multiple CKANs - the simplest approach is simply to install many separate sets of 
CKAN code (e.g. using Docker). However, multi-tenancy offers quite a few advantages 
(e.g. maintaining one codebase, or sharing user data across instances). As such 
this user story is optional._

Implementation details:
* Likely setup is a single CKAN (code) application “instance” which serves 
  many site “instances” by switching config based on URL, and therefore 
  serves different sites (with different data etc.) - see the sketch after this list
    a. A single code-base and a single active “instance” serving multiple distinct 
       customers
* Qu: do we share the database across all the instances?
    a. Could go either way. Initial feeling is that you don’t need to share the DB, 
       only the code
    b. Sharing the DB would involve prefixing all tables with a configurable prefix 
       (cf. WordPress)
* Qu: how do we handle plugin activation per instance (this is not about 
  installing the plugin, which obviously happens across all instances)?
    a. ANS: that should be in the config DB, so not a problem
* Qu: do we share the user database across the multi-tenant instances?
    a. That would require a shared database and changes to the schema, e.g. to 
       indicate per instance which users are “live” on that instance
    b. Suggestion: do not support this.
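
Very roughly, the config-switching idea could look something like the following WSGI dispatcher (a hypothetical sketch only: the hostnames and config paths are made up, and in practice CKAN's module-level globals make hosting several app objects in one process non-trivial):

# Hypothetical WSGI dispatcher: pick a per-site CKAN app based on the request's Host header.
from paste.deploy import loadapp

SITE_CONFIGS = {
    'data.example.org': '/etc/ckan/public.ini',        # made-up hostname -> config mapping
    'research.example.org': '/etc/ckan/private.ini',
}

# Load one CKAN app object per site at startup.
APPS = dict((host, loadapp('config:%s' % path)) for host, path in SITE_CONFIGS.items())

def application(environ, start_response):
    host = environ.get('HTTP_HOST', '').split(':')[0]
    app = APPS.get(host)
    if app is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return ['Unknown site\n']
    return app(environ, start_response)
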
jqnatividad commented 9 years ago

True multi-tenancy is very nice to have but, speaking from experience, non-trivial, as it often entails architectural changes.

Perhaps we can just document some multi-tenant deployment patterns that members of the community are already using.

https://lists.okfn.org/pipermail/ckan-dev/2014-December/008454.html

florianm commented 9 years ago

Hi @rossjones - apologies for the late answer!

(update July 2015 - WIP: docker-based installation docs here)

At the Western Australian Department of Parks and Wildlife we're running three CKAN sites off one CKAN installation on one Ubuntu 14.04 AWS EC2 t2.medium, currently with 100 GB of disk.

Having a shared installation with separate configs and databases makes maintaining the CKAN installation easier, and thanks to AWS's nightly snapshotting we can restore the installation to any point within the last 30 days, plus we can snapshot manually as required.

Compared to this easily accessible deployment, maintaining our previous Docker install was much more indirect and time-intensive - we had to rebuild the image with every change to CKAN. Once CKAN matures to a stable version we will consider going back to Docker.

The following sections illustrate our sharing/separation rather than aim to be comprehensive installation instructions. The setup is also drawn up here - apologies for the link bait :-)

File system

The VM has another 100 GB btrfs volume mounted at /mnt/ckan - differing from the default /usr/lib/ckan location to indicate that the folder is mounted, not local. It would of course be possible, without side effects, to mount the external volume at /usr/lib/ckan instead; that would make it harder to recognise as an external volume, but would preserve the default path for the maintainers' convenience.
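
For reference, mounting such a volume could look roughly like this (a sketch only; the device name /dev/xvdf is an example, not our actual setup):

# Format the extra EBS volume as btrfs and mount it at /mnt/ckan (device name is an example)
mkfs.btrfs /dev/xvdf
mkdir -p /mnt/ckan
mount /dev/xvdf /mnt/ckan

# Make the mount permanent
echo "/dev/xvdf  /mnt/ckan  btrfs  defaults  0  2" >> /etc/fstab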

Installation process

The installation is a CKAN source install as of November 2014 with our fork of the CKAN 2.3a master. The process went largely as expected, with only a few tricky problems around SolR, JTS, datapusher and open ports in firewalls. I'll detail those below while trying not to duplicate the installation docs.

Multi-tenancy itself caused no problems at all. It only required some planning of file locations/paths, depending on which components were shared and which were separated. The trickiest bit of the standard source install was understanding folder permissions and keeping in mind which user would access which files (the webserver www-data, the database superuser postgres, the CKAN database user ckan_default, the virtualenv owner root?), as well as the grey zone between the VM's ports (CKAN ports 5000, 5001, 5002; Datapusher port 8800; local SolR port 8983) and the reverse proxy and firewall settings outside of my control and visibility.

Shared CKAN installation

We run the latest CKAN master (and likewise for some extensions), as it provides some critical bug fixes the latest stable CKAN doesn't have. It also enables us to send the fixes back as pull requests. Ideally we'd of course run the latest stable CKAN and extensions. We get away with running master, though, as sanity and order are never more than one git checkout (of the last working commit) away, or in the worst case a restore of a nightly snapshot.

Contents of /mnt/ckan:

datapusher/ -- one datapusher installation, shared
default/ -- CKAN virtualenv with CKAN and extensions, shared
pgdata/ -- Postgres datadir, where all files of the actual db live
private/ -- CKAN-private's files (see below)
public/ -- CKAN-public's files
snapshots/ -- manually created file system snapshots of /mnt/ckan
test/ -- CKAN-test's files

Each of the files directories contains:

dbdumps/ -- pg_dump files, rsynced from previous CKAN installation, or backup from local db
public/css -- custom css
resources/ -- filestore
storage/ -- datastore
templates/ -- custom templates
tmp/ -- ckan tmp dir (cache, sessions, who.log)
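
Creating such a per-site skeleton might look roughly like this (a sketch; the www-data ownership of the writable directories is an assumption about a typical Apache setup, not a transcript of our commands):

# Create the per-site directory skeleton (example for the "private" site)
SITE=private
mkdir -p /mnt/ckan/$SITE/{dbdumps,public/css,resources,storage,templates,tmp}

# The webserver needs write access to the filestore, datastore and tmp directories
chown -R www-data:www-data /mnt/ckan/$SITE/{resources,storage,tmp}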

Disaster recovery

Using btrfs for snapshotting, we can create a manual snapshot of the /mnt/ckan directory as follows:

# create a subvolume which is a read-only snapshot of everything in /mnt/ckan and make it accessible at /mnt/ckan/snapshots/20150116
btrfs subvolume snapshot -r /mnt/ckan /mnt/ckan/snapshots/20150116

# remove the snapshot 
btrfs subvolume delete /mnt/ckan/snapshots/20141106

Additionally, AWS provides us with a 30 day rolling nightly snapshot of the whole VM (including the manual btrfs snapshots, which might be older than 30 days).

Finally, the db can be dumped, and the db dumps as well as the contents of the filestore/datastore/templates etc. folders can be rsynced to a safe location off-site.
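
Such an off-site copy could be taken with something like the following (a sketch; backup.example.org and its path are placeholders):

# Dump the database into the site's dbdumps folder (which must be writable by postgres)
sudo -u postgres pg_dump -c -Fc -f /mnt/ckan/private/dbdumps/ckan_private_$(date +%Y%m%d).dump ckan_private

# Copy the dumps and the site's files to an off-site host (host and path are placeholders)
rsync -az /mnt/ckan/private/ backup@backup.example.org:/backups/ckan/private/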

Logs

Most logs live in /var/log/, all separated by site id.

Database

One Postgres 9.3 cluster runs as a system service with one database for each CKAN instance, creatively called ckan_SITEID: ckan_private, ckan_public, ckan_test. In retrospect, ckan_test is double-booked by the testing suite - don't call your CKAN database ckan_test.

Having separate dbs within the same db cluster was unproblematic and simplified (cluster-based) access management.

A few relevant code snippets for your convenience:

Moving the Postgres data folder to a custom location

(default)root@aws-ckan-001:/etc/postgresql/9.3/main# service postgresql stop
 * Stopping PostgreSQL 9.3 database server
   ...done.

# move the data folder from the default location to /mnt/ckan/pgdata
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# mkdir /mnt/ckan/pgdata
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# mv /var/lib/postgresql/9.3/main/ /mnt/ckan/pgdata/
(default)root@aws-ckan-001:/etc/postgresql/9.3/main# cd /mnt/ckan/pgdata/
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ll
total 16
drwxr-xr-x 1 root     root       8 Nov 14 15:14 ./
drwxr-xr-x 1 root     root      80 Nov 14 15:06 ../
drwx------ 1 postgres postgres 280 Nov 14 15:14 main/
# folder main/ is there with all fingers, toes and permissions! amazing.

# symlink that folder right back.
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ln -s /mnt/ckan/pgdata/main /var/lib/postgresql/9.3/main
(default)root@aws-ckan-001:/mnt/ckan/pgdata# ll /var/lib/postgresql/9.3/
total 8
drwxr-xr-x 2 postgres postgres 4096 Nov 14 15:16 ./
drwxr-xr-x 3 postgres postgres 4096 Nov  7 14:52 ../
lrwxrwxrwx 1 root     root       21 Nov 14 15:16 main -> /mnt/ckan/pgdata/main/
# permissions are ALL WRONG, let's change that
(default)root@aws-ckan-001:/mnt/ckan/pgdata# chown postgres:postgres /var/lib/postgresql/9.3/main
(default)root@aws-ckan-001:/mnt/ckan/pgdata# service postgresql start
 * Starting PostgreSQL 9.3 database server
   ...done.
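
As an aside (not what we did): instead of moving the folder and symlinking it back, Postgres can be pointed directly at the new location via its data_directory setting, followed by a service postgresql restart:

# /etc/postgresql/9.3/main/postgresql.conf
data_directory = '/mnt/ckan/pgdata/main'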

Set up the databases from scratch

Shown here: ckan_private and ckan_public, not shown: ckan_test

# Are existing dbs in utf-8?
sudo -u postgres psql -l

sudo -u postgres createuser -S -D -R -P ckan_default
sudo -u postgres createuser -S -D -R -P -l datastore_default

sudo -u postgres createdb -O ckan_default ckan_private -E utf-8
sudo -u postgres createdb -O ckan_default ckan_public -E utf-8

sudo -u postgres createdb -O ckan_default datastore_private -E utf-8
sudo -u postgres createdb -O ckan_default datastore_public -E utf-8
# IMPROVEMENT use database postgis as template
# ckanext-spatial
sudo -u postgres psql -d ckan_private -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql
sudo -u postgres psql -d ckan_private -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql

sudo -u postgres psql -d ckan_public -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql
sudo -u postgres psql -d ckan_public -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql

# change owner of postgis tables to ckan_default
sudo -u postgres psql -d ckan_private -c "ALTER TABLE spatial_ref_sys OWNER TO ckan_default;ALTER TABLE geometry_columns OWNER TO ckan_default;"
sudo -u postgres psql -d ckan_public -c "ALTER TABLE spatial_ref_sys OWNER TO ckan_default;ALTER TABLE geometry_columns OWNER TO ckan_default;"
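
The datastore permissions also need to be applied per site. The exact syntax depends on the CKAN version (see paster --plugin=ckan datastore --help); in recent versions the command prints SQL to pipe into psql, roughly:

# Apply datastore permissions for each site (syntax varies between CKAN versions)
paster --plugin=ckan datastore set-permissions -c /etc/ckan/default/private.ini | sudo -u postgres psql
paster --plugin=ckan datastore set-permissions -c /etc/ckan/default/public.ini | sudo -u postgres psql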

Migrating data from a CKAN instance on a separate server

Shown: the old server is a CKAN Docker container I'm attached to, with the virtualenv activated and me being root. There's one CKAN site per Docker container, hence the database is called datastore_default. This requires that ssh-keygen was run on both machines and that the public SSH keys are present in the target machine's ~/.ssh/authorized_keys. My target machine here is my VM called aws-ckan-001.

root@d78ab2436e53:/var/lib/ckan/dbdumps#  pg_dump -c -Fc -f datastore.dump datastore_default
root@d78ab2436e53:/var/lib/ckan/dbdumps#  pg_dump -c -Fc -f ckan.dump ckan_default   
# transfer files to target host (/mnt/ckan/public must be writable by florianm)
(default)root@7ccb97afb634:/var/lib/ckan# rsync -Pavvr . florianm@aws-ckan-001:/mnt/ckan/public

Load production data from another system

Use the rsynced .dump files from the previous section. Dump the current db as a backup first.

# Stop apache service
service apache2 stop

# Backup db
pg_dump -c -Fc -f /mnt/ckan/private/dbdumps/ckan_DATE.dump ckan_private
pg_dump -c -Fc -f /mnt/ckan/private/dbdumps/datastore_DATE.dump datastore_private

# Drop db
sudo -u postgres dropdb ckan_private
sudo -u postgres dropdb datastore_private

# Create db
sudo -u postgres createdb -O ckan_default ckan_private -E utf-8
sudo -u postgres createdb -O ckan_default datastore_private -E utf-8

# Restore db
sudo -u postgres pg_restore -d ckan_private /mnt/ckan/private/dbdumps/ckan.dump
sudo -u postgres pg_restore -d datastore_private /mnt/ckan/private/dbdumps/datastore.dump

# init, upgrade, refresh db (venv)
paster --plugin=ckan db init -c /etc/ckan/default/private.ini
paster --plugin=ckan db upgrade -c /etc/ckan/default/private.ini
paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/default/private.ini
paster --plugin=ckanext-harvest harvester initdb -c /etc/ckan/default/private.ini
paster --plugin=ckan search-index rebuild -c /etc/ckan/default/private.ini

# Restart web servers
service jetty restart
service apache2 restart

# Have a squiz at the logs
tail -f /var/log/apache2/ckan_private*.log

Load production data from one CKAN to a new CKAN

This step shows how a completely installed CKAN site (here: ckan_private) can be duplicated into another, newly set up CKAN site (here: ckan_test). Note the juggling of permissions, so the postgres user can access the dump files.

(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps# mkdir 2014-12-19
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps# cd 2014-12-19/
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# chown postgres .  
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_dump -c -Fc -f datastore.dump datastore_private
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_dump -c -Fc -f ckan.dump ckan_private

(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo service apache2 stop

(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl status                                                                  
beaver                           RUNNING    pid 17236, uptime 16 days, 23:13:02
celery-private                   RUNNING    pid 25005, uptime 14 days, 6:30:29
celery-public                    RUNNING    pid 24817, uptime 14 days, 6:31:51
celery-test                      RUNNING    pid 25024, uptime 14 days, 6:30:25
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl stop celery-test
celery-test: stopped
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl status
beaver                           RUNNING    pid 17236, uptime 16 days, 23:13:23
celery-private                   RUNNING    pid 25005, uptime 14 days, 6:30:50
celery-public                    RUNNING    pid 24817, uptime 14 days, 6:32:12
celery-test                      STOPPED    Dec 19 04:43 PM

(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres dropdb ckan_test
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres dropdb datastore_test                                                
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres createdb -O ckan_default ckan_test -E utf-8
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres createdb -O ckan_default datastore_test -E utf-8
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_restore -d ckan_test ckan.dump
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# sudo -u postgres pg_restore -d datastore_test datastore.dump
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan db init -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan db upgrade -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckanext-harvest harvester initdb -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# paster --plugin=ckan search-index rebuild -c /etc/ckan/default/test.ini
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# supervisorctl start celery-test
(default)root@aws-ckan-001:/mnt/ckan/public/dbdumps/2014-12-19# service apache2 start

SolR

The SolR setup needed tough love, strong words, and brisk walks in fresh air. In our setup, one core serves all three instances, but as @rossjones said, one dedicated core per instance might be better. I'll go into more detail here in the hope of preventing further suffering.

We use the solr-spatial-field search backend because bounding polygons are useful for georeferencing marine datasets along our curved coastline.

Following the DigitalOcean Solr installation guide:

Download SolR

# Download SolR (check for newer version, shown: 4.10.2)
(default)root@aws-ckan-001:/tmp# wget http://apache.mirror.uber.com.au/lucene/solr/4.10.2/solr-4.10.2.tgz
(default)root@aws-ckan-001:/tmp# tar -xvf solr-4.10.2.tgz
(default)root@aws-ckan-001:/tmp# cp -r solr-4.10.2/example /opt/solr

# Test base install on http://aws-ckan-001:8983/solr
(default)root@aws-ckan-001:/opt/solr# java -jar start.jar

Create Jetty service that runs SolR

Create /etc/default/jetty:

NO_START=0 # Start on boot
JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"
JAVA_HOME=/usr/java/default
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_LOGS=/opt/solr/logs

Create /opt/solr/etc/jetty-logging.xml:

<?xml version="1.0"?>
 <!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN" "http://jetty.mortbay.org/configure.dtd">
 <!-- =============================================================== -->
 <!-- Configure stderr and stdout to a Jetty rollover log file -->
 <!-- this configuration file should be used in combination with -->
 <!-- other configuration files.  e.g. -->
 <!--    java -jar start.jar etc/jetty-logging.xml etc/jetty.xml -->
 <!-- =============================================================== -->
 <Configure id="Server" class="org.mortbay.jetty.Server">

     <New id="ServerLog" class="java.io.PrintStream">
       <Arg>
         <New class="org.mortbay.util.RolloverFileOutputStream">
           <Arg><SystemProperty name="jetty.logs" default="."/>/yyyy_mm_dd.stderrout.log</Arg>
           <Arg type="boolean">false</Arg>
           <Arg type="int">90</Arg>
           <Arg><Call class="java.util.TimeZone" name="getTimeZone"><Arg>GMT</Arg></Call></Arg>
           <Get id="ServerLogName" name="datedFilename"/>
         </New>
       </Arg>
     </New>

     <Call class="org.mortbay.log.Log" name="info"><Arg>Redirecting stderr/stdout to <Ref id="ServerLogName"/></Arg></Call>
     <Call class="java.lang.System" name="setErr"><Arg><Ref id="ServerLog"/></Arg></Call>
     <Call class="java.lang.System" name="setOut"><Arg><Ref id="ServerLog"/></Arg></Call></Configure>

Create Solr user and jetty service:

# Create solr user
sudo useradd -d /opt/solr -s /sbin/false solr
sudo chown solr:solr -R /opt/solr

# Create jetty service
sudo wget -O /etc/init.d/jetty http://dev.eclipse.org/svnroot/rt/org.eclipse.jetty/jetty/trunk/jetty-distribution/src/main/resources/bin/jetty.sh
sudo chmod a+x /etc/init.d/jetty
sudo update-rc.d jetty defaults

# Test jetty service
service jetty start

Customise SolR to CKAN schema

mv /opt/solr/solr/collection1 /opt/solr/solr/ckan
sed -i "s/collection1/ckan/g" /opt/solr/solr/ckan/core.properties
rm -r /opt/solr/solr/data/*

# Link schema.xml
mv /opt/solr/solr/ckan/conf/schema.xml /opt/solr/solr/ckan/conf/schema.bak
ln -s /mnt/ckan/default/src/ckan/ckan/config/solr/schema.xml /opt/solr/solr/ckan/conf/schema.xml

Download the JTS Topology Suite (JTS 1.13), unpack it, and copy the jars into /opt/solr/solr-webapp/webapp/WEB-INF/lib or /opt/solr/lib. A missing JTS will cause trouble - Fix
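
For the solr-spatial-field backend, the CKAN schema.xml also needs a JTS-backed spatial field type; the ckanext-spatial docs suggest something along these lines (shown as a reminder - check the ckanext-spatial documentation for the exact snippet matching your Solr version):

<!-- in schema.xml: JTS-backed spatial field type and the field ckanext-spatial indexes into -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           distErrPct="0.025" maxDistErr="0.000009" units="degrees" />

<field name="spatial_geom" type="location_rpt" indexed="true" stored="true" multiValued="true" />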

Test setup

Test JTS at http://aws-ckan-001:8983/solr/ckan/select/?fl=*,score&sort=score%20asc&q={!geofilt%20score=distance%20filter=true%20sfield=spatial_geom%20pt=42.56667,1.48333%20d=1}&fq=feature_code:PPL

CKAN ini

With the SolR core "collection1" renamed to "ckan", and the solr admin GUI at %(ckan.site_url):8983/solr, the solr_url must include the core name, and must not have a trailing slash.

solr_url = http://127.0.0.1:8983/solr/ckan

It is not necessary to open port 8983 in the firewall, as the requests to SolR never leave the local machine.

Redis

One redis serves as a shared message queue for ckanext-harvest and ckanext-archiver; separate supervisord configs run paster celeryd:

/etc/supervisor/conf.d/celery-SITE_ID.conf
; ===============================
; ckan celeryd supervisor example
; ===============================
[program:celery-SITE_ID]
; Full path to the paster executable inside the virtual environment,
; and full path to the site's config file.
command=/mnt/ckan/default/bin/paster --plugin=ckan celeryd --config=/etc/ckan/default/SITE_ID.ini
; user that owns virtual environment.
user=root
numprocs=1
stdout_logfile=/var/log/celeryd-SITE_ID.log
stderr_logfile=/var/log/celeryd-SITE_ID.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
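
After adding a new celery-SITE_ID.conf, supervisord has to pick it up and start the worker - roughly:

# Reload supervisord's config and start the new worker
supervisorctl reread
supervisorctl update
supervisorctl start celery-SITE_ID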

Configs

Each CKAN instance runs off the same CKAN and plugins, but of course with separate configs:

for SITE_ID in private public test; do paster --plugin=ckan make-config ckan /etc/ckan/default/$SITE_ID.ini; done

To facilitate maintaining one config per CKAN site, let's parameterise the site_id - this is where good folder structures pay off:


# private: port 5000; public: port 5001

[app:main]
ckan.site_id = private
home = /mnt/ckan/%(ckan.site_id)s
use = egg:ckan
full_stack = true
#cache_dir = /tmp/%(ckan.site_id)s/
cache_dir = %(home)s/tmp/
beaker.session.key = ckan

## Database Settings
sqlalchemy.url = postgresql://ckan_default:PASSWORD@localhost/ckan_%(ckan.site_id)s
ckan.datastore.write_url = postgresql://ckan_default:PASSWORD@localhost/datastore_%(ckan.site_id)s
ckan.datastore.read_url = postgresql://datastore_default:PASSWORD@localhost/datastore_%(ckan.site_id)s

ckan.site_url = http://aws-ckan-001

## Search Settings
solr_url = http://127.0.0.1:8983/solr/ckan
ckanext.spatial.search_backend = solr-spatial-field

## Plugins Settings
p_base = stats resource_proxy text_preview recline_preview pdf_preview geojson_preview
p_spt = spatial_metadata spatial_query wms_preview cswserver spatial_harvest_metadata_api
p_dst = datastore datapusher
p_hrv = harvest ckan_harvester archiver qa
p_viz = viewhelpers dashboard_preview linechart barchart basicgrid navigablemap choroplethmap
p_grp = hierarchy_display hierarchy_form
p_api = apihelper
p_rdf = htsql metadata oaipmh_harvester oaipmh
ckan.plugins = %(p_base)s %(p_spt)s %(p_dst)s
#%(p_hrv)s %(p_viz)s %(p_grp)s %(p_api)s %(p_rdf)s

# ckanext-harvest
ckan.harvest.mq.type = redis

## Storage Settings
ofs.impl = pairtree
ckan.storage_dir = %(home)s
ckan.storage_path = %(home)s
ckan.max_resource_size = 2000
ckan.max_image_size = 200
extra_template_paths = %(home)s/templates
extra_public_paths = %(home)s/public

ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
ckan.datapusher.url = http://127.0.0.1:8800/

ckan.resource_proxy.max_file_size = 20 * 1024 * 1024

Maintenance

Adding a new plugin and enabling it only in the test instance has had no negative effects on the other instances so far.
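
Concretely, with the plugin groups from the config above, enabling (say) the visualisation plugins only on the test site is just a per-file difference - a sketch:

# /etc/ckan/default/test.ini - the test site gets the extra plugin group
ckan.plugins = %(p_base)s %(p_spt)s %(p_dst)s %(p_viz)s

# /etc/ckan/default/private.ini and public.ini keep the shorter list
ckan.plugins = %(p_base)s %(p_spt)s %(p_dst)s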

Hosting

The VM runs an Apache 2.4 server (remember "Require all granted") with

/etc/apache2/ports.conf

# add:
Listen 5000
Listen 5001
Listen 5002
Listen 8800

Separate virtualhost configs:

/etc/apache2/sites-available/ckan_SITE_ID.conf

<VirtualHost 0.0.0.0:5000>
    ServerName ckan
    WSGIScriptAlias / /etc/ckan/default/SITE_ID.wsgi
    # Pass authorization info on (needed for rest api).
    WSGIPassAuthorization On
    # Deploy as a daemon (avoids conflicts between CKAN instances).
    WSGIDaemonProcess ckan_SITE_ID display-name=ckan_SITE_ID processes=2 threads=15
    WSGIProcessGroup ckan_SITE_ID
    ErrorLog /var/log/apache2/ckan_SITE_ID.error.log
    CustomLog /var/log/apache2/ckan_SITE_ID.custom.log combined
    <IfModule mod_rpaf.c>
        RPAFenable On
        RPAFsethostname On
        RPAFproxy_ips 127.0.0.1
        RPAF_ForbidIfNotProxy Off
    </IfModule>
    <Directory "/" >
        Require all granted
    </Directory>
</VirtualHost>
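
Each site config is then enabled in the usual Apache way - roughly:

# Enable the per-site virtualhosts and reload Apache
a2ensite ckan_private
a2ensite ckan_public
a2ensite ckan_test
service apache2 reload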

Separate WSGI files:

/etc/ckan/default/SITE_ID.wsgi

import os

# Activate the shared virtualenv for this process
activate_this = os.path.join('/mnt/ckan/default/bin/activate_this.py')
execfile(activate_this, dict(__file__=activate_this))

# Load this site's config, wire up logging, and create the CKAN app
from paste.deploy import loadapp
config_filepath = os.path.join('/etc/ckan/default/SITE_ID.ini')
from paste.script.util.logging_config import fileConfig
fileConfig(config_filepath)
application = loadapp('config:%s' % config_filepath)

Outcome

The illustrated setup results in one AWS VM serving three CKAN instances on ports 5000, 5001 and 5002, plus one datapusher (remember to open that port in the firewall) on 8800. SolR listens only on localhost:8983 at /solr/ckan, so its traffic never leaves the machine. Our networking crew reverse-proxies these so that two of the CKANs are accessible from our intranet only, and only one is accessible publicly.

So far we haven't had any dramas (apart from me inadvertently chowning the entire /var/ folder which broke a lot - don't do this at home), our penetration testing suite hammers the CKANs without findings, and the Google spider pings our external instance every 2 seconds. We find it very useful to have two completely separate production instances for sensitive, unreleased vs public datasets, plus at least one testing instance.

waldoj commented 9 years ago

@florianm, this is really useful information. No doubt many future developers will be grateful to find this at the top of their Google results. :) We've just tweeted about it, which I hope will help to spread the word. I would have thought that multi-tenancy would require significant modifications to CKAN, but you did it! :)

florianm commented 9 years ago

Thanks for the mention @waldoj ! I've just updated the documentation above a bit more and added a diagram.

mattfullerton commented 9 years ago

This is fantastic

wardi commented 9 years ago

I've put up a simple diagram and description of the way ckan-multisite will share the same ckan code, config and extensions for each ckan instance created: https://github.com/boxkite/ckan-multisite/

We're not building anything to share user accounts at the moment, but that would be a really nice auth plugin for ckan. Maybe there's an existing one we could use that's based on ldap or windows domain auth.