# We have an internal TwoFishes server running on port 8081, so redirect
# requests that look like they belong to its API
ProxyPass /twofishes http://localhost:8081
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
Header set Access-Control-Allow-Origin "*"
Header set Cache-Control "max-age=86400"
</Directory>
This is my version of ec2setup.txt, modified to work on my own home-grown Ubuntu 12.04 LTS instance.
Start with AMI # ami-3fec7956 (Ubuntu 12.04), 32GB (ec2-run-instances ami-3fec7956 -t m1.large --region us-east-1 -z us-east-1d --block-device-mapping /dev/sda1=:32:false -k)
sudo apt-add-repository -y ppa:olivier-berten/geo
sudo add-apt-repository -y ppa:webupd8team/java
sudo aptitude update
sudo aptitude safe-upgrade -y
sudo aptitude full-upgrade -y
sudo aptitude install -y build-essential apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert postgresql libpq-dev ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby sqlite3 libsqlite3-ruby1.8 git-core libcurl4-openssl-dev apache2-prefork-dev libapr1-dev libaprutil1-dev subversion postgresql-9.1-postgis autoconf libtool libxml2-dev libbz2-1.0 libbz2-dev libgeos-dev proj-bin libproj-dev ocropus pdftohtml catdoc unzip ant openjdk-6-jdk lftp php5-cli rubygems flex postgresql-server-dev-9.1 proj libjson0-dev xsltproc docbook-xsl docbook-mathml gettext postgresql-contrib-9.1 pgadmin3 python-software-properties bison dos2unix
sudo aptitude install -y oracle-java7-installer
sudo aptitude install -y libgdal-dev
sudo aptitude install -y libgeos++-dev
sudo bash -c 'echo "/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server" > /etc/ld.so.conf.d/jvm.conf'
sudo ldconfig
Note: at this point you should create a new user called ubuntu. I used my own user (pjm) instead and had to modify various scripts and config files, as described below.
mkdir ~/sources
cd ~/sources
wget http://download.osgeo.org/postgis/source/postgis-2.0.3.tar.gz
tar xfvz postgis-2.0.3.tar.gz
cd postgis-2.0.3
./configure --with-gui
./configure --with-gui --without-topology
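Before deciding whether GEOS needs rebuilding, it can help to compare the installed version against the one you want. The following is a minimal sketch, assuming `geos-config` (shipped with libgeos-dev) is on the PATH and that GNU `sort -V` is available. `version_ge` and `REQUIRED_GEOS` are hypothetical names introduced here, and the 3.3.8 threshold is simply the release these notes download, not a documented PostGIS requirement.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Illustrative threshold only (assumption): the GEOS release used in these notes.
REQUIRED_GEOS="3.3.8"

if command -v geos-config >/dev/null 2>&1; then
    installed="$(geos-config --version)"
    if version_ge "$installed" "$REQUIRED_GEOS"; then
        echo "GEOS $installed is new enough"
    else
        echo "GEOS $installed is older than $REQUIRED_GEOS"
    fi
else
    echo "geos-config not found; is libgeos-dev installed?"
fi
```

The comparison goes through `sort -V` rather than a plain string test so that a version such as 3.10.0 is treated as newer than 3.3.8.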
If the GEOS version is incorrect, then perform the following steps:
wget http://download.osgeo.org/geos/geos-3.3.8.tar.bz2
tar xjf geos-3.3.8.tar.bz2
cd geos-3.3.8
./configure
make
sudo make install
cd ~/sources/postgis-2.0.3
./configure --with-gui
Note that the above steps didn't work for me. There should be a way to set up the library loading correctly, but I gave up.
Otherwise, continue here:
make
sudo make install
sudo ldconfig
sudo make comments-install
sudo sed -i "s/ident/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/md5/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/peer/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo /etc/init.d/postgresql restart
createdb -U postgres geodict
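The three sed commands rewrite the first match on every line of pg_hba.conf, comment lines included, and keep no backup. A slightly more careful variant might look like the sketch below. It assumes GNU sed and that the authentication method is the last field on each rule line, as in the stock Ubuntu file. `trust_all_auth` is a hypothetical helper name. Bear in mind that trust lets any local user connect as any database user, so this only suits a private machine.

```shell
#!/bin/sh
# trust_all_auth FILE: on non-comment lines, replace a trailing ident, md5 or
# peer method with trust. The original file is kept as FILE.bak.
trust_all_auth() {
    sed -i.bak -E \
        '/^[[:space:]]*#/! s/([[:space:]])(ident|md5|peer)[[:space:]]*$/\1trust/' \
        "$1"
}
```

On the real file this would be run with root privileges, for example through `sudo sed` with the same expression, followed by the PostgreSQL restart.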
sudo -u postgres createdb template_postgis
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/spatial_ref_sys.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/rtpostgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/raster_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy_gist.sql
cd ~/sources
git clone git://github.com/petewarden/dstk.git
git clone git://github.com/petewarden/dstkdata.git
cd dstk
sudo gem install bundler
sudo bundle install
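The nine psql calls for the template database differ only in the script name, so they could be generated by a loop. This is a sketch rather than a replacement for the commands above. `POSTGIS_SCRIPTS` and `postgis_template_commands` are hypothetical names, and the function only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Script names in the order these notes load them.
POSTGIS_SCRIPTS="postgis.sql spatial_ref_sys.sql postgis_comments.sql rtpostgis.sql raster_comments.sql topology.sql topology_comments.sql legacy.sql legacy_gist.sql"

# postgis_template_commands DB DIR: print one psql command per script.
postgis_template_commands() {
    db="$1"
    dir="$2"
    for script in $POSTGIS_SCRIPTS; do
        echo "sudo -u postgres psql -d $db -f $dir/$script"
    done
}
```

For example, `postgis_template_commands template_postgis /usr/share/postgresql/9.1/contrib/postgis-2.0 | sh` would run them in sequence once the printed list looks right.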
cd ~/sources/dstkdata
If you want to save disk space and don't need geo-statistics, you can skip everything
up until the comment indicating the end of the geostats loading.
I SKIPPED TO %%%%%%%%% BELOW
createdb -U postgres -T template_postgis statistics
tar xzf statistics/gl_gpwfe_pdens_15_bil_25.tar.gz
export PATH=$PATH:/usr/lib/postgresql/9.1/bin/
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I gl_gpwfe_pdens_15_bil_25/glds15ag.bil public.population_density | psql -U postgres -d statistics
rm -rf gl_gpwfe_pdens_15_bil_25
unzip statistics/glc2000_v1_1_Tiff.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I Tiff/glc2000_v1_1.tif public.land_cover | psql -U postgres -d statistics
rm -rf Tiff
sudo mkdir /mnt/data
sudo chown pjm /mnt/data
cd /mnt/data
The zip files are available at http://gis-lab.info/data/srtm-tif/, at http://srtm.csi.cgiar.org/, or at https://hc.app.box.com/shared/1yidaheouv (password = ThanksCSI!)
sudo curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_NE_250m.tif.zip"
unzip SRTM_NE_250m.tif.zip
I got the TIF files from here instead!
sudo curl -O "https://hc.box.net/shared/1yidaheouv/SRTM_SE_250m_TIF.rar"
unrar SRTM_NE_250m_TIF.rar
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 SRTM_NE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_NE_250m*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_W_250m.tif.zip"
unzip SRTM_W_250m.tif.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a SRTM_W_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_W_250m*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_SE_250m.tif.zip"
unzip SRTM_SE_250m.tif.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a -I SRTM_SE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_SE_250m*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/tmean_30s_bil.zip"
unzip tmean_30s_bil.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_1.bil public.mean_temperature_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_2.bil public.mean_temperature_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_3.bil public.mean_temperature_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_4.bil public.mean_temperature_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_5.bil public.mean_temperature_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_6.bil public.mean_temperature_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_7.bil public.mean_temperature_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_8.bil public.mean_temperature_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_9.bil public.mean_temperature_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_10.bil public.mean_temperature_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_11.bil public.mean_temperature_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_12.bil public.mean_temperature_12 | psql -U postgres -d statistics
rm -rf tmean*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/prec_30s_bil.zip"
unzip prec_30s_bil.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_1.bil public.precipitation_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_2.bil public.precipitation_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_3.bil public.precipitation_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_4.bil public.precipitation_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_5.bil public.precipitation_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_6.bil public.precipitation_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_7.bil public.precipitation_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_8.bil public.precipitation_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_9.bil public.precipitation_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_10.bil public.precipitation_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_11.bil public.precipitation_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_12.bil public.precipitation_12 | psql -U postgres -d statistics
rm -rf prec*
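The temperature and precipitation loads are twelve near-identical commands each, one per month, so a loop can generate them. The sketch below assumes every table follows the zero-padded `_NN` naming (month 12 included) and that the .bil files are named `PREFIX_N.bil` as above. `monthly_raster_commands` is a hypothetical helper that only prints the commands.

```shell
#!/bin/sh
# monthly_raster_commands PREFIX TABLE: print the twelve raster2pgsql
# pipelines for PREFIX_1.bil .. PREFIX_12.bil into TABLE_01 .. TABLE_12.
monthly_raster_commands() {
    prefix="$1"
    table="$2"
    month=1
    while [ "$month" -le 12 ]; do
        printf '/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I %s_%d.bil public.%s_%02d | psql -U postgres -d statistics\n' \
            "$prefix" "$month" "$table" "$month"
        month=$((month + 1))
    done
}
```

Running `monthly_raster_commands tmean mean_temperature | sh` or `monthly_raster_commands prec precipitation | sh` would then execute the loads after the zip has been unpacked.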
unzip /home/pjm/sources/dstkdata/statistics/us_statisticsrasters.zip -d .
for f in *.tif; do raster2pgsql -s 4236 -t 32x32 -I $f `basename $f .tif` | psql -U postgres -d statistics; done
rm -rf us_*
rm -rf metadata
This is the end of the geostats loading; continue from here if you decide to skip that part.
%%%%%%%% START HERE AGAIN
sudo gem install passenger
sudo passenger-install-apache2-module
You'll need to update the version number below to match whichever actual passenger version was installed
This is what the build said:
LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/mod_passenger.so
PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18
PassengerDefaultRuby /usr/bin/ruby1.8
I changed the passenger version in the lines below to match what was found from the lines above:
sudo bash -c 'echo "LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/mod_passenger.so" > /etc/apache2/mods-enabled/passenger.load'
sudo bash -c 'echo "PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18" > /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerRuby /usr/bin/ruby1.8" >> /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerMaxPoolSize 3" >> /etc/apache2/mods-enabled/passenger.conf'
sudo sed -i "s/MaxRequestsPerChild[ \t][ \t]*[0-9][0-9]*/MaxRequestsPerChild 20/" /etc/apache2/apache2.conf
I needed to change the DocumentRoot to match the location where the data was actually installed. In my case the sources directory was /home/pjm/sources instead of /home/ubuntu/sources.
Ideally I would have created a new user called ubuntu, but I didn't know about this until I was too far into the process.
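Rather than copying the version number by hand, the Passenger root can be passed in once and reused for both files. The sketch below is an illustration, not what the installer does. `write_passenger_config` is a hypothetical helper, the `buildout/apache2/mod_passenger.so` layout is taken from the Passenger 5.0.18 build output quoted above, and other versions may place the module elsewhere.

```shell
#!/bin/sh
# write_passenger_config ROOT DEST: write passenger.load and passenger.conf
# into directory DEST, using ROOT as the Passenger installation directory.
write_passenger_config() {
    root="$1"
    dest="$2"
    echo "LoadModule passenger_module $root/buildout/apache2/mod_passenger.so" \
        > "$dest/passenger.load"
    {
        echo "PassengerRoot $root"
        echo "PassengerRuby /usr/bin/ruby1.8"
        echo "PassengerMaxPoolSize 3"
    } > "$dest/passenger.conf"
}
```

One way to use it is to write into a scratch directory and copy the result with sudo, for example `write_passenger_config "$(passenger-config --root)" /tmp/passenger` followed by `sudo cp /tmp/passenger/passenger.* /etc/apache2/mods-enabled/`. Here `passenger-config --root` should print the installation directory of the gem that was just built, and /tmp/passenger is assumed to exist already.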
sudo bash -c 'echo "
<VirtualHost *:8000>
ServerName 127.0.1.1
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.org\$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.com\$1 [R=301,L]
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
sudo /etc/init.d/apache2 restart
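One thing to watch when a vhost is written through `bash -c 'echo "..."'`: the inner shell still performs expansion inside the double quotes, so an unescaped `$1` in a RewriteRule is replaced by an empty string before it reaches the file. The sketch below demonstrates the effect and shows a quoted heredoc as one alternative. The host www.example.org and the helper name `write_rule` are placeholders introduced for illustration.

```shell
#!/bin/sh
# Inner double quotes: the shell started by sh -c expands $1, which is unset,
# so the backreference disappears from the rule.
lossy="$(sh -c 'echo "RewriteRule ^(.*)$ http://www.example.org$1 [R=301,L]"')"

# Quoted heredoc delimiter: nothing between the markers is expanded.
# write_rule FILE: write the rule to FILE exactly as typed.
write_rule() {
    cat > "$1" <<'EOF'
RewriteRule ^(.*)$ http://www.example.org$1 [R=301,L]
EOF
}
```

For the real config file the same heredoc can be fed to `sudo tee /etc/apache2/sites-enabled/000-default`, which avoids the nested quoting altogether.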
sudo gem install postgres -v '0.7.9.2008.01.28'
cd ~/sources/dstk
./populate_database.rb
cd ~/sources
mkdir maxmind
cd maxmind
wget "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz"
gunzip GeoLiteCity.dat.gz
wget "http://geolite.maxmind.com/download/geoip/api/c/GeoIP.tar.gz"
tar xzvf GeoIP.tar.gz
cd GeoIP-1.4.8/
libtoolize -f
./configure
make
sudo make install
cd ..
svn checkout svn://rubyforge.org/var/svn/net-geoip/trunk net-geoip
cd net-geoip/
ruby ext/extconf.rb
make
sudo make install
cd ~/sources
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.11.tar.gz
tar -xvzf libiconv-1.11.tar.gz
cd libiconv-1.11
./configure --prefix=/usr/local/libiconv
make
sudo make install
sudo ln -s /usr/local/libiconv/lib/libiconv.so.2 /usr/lib/libiconv.so.2
createdb -U postgres -T template_postgis reversegeo
cd ~/sources
git clone git://github.com/petewarden/osm2pgsql
cd osm2pgsql/
./autogen.sh
sed -i 's/version = BZ2_bzlibVersion();//' configure
sed -i 's/version = zlibVersion();//' configure
./configure
make
sudo make install
cd ..
osm2pgsql -U postgres -d reversegeo -p world_countries -S osm2pgsql/styles/world_countries.style dstkdata/world_countries.osm -l
osm2pgsql -U postgres -d reversegeo -p admin_areas -S osm2pgsql/styles/admin_areas.style dstkdata/admin_areas.osm -l
osm2pgsql -U postgres -d reversegeo -p neighborhoods -S osm2pgsql/styles/neighborhoods.style dstkdata/neighborhoods.osm -l
The above commands take several hours to complete
I started the next set of commands in a new window...
cd ~/sources
git clone git://github.com/petewarden/boilerpipe
cd boilerpipe/boilerpipe-core/
ant
cd src
javac -cp ../dist/boilerpipe-1.1-dev.jar boilerpipe.java
cd ~/sources/dstk/
psql -U postgres -d reversegeo -f sql/loadukpostcodes.sql
osm2pgsql -U postgres -d reversegeo -p uk_osm -S ../osm2pgsql/default.style ../dstkdata/uk_osm.osm.bz2 -l
psql -U postgres -d reversegeo -f sql/buildukindexes.sql
cd ~/sources
git clone git://github.com/geocommons/geocoder.git
cd geocoder
make
sudo make install
Build the latest Tiger/Line data for US address lookups
cd /mnt/data
mkdir tigerdata
cd tigerdata
lftp ftp2.census.gov:/geo/tiger/TIGER2012/EDGES
mirror --parallel=5 .
cd ../FEATNAMES
mirror --parallel=5 .
cd ../ADDR
mirror --parallel=5 .
exit
cd ~/sources/geocoder/build/
mkdir ../../geocoderdata/
./tiger_import ../../geocoderdata/geocoder2012.db /mnt/data/tigerdata/
Completed to here
cd ~/sources
git clone git://github.com/luislavena/sqlite3-ruby.git
cd sqlite3-ruby
ruby setup.rb config
ruby setup.rb setup
sudo ruby setup.rb install
cd ~/sources/geocoder
bin/rebuild_metaphones ../geocoderdata/geocoder2012.db
chmod +x build/build_indexes
build/build_indexes ../geocoderdata/geocoder2012.db
rm -rf /mnt/data/tigerdata
createdb -U postgres names
cd /mnt/data
curl -O "http://www.ssa.gov/oact/babynames/names.zip"
unzip names.zip
dos2unix yob*.txt
~/sources/dstk/dataconversion/analyzebabynames.rb . > babynames.csv
psql -U postgres -d names -f ~/sources/dstk/sql/loadnames.sql
Fix for postgres crashes:
sudo sed -i "s/shared_buffers = [0-9A-Za-z]*/shared_buffers = 512MB/" /etc/postgresql/9.1/main/postgresql.conf
sudo sysctl -w kernel.shmmax=576798720
sudo bash -c 'echo "kernel.shmmax=576798720" >> /etc/sysctl.conf'
sudo bash -c 'echo "vm.overcommit_memory=2" >> /etc/sysctl.conf'
sudo sed -i "s/max_connections = 100/max_connections = 200/" /etc/postgresql/9.1/main/postgresql.conf
sudo /etc/init.d/postgresql restart
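The two numbers in this fix are related: with PostgreSQL 9.1 the kernel's shmmax limit has to be at least as large as the shared memory segment the server requests, which is shared_buffers plus some overhead. A quick arithmetic check, as a sketch with hypothetical helper names (`mb_to_bytes`, `shmmax_covers`), shows how much headroom the chosen value leaves over 512MB.

```shell
#!/bin/sh
# mb_to_bytes N: convert N mebibytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# shmmax_covers BUFFERS_MB SHMMAX_BYTES: succeeds when shmmax is at least
# as large as shared_buffers (overhead is not modelled here).
shmmax_covers() {
    [ "$2" -ge "$(mb_to_bytes "$1")" ]
}

# Headroom left by the values used in these notes, in bytes.
headroom=$(( 576798720 - $(mb_to_bytes 512) ))
echo "shared_buffers: $(mb_to_bytes 512) bytes, headroom: $headroom bytes"
```

If shared_buffers is raised later, shmmax would need to grow by at least the same amount.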
Remove files not needed at runtime
rm -rf /mnt/data/*
rm -rf ~/sources/libiconv-1.11.tar.gz
rm -rf ~/sources/postgis-2.0.3.tar.gz
cd ~/sources/
mkdir dstkdata_runtime
mv dstkdata/ethnicityofsurnames.csv dstkdata_runtime/
mv dstkdata/GeoLiteCity.dat dstkdata_runtime/
rm -rf dstkdata
mv dstkdata_runtime dstkdata
Up to this point, you'll have a 0.50 version of the toolkit.
The following will upgrade you to a 0.51 version
cd ~/sources/dstk
git pull origin master
I found that the toolkit was already up to date.
TwoFishes geocoder
cd ~/sources
mkdir twofishes
cd twofishes
mkdir bin
curl "http://www.twofishes.net/binaries/latest.jar" > bin/twofishes.jar
mkdir data
The source link above is obsolete
curl "http://www.twofishes.net/indexes/revgeo/latest.zip" > data/twofishesdata.zip
This one might work, although it's unknown how the contents of latest.zip differ from 2015-03-05.zip:
curl "http://www.twofishes.net/indexes/revgeo/2015-03-05.zip" > data/twofishesdata.zip
The ~/sources/dstk/twofishesd.sh script must be edited to point to the new directory.
Change
java -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/
to
java -Xmx1500M -jar /home/pjm/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/pjm/sources/twofishes/data/2015-03-05-20-05-30.753698/
The entire ~/sources/dstk/ directory should be checked for any references to /home/ubuntu, which need to be changed to point to /home/pjm instead.
I looked through the dstk and found several instances like this:
cd ~/sources/dstk
grep '/home/ubuntu' *
geodict_daemon.rb:Daemons.run('/home/ubuntu/sources/dstk/dstk_server.rb', {
twofishes.conf:exec start-stop-daemon --start -c root --exec /home/ubuntu/sources/dstk/twofishesd.sh
twofishesd.sh:java -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/
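The matches found by grep can be rewritten in one pass instead of editing each file by hand. The following is a minimal sketch, assuming GNU grep and sed and that neither path contains a `|` or regular-expression metacharacters. `rewrite_home` is a hypothetical helper name. It leaves a .bak copy next to every file it changes, and it does not touch the `data/latest/` part of twofishesd.sh, which still has to be pointed at the dated index directory separately.

```shell
#!/bin/sh
# rewrite_home DIR OLD NEW: replace every occurrence of OLD with NEW in the
# files under DIR that contain OLD, keeping each original as FILE.bak.
rewrite_home() {
    dir="$1"
    old="$2"
    new="$3"
    grep -rlF -- "$old" "$dir" | while IFS= read -r file; do
        sed -i.bak "s|$old|$new|g" "$file"
    done
}
```

For these notes the call would be `rewrite_home ~/sources/dstk /home/ubuntu /home/pjm`, after which rerunning the grep should only list the .bak copies.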
cd data
unzip twofishesdata.zip
sudo cp ~/sources/dstk/twofishes.conf /etc/init/twofishes.conf
sudo service twofishes start
Here is what the VirtualHost entry already looks like:
sudo bash -c 'echo "
<VirtualHost *:8000>
ServerName 127.0.1.1
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.org\$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.com\$1 [R=301,L]
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
It now needs to be changed to this:
sudo bash -c 'echo "
<VirtualHost *:8000>
ServerName 127.0.1.1
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.org\$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]
RewriteRule ^(.*)$ http://www.datasciencetoolkit.com\$1 [R=301,L]
# We have an internal TwoFishes server running on port 8081, so redirect
# requests that look like they belong to its API
ProxyPass /twofishes http://localhost:8081
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
Header set Access-Control-Allow-Origin \"*\"
Header set Cache-Control \"max-age=86400\"
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
sudo ln -s /etc/apache2/mods-available/proxy.load /etc/apache2/mods-enabled/proxy.load
sudo ln -s /etc/apache2/mods-available/proxy_http.load /etc/apache2/mods-enabled/proxy_http.load
sudo ln -s /etc/apache2/mods-available/headers.load /etc/apache2/mods-enabled/headers.load
sudo /etc/init.d/apache2 restart
I now go to http://192.168.0.5:8000 and I get the datasciencetoolkit webpage along with all the tools!! Nice!!