systemapic / wu

Systemapic web server and API
https://systemapic.com

strk info #246

Open knutole opened 8 years ago

knutole commented 8 years ago

Hi Sandro, here's an overview of our plans ahead, and the areas where we'd like some help:

Our vision

Best practices: We need to make sure our PostGIS backend setup makes sense and is scalable and secure. We're currently creating a new database for each user and putting all their datasets in separate tables. We'd like to make sure that rendering speeds are as fast as possible, that loads are distributed, and that everything, in short, is optimized. We also need some way to control disk usage per user, and I saw some nice SQL scripts you've done for CartoDB along those lines.
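To make the per-user disk accounting concrete, here is a minimal sketch, assuming the one-database-per-user layout described above. The function name and schema default are illustrative, not from the repo:

```python
# Sketch: per-user disk accounting, assuming one database per user.
# Names here are illustrative, not taken from the systemapic repos.

def disk_usage_sql(schema="public"):
    """Build a query that sums the on-disk size of every table in a schema.

    pg_total_relation_size includes indexes and TOAST data, so the sum
    reflects real disk cost; run it inside the user's own database.
    """
    # NOTE: schema should be validated/whitelisted before interpolation.
    return (
        "SELECT coalesce(sum(pg_total_relation_size("
        "quote_ident(schemaname) || '.' || quote_ident(tablename))), 0) "
        "AS bytes_used FROM pg_tables "
        f"WHERE schemaname = '{schema}';"
    )
```

A cron job or import hook could run this per user and compare the result against a quota before accepting new uploads.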

Security: Harden our PostGIS setup and lock it down completely.

Docker: PostGIS is currently in a separate container in our Docker Compose setup. We want to be able to query any PostGIS container from any tile-server container (which should be easy, simply an extra address point), and some automated backup of data to make our containers fault-tolerant. Our entire setup needs to be scalable, and it's simply a matter of adding more empty PostGIS containers to the swarm to scale.

Rasters: We are doing a large raster project in Q1 with a client, where we'll implement support for rasters and calculations on rasters. The rasters in question are low-res satellite images of snow coverage, and we've found that vectorizing them and doing the calculations with vectors works well for this datatype. However, that rather avoids the problem of implementing real raster support in PostGIS, and we'd like to look at the possibilities of doing rasters the right way (if there is such a thing) in PostGIS. We're also handling more conventional rasters, i.e. as overlays, and currently we're simply tiling them up with a Python script. Possibilities for creating tiles on the fly from PostGIS for rasters would be interesting to look at as well.

Operations: 1) We'd like to implement the possibility of cutting rasters on the fly, i.e. drawing a polygon in the client (browser) and intersecting the raster with the polygon, cutting/cropping it. 2) The same for vectors. 3) Look at what other operations we can do on rasters and vectors, and implement a list of SQL scripts that is easily pluggable and expandable. Also, script hooks on import, etc.
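As a starting point for 1) and 2), hypothetical query builders might look like the following. This assumes the browser sends the drawn polygon as GeoJSON in EPSG:4326 and that the stored data shares that SRID; table and column names are examples only:

```python
# Hypothetical builders for the on-the-fly cut queries described above.
# Assumes the drawn polygon arrives as GeoJSON in EPSG:4326 and the
# target data uses the same SRID. Names are examples, not repo code.

def clip_raster_sql(table, rast_col="rast"):
    # ST_Clip (postgis raster) crops a raster to a geometry.
    return (
        f"SELECT ST_Clip({rast_col}, "
        "ST_SetSRID(ST_GeomFromGeoJSON(%s), 4326)) "
        f"FROM {table};"
    )

def clip_vector_sql(table, geom_col="geom"):
    # ST_Intersection crops vector geometries to the drawn polygon.
    return (
        f"SELECT ST_Intersection({geom_col}, "
        "ST_SetSRID(ST_GeomFromGeoJSON(%s), 4326)) "
        f"FROM {table};"
    )
```

The `%s` placeholder is meant to be filled by the database driver with the GeoJSON string, keeping the polygon out of the SQL text itself.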

Client-side table: We'd like to implement a client-side table where users can view and interact with a PostGIS table.

SQL API: I guess all of this is best set up as an API for PostGIS. It should include creation of new tables, import of a list of formats, and import from other databases (ArcGIS, Oracle, PostGIS).
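One piece any public SQL API needs early is a read-only guard on arbitrary queries. A crude sketch of the idea (a real implementation should lean on the server's parser or role permissions rather than string checks):

```python
# Crude read-only guard for an SQL API endpoint. This is a sketch of
# the idea only; real validation should use PostgreSQL roles/GRANTs
# or proper parsing, not string inspection.

ALLOWED_FIRST_WORDS = ("select", "with")

def is_readonly(sql):
    """Accept only single read-only statements."""
    stripped = sql.strip()
    if not stripped:
        return False
    first = stripped.split(None, 1)[0].lower().rstrip(";")
    # Reject stacked statements like "SELECT 1; DROP TABLE t".
    single = ";" not in stripped.rstrip(";")
    return first in ALLOWED_FIRST_WORDS and single
```
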

Vector tiles: We don't currently have full support for vector tiles from PostGIS (only a proof of concept), but we need this implemented ASAP. I'll be implementing support for vector tiles client-side. We need to look at simplification/clustering of polygons, points, etc. for vector tiles. This will go in our tile server, pile.
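Whatever the tile encoding ends up being, the tile server has to translate a z/x/y tile address into a query window for PostGIS. A minimal sketch of that math, assuming standard EPSG:3857 slippy-map tiles (an assumption, not confirmed by the repos):

```python
# Sketch: slippy-map z/x/y -> EPSG:3857 bounding box, for building the
# per-tile PostGIS query window. Assumes standard Web Mercator tiling.

# Half-width of the Web Mercator extent, in metres.
ORIGIN = 20037508.342789244

def tile_bbox(z, x, y):
    """Return (xmin, ymin, xmax, ymax) in EPSG:3857 for tile z/x/y."""
    size = 2 * ORIGIN / (2 ** z)   # tile side length at zoom z
    xmin = -ORIGIN + x * size
    ymax = ORIGIN - y * size       # slippy y counts down from the north
    return (xmin, ymax - size, xmin + size, ymax)
```

The resulting box can feed an `ST_Intersects` filter (and a simplification tolerance derived from `size / 256`) in the tile query.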

Optimizations: We want to look at parallel PostGIS processing (https://github.com/gbb/ppppt), parallel shp2pgsql, and the TWKB format. Also tile creation from large datasets, large imports, etc. We're generally dealing with larger amounts of data than e.g. CartoDB, and need to make sure we're as optimized as possible to reduce server load and load times.

Mapnik: #layer::pseudo styling. We need to implement the pseudo-styling possibilities in our tile server, making it possible to style several separate layers in the same bulk. Hopefully you have experience with this from Windshaft. Our current tile server is here: https://github.com/systemapic/pile, also run in a separate Docker container.

Projections: We need to be able to fetch data from PostGIS in a variety of projections (via the API).
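Server-side reprojection is one query away in PostGIS; a sketch of how the API could build it, with the function and table names being illustrative assumptions:

```python
# Sketch: fetch geometries reprojected server-side via ST_Transform.
# Function/table names are illustrative, not from the systemapic API.

def reproject_sql(table, geom_col="geom", srid=4326):
    """Return geometries from `table` as GeoJSON in the target EPSG code.

    ST_Transform does the reprojection in PostGIS, so the API only has
    to pass the requested SRID through from the HTTP request.
    """
    return (
        f"SELECT ST_AsGeoJSON(ST_Transform({geom_col}, {int(srid)})) "
        f"FROM {table};"
    )
```

Casting `srid` to `int` keeps a request parameter from injecting SQL; the table name would still need whitelisting in a real endpoint.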

Way forward

These are the things we'd like to look at in Q1. Obviously there are things we haven't thought about, and we're looking forward to getting your input on every aspect of the setup. We'll see how quickly time goes, but I'm thinking we could work alongside each other: we work on some things and you on others, and we make sure everything is done in accordance with best practices and your guidance.

This is as much as I can say right now, I think. We need to discuss and get feedback on the road ahead. Please let me know how all this sounds to you, and please feel free to use Issues and the repos as you see fit.

Repositories

A quick guide to repos:

knutole commented 8 years ago

Hi @strk, welcome to first day of chaos! :)

I've added a strk label to some of the issues I'd like you to help out with. Feel free to look around at everything else too, of course. I'm a bit unsure how this will go forward, so I think it's best we just jump in and see how it goes. I guess you'll do the things you think are most complicated, and whatever you think we can manage, you let us know. Since you'll be with us only two days a week, we can work based on your recommendations on the other days.

Also, if you have any input on how best to manage a work-process like this, don't hesitate to bring it forward.

First day practical info

I guess the first days will go toward getting familiar with what's going on in the code, etc. There is also some setup to do:

  1. We have a server dedicated to you, where dev2.systemapic.com is hosted.
  2. You'll get the ssh keys and config tomorrow over a secure channel.
  3. You'll need to set up your GitHub account in /var/www/wu/, /var/www/pile/ and /docks/.
  4. We use Sublime for coding, and rmate for changing files over ssh. This means that we simply do rsub api.geo.js and it pops up in our Sublime. I'm not sure what kind of setup you prefer, but we can talk about it tomorrow.

Repositories

  1. The web-server (API) and client libs are located in /var/www/wu/ and /var/www/wu/public/.
  2. The tileserver (pile) is located in /var/www/pile/
  3. The Docker files are located in /docks/, where /docks/dev/ contains the "run-script", and /docks/build/ contains the different build-files for the containers.

Docker 101

I think you mentioned you haven't worked with Docker before. It's quite simple, though: Docker images are built from a recipe, aka a Dockerfile. Docker containers are instances of such images. For example, I've made a Docker image called systemapic/ubuntu, which is simply a stripped Ubuntu image. If I want to run this image and do something with it, I have to create a container based on the image. The container is ephemeral: when I'm done with it, it can simply be deleted, and so on.

Anyway, our setup consists of 13 containers, some of which are only storage containers. These are run together, and for doing so there is docker-compose. docker-compose simply reads a docker-compose.yml file listing the different containers that are being used, how they link to each other, which ports they have open, which image they're based on, and so forth. This is our docker-compose.yml file.

  1. To restart the server, you simply do ./restart.sh in /docks/dev/. That will shut down all containers, flush them, restart them, and put you into the live log from all containers. This is safe to do at any time.
  2. To build an image, you go to e.g. /docks/build/postgis/ and run docker build -t strk/postgis ., which will build an image called strk/postgis based on the Dockerfile in the current folder. Or you can do ./build.sh in each folder as a shortcut. Note, however, that the names given in the build.sh scripts are the ones we're using, so if you overwrite those names, we'll lose the original image. So it's better to use the strk/ prefix during debugging. The Dockerfile decides what's built, obviously.
  3. To list images: docker images. To list running containers: docker ps; to list all containers (incl. stopped ones): docker ps -a. To delete a container: docker rm CONTAINER_ID (find the ID with docker ps). To delete an image: docker rmi IMAGE_ID.
  4. Check out the docker-compose.yml file in /docks/dev/ for how containers are run together.
  5. Now, a final caveat: although the webserver and tileserver are run from containers, these containers actually share their working folders with the host. That means that the code in /var/www/wu, for example (on the host), is the code that's being run inside the container. This is for development; it makes it easier to change the files. We're using nodemon, and it works well. So you don't have to go inside the container to change the code in /var/www/wu and /var/www/pile.

I think that should cover day one. Talk soon, ciao.

strk commented 8 years ago

Dense lecture, thanks for the 101 (it might be useful to turn it into a wiki page or a file in the docker-systemapic repo). I guess it'd be worth learning to build and run those Docker containers on my local machine, don't you think? I'll be trying that while waiting for those credentials to get to me. (PS: for a secure channel you can look up my PGP key on common keyservers; key fingerprint = 459E B3A5 E7C5 2ADE 3F3F 68A2 D6C0 7DA4 AC56 2DAD)