technologiestiftung / giessdenkiez-de-dwd-harvester

Gather precipitation data from DWD's RADOLAN data set for the region of Berlin and connect it to the trees DB.
https://www.giessdenkiez.de
MIT License

giessdenkiez-de-dwd-harvester

Pre-Install

I am using venv to set up a virtual Python environment for separating dependencies:

python -m venv REPO_DIRECTORY
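Then activate the environment so that the following installs go into it:

source REPO_DIRECTORY/bin/activate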

Install

pip install -r requirements.txt

I had some trouble installing psycopg2 on macOS; there is a problem with the ssl-lib linking. The following install resolved the issue:

env LDFLAGS='-L/usr/local/lib -L/usr/local/opt/openssl/lib -L/usr/local/opt/readline/lib' pip install psycopg2
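Alternatively, the prebuilt binary wheel avoids compiling against the local OpenSSL altogether, which can be a reasonable option for development setups:

pip install psycopg2-binary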

GDAL

As some of Python's GDAL bindings are not as good as the command line tools, I had to use the originals. Therefore, GDAL needs to be installed. GDAL is a dependency in requirements.txt, but sometimes this does not work; then GDAL needs to be installed manually. Afterwards, make sure the command line calls for gdalwarp and gdal_polygonize.py are working.
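A quick way to verify that both tools are available on your PATH:

gdalwarp --version
which gdal_polygonize.py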

Linux

Here is a good explanation on how to install gdal on linux: https://mothergeo-py.readthedocs.io/en/latest/development/how-to/gdal-ubuntu-pkg.html

Mac

For Mac, we can use brew install gdal.

The current Python binding of GDAL is pinned to GDAL==2.4.2. If you have another GDAL version (check with ogrinfo --version), make sure to install the binding matching your version: pip install GDAL==VERSION_FROM_PREVIOUS_COMMAND
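For example, if ogrinfo --version reports GDAL 3.4.1 (version used here purely for illustration), the matching binding would be installed with:

pip install GDAL==3.4.1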

Configuration

Copy the sample.env file, rename it to .env, and update the parameters, most importantly the database connection parameters.
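From the repository root this is a single copy:

cp sample.env .env

The relevant parameters look like this: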

PG_SERVER=localhost
PG_PORT=54322
PG_USER=postgres
PG_PASS=postgres
PG_DB=postgres
SUPABASE_URL=http://127.0.0.1:54321
SUPABASE_SERVICE_ROLE=eyJh...
SUPABASE_BUCKET_NAME=data_assets
MAPBOXUSERNAME=your_mapbox_username
MAPBOXTOKEN=your_mapbox_token
MAPBOXTILESET=your_mapbox_tileset_id
MAPBOXLAYERNAME=your_mapbox_layer_name
SKIP_MAPBOX=False
LIMIT_DAYS=30
SURROUNDING_SHAPE_FILE=./assets/buffer.shp

Running

Starting from an empty database, the complete process of running the DWD harvester consists of three steps:

  1. Preparing the buffered shapefile
  2. Creating the grid structure for the radolan_geometry table
  3. Harvesting the DWD data

1. Preparing the buffered shapefile

First, a buffered shapefile is needed. This step utilizes the harvester/assets/berlin.prj and harvester/assets/berlin.shp files. Make sure to set the environment variables properly before running this step.
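As a minimal sketch of the buffering step using ogr2ogr (this assumes a GDAL build with SpatiaLite support and a 2000 m buffer in the shapefile's metric CRS; the repository's own preparation commands may differ in buffer distance and output location):

ogr2ogr ./assets/buffer.shp harvester/assets/berlin.shp -dialect sqlite -sql "SELECT ST_Buffer(geometry, 2000) AS geometry FROM berlin"

The output path should match whatever SURROUNDING_SHAPE_FILE points to in your .env.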

2. Creating the grid structure for the radolan_geometry table

Second, the radolan_geometry table needs to be populated. You need to have the buffered shapefile (from the previous step) created and available in ../assets. The radolan_geometry table contains vector data for the target city; this data is needed by the harvest process to find the rain data for the target city's area. This repository contains shapefiles for the Berlin area. To use it for another city, replace the harvester/assets/berlin.prj and harvester/assets/berlin.shp files. Run the preparation commands to create the grid structure in the database.
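A sketch of the invocation (the script name and path here are hypothetical; use the actual preparation script shipped in this repository), run with the .env variables set:

python harvester/prepare/create_grid.py  # hypothetical script name/path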

3. Harvesting the DWD data

Make sure to set the environment variables properly before running the script, and make sure that you have successfully run the previous steps for preparing the buffered shapefile and creating the grid structure for the radolan_geometry table. The file harvester/src/run_harvester.py contains the script for running the DWD harvester.
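With the environment variables set, the harvester is started directly with Python:

python harvester/src/run_harvester.py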

4. Harvesting daily weather data

For harvesting daily weather data, we use the free and open source BrightSky API; no API key is needed. The script is defined in run_daily_weather.py. Make sure to set all relevant environment variables before running the script, e.g. for a run with a local database attached:

PG_SERVER=localhost
PG_PORT=54322
PG_USER=postgres
PG_DB=postgres
PG_PASS=postgres
WEATHER_HARVEST_LAT=52.520008
WEATHER_HARVEST_LNG=13.404954

Make sure that especially WEATHER_HARVEST_LAT and WEATHER_HARVEST_LNG are set to your location of interest.
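With the variables set, the script runs the same way as the main harvester (assuming run_daily_weather.py sits next to run_harvester.py in harvester/src):

python harvester/src/run_daily_weather.py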

Docker

To have a local database for testing you need Docker and docker-compose installed. You will also have to create a public Supabase Storage bucket, and you need to update the .env file with the values from sample.env below the line # for your docker environment.

To start only the database, run:

docker-compose -f docker-compose.postgres.yml up

This will set up a Postgres/PostGIS DB, provision the needed tables, and insert some test data.

To run the harvester and the Postgres DB together, run:

docker-compose up

Known Problems

harvester.py throws an error on first run

When running the setup for the first time with docker-compose up, the provisioning of the database is slower than the startup of the harvester container. You will have to stop the setup and run it again to get the desired results.

Postgres Provisioning

The provisioning SQL script is only run once, when the container is created. When you make changes, you will have to run:

docker-compose down
docker-compose up --build

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Fabian Morón Zirfas 💻 📖
Sebastian Meier 💻 📖
Dennis Ostendorf 💻
Lisa-Stubert 💻
Lucas Vogel 📖
Jens Winter-Hübenthal 💻 🐛
Simon Jockers 🚇 💻 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

Credits

A project by:

Supported by: