road86 / bahis-data

Repository for cleaning and adjusting BAHIS-related data

Set up data pipeline for dashboard on otter #18

mixmixmix closed 1 year ago

mixmixmix commented 1 year ago

To give you an idea, here is the pipeline currently planned:

set -xe
date
# we are killing the dashboard to be able to perform data processing without running out of memory
# The script will stop at this command if for some reason the dashboard is not running
# (pkill exits non-zero when no process matches, and set -e then aborts)
pkill -f index.py
sudo -u postgres psql -f init.sql
cd ../input
# copying on local network from weasel
rsync --append --partial -chvP -e "ssh" root@192.168.0.7:/home/habis/coredb_bup.tar.gz .
tar xf coredb_bup.tar.gz
sudo -u postgres psql -d coredb -f coredb_bup.sql
sudo -u postgres psql -d bahistot -f bahistot.sql
cd /bahis-data/
/bahis-data/.venv/bin/python server-scripts/import_data.py
/bahis-data/.venv/bin/python prep_dash/prepgeojson.py
/bahis-data/.venv/bin/python prep_dash/prep_data.py
cp -r /bahis-data/output/* /bahis-dash/exported_data/
cd /bahis-dash/
/bahis-dash/.venv/bin/python /bahis-dash/index.py
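
One consequence of `set -e` in the script above: if any step after `pkill` fails, the script exits before its last line and the dashboard stays down. Below is a minimal sketch of one way around that, using a shell `trap` on `EXIT` inside a subshell. The names `DASH_CMD`, `restart_dashboard` and `process_then_restart` are assumptions made for illustration and are not part of the planned pipeline.

```shell
# Hypothetical sketch: restart the dashboard whenever processing ends,
# whether it succeeded or failed.
# DASH_CMD is an assumed variable; its default mirrors the last line of the
# script above (run it from /bahis-dash/ if index.py relies on relative paths).
DASH_CMD="${DASH_CMD:-/bahis-dash/.venv/bin/python /bahis-dash/index.py}"

restart_dashboard() {
    echo "restarting dashboard" >&2
    # Left unquoted on purpose so the string splits into command + arguments.
    $DASH_CMD
}

process_then_restart() {
    # The subshell gets its own EXIT trap, so the restart runs on success,
    # on failure, and when set -e stops the processing command early.
    (
        trap restart_dashboard EXIT
        "$@"
        exit $?
    )
}
```

For example, `process_then_restart /bahis-data/.venv/bin/python prep_dash/prep_data.py` would run the processing step and then start the dashboard again, returning the processing step's exit status to the caller.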

It seems that otter cannot run the prep_data.py script for the dashboard in a reasonable time (the current run has taken almost 2 hours and is still going). It might also run out of memory, so I've set the pipeline up to stop the dashboard process while the data is being processed. Processing the oldbahis data is also too memory-heavy for poor otter, but that can be worked around by uploading a processed file every now and then.
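
To keep a run like this from going on indefinitely, one option is to give each step a wall-clock budget. This is a minimal sketch, assuming GNU coreutils `timeout` is available on otter. `run_step` and `STEP_LIMIT` are hypothetical names, and the two-hour default simply echoes the runtime mentioned above.

```shell
# Hypothetical sketch: run one pipeline step under a wall-clock limit.
# STEP_LIMIT is an assumed setting; tune it to the server.
STEP_LIMIT="${STEP_LIMIT:-2h}"

run_step() {
    step_status=0
    # timeout(1) exits with status 124 when the limit is reached.
    timeout "$STEP_LIMIT" "$@" || step_status=$?
    if [ "$step_status" -eq 124 ]; then
        echo "step timed out after $STEP_LIMIT: $*" >&2
    elif [ "$step_status" -ne 0 ]; then
        echo "step failed with status $step_status: $*" >&2
    fi
    return "$step_status"
}
```

A call such as `run_step /bahis-data/.venv/bin/python prep_dash/prep_data.py` would then stop the step once the budget is used up. This only bounds the runtime; it does nothing about memory use.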

@ChasNelson1990 @yokat I think we will need to wait until we have a server in ECTAD with a nightly update of the data.
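
Once such a server exists, the nightly update could be scheduled with cron. This is a sketch only: the script path, the log path and the 02:00 start time are all assumptions, since none of them are fixed in this issue.

```shell
# Hypothetical crontab entry: run the pipeline script every night at 02:00
# and append stdout and stderr to a log file. Both paths are placeholders.
CRON_LINE='0 2 * * * /bahis-data/server-scripts/run_pipeline.sh >> /var/log/bahis-pipeline.log 2>&1'

# Print the entry so it can be reviewed before adding it with `crontab -e`.
echo "$CRON_LINE"
```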

mixmixmix commented 1 year ago

As noted above, it takes too long, but the pipeline is mostly ready to be deployed once we have a better server.