mit-jp / mit-climate-data-viz

Plotting climate data for the MIT Joint Program on the Science and Policy of Global Change
https://cypressf.shinyapps.io/eppa-dashboard/

make backend database for MST #179

Closed. cypressf closed this 3 years ago.

cypressf commented 3 years ago

Using https://github.com/cypressf/climate-risk-map-service as a repo for this

A Rust web service built with the actix framework, using sqlx and its migration tooling, and loading database config from environment variables.
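
For reference, a minimal sketch of that setup (assuming actix-web 3 and sqlx 0.5 with the mysql feature; the dotenv crate for loading .env is also an assumption, and the real handler wiring will differ):

use actix_web::{App, HttpServer};
use sqlx::mysql::MySqlPoolOptions;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Load DATABASE_URL and friends from .env / the environment.
    dotenv::dotenv().ok();
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");

    // One connection pool, cloned into each actix worker.
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&database_url)
        .await
        .expect("failed to connect to the database");

    HttpServer::new(move || App::new().data(pool.clone()))
        .bind("127.0.0.1:8080")?
        .run()
        .await
}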

cypressf commented 3 years ago

It looks like if I use sqlx and want to run the migrations from main.rs or somewhere similar, I should use this workaround for an existing bug: https://github.com/launchbadge/sqlx/issues/863
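
The general shape of running migrations at startup in sqlx 0.5 is roughly this (just a sketch, assuming a ./migrations directory managed by sqlx-cli; the workaround details are in the linked issue):

async fn run_migrations(pool: &sqlx::MySqlPool) -> Result<(), sqlx::migrate::MigrateError> {
    // Embed the migrations from ./migrations at compile time and apply any
    // that haven't been run yet, before the server starts handling requests.
    sqlx::migrate!("./migrations").run(pool).await
}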

cypressf commented 3 years ago

I'm trying out sqlx-cli.

CREATE USER 'mit'@'localhost' IDENTIFIED BY '<password>';
GRANT SELECT, INSERT, UPDATE, DELETE ON `climate_risk_map`.* TO 'mit'@'localhost';
sqlx database create

but I get

error: error returned from database: 1044 (42000): Access denied for user 'mit'@'localhost' to database 'climate_risk_map'

Even after adding the proper database URL to my environment variables:

DATABASE_URL=mysql://mit:<password>@localhost/climate_risk_map
cypressf commented 3 years ago

Derp. I forgot to grant CREATE permissions to that user.

CREATE USER 'mit'@'localhost' IDENTIFIED BY '<password>';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON `climate_risk_map`.* TO 'mit'@'localhost';

set DATABASE_URL=mysql://mit:<password>@localhost/climate_risk_map in .env

then

sqlx database create

works just fine!

cypressf commented 3 years ago

I'm using https://github.com/jamesjmeyer210/actix_sqlx_mysql_user_crud as an example repo to work from for reading from the database with sqlx. Currently, I'm adapting it to work with sqlx 0.5 (it was written for sqlx 0.3).

cypressf commented 3 years ago

The MySqlRow type in sqlx 0.5 no longer accepts a lifetime annotation, so I need to rework the Table struct to work with that.
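
A sketch of one way to handle this in sqlx 0.5: map rows into plain owned structs with the FromRow derive instead of holding a borrowed row (the state table and its id/name columns here are assumptions based on the /state endpoint):

use sqlx::mysql::MySqlPool;

#[derive(Debug, sqlx::FromRow)]
struct State {
    id: i32,
    name: String,
}

async fn all_states(pool: &MySqlPool) -> Result<Vec<State>, sqlx::Error> {
    // query_as maps each MySqlRow into an owned State, no lifetime annotation needed.
    sqlx::query_as::<_, State>("SELECT id, name FROM state")
        .fetch_all(pool)
        .await
}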

cypressf commented 3 years ago

I reworked the actix CRUD example app to work with the latest version of sqlx and pushed the changes to https://github.com/cypressf/climate-risk-map-service. I also created an initial db schema, and I'm working on prepopulating it with initial data. The next step will be to create a REST service to serve that data and run it on svante in a container.

cypressf commented 3 years ago

Working on inserting all the county-level data now. I am creating a .csv file with all the data we currently display on the website, in a format that matches the schema of the county_data table. Work for that is in this python notebook. I am now in the process of trying to ingest the resulting csv file into mysql. It's quite large, so I ran into issues with sqlx refusing to run it as a single SQL command.

cypressf commented 3 years ago

I split the SQL file into three parts, and sqlx was happy to run those. The commands I used to split the file and reformat it for SQL INSERT are in the python notebook.

cypressf commented 3 years ago

Potential queries

Get all the water data metadata

select t1.*, t2.*, t3.*
from
    map_visualization_collection as t1,
    map_visualization as t2,
    dataset as t3
where t1.category = 1
and t2.id = t1.map_visualization
and t3.id = t2.dataset
order by t1.order
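
A sketch of how that query could be issued through sqlx (still MySQL at this point, hence the ? placeholder; the implicit joins are written as explicit JOINs here, and the rows are left untyped):

use sqlx::mysql::{MySqlPool, MySqlRow};

// Fetch the map-visualization metadata rows for one category (e.g. 1 for the water data).
async fn metadata_for_category(
    pool: &MySqlPool,
    category: i32,
) -> Result<Vec<MySqlRow>, sqlx::Error> {
    sqlx::query(
        "SELECT t1.*, t2.*, t3.*
         FROM map_visualization_collection AS t1
         JOIN map_visualization AS t2 ON t2.id = t1.map_visualization
         JOIN dataset AS t3 ON t3.id = t2.dataset
         WHERE t1.category = ?
         ORDER BY t1.order",
    )
    .bind(category)
    .fetch_all(pool)
    .await
}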
cypressf commented 3 years ago

I'm trying to install this on Heroku for the time being, so we can get it up and running on the internet before the svante container is ready. Following the Rust buildpack instructions.

cypressf commented 3 years ago

making data queries fast and efficient

I'll need to make the data endpoint more efficient. It's probably best to return all the data in csv format, or something even more compact, instead of json. I think csv strikes a good balance between human-readable (usable by someone who's just poking at the API) and compact. With gzip, it should also be faster to assemble csv-formatted data from the database for a given website tab or a particular data selection.

The question is what format to use for years and sources. I could put the data in a separate column for each year and source, or I could add a year column and a source column. A year and source column is cleaner (it's tidy data), but it could make the file quite large if the years and sources aren't shared across variables. For example, "water quality" comes from the EPA whereas "flood risk" comes from First Street, so a measurement of a specific county's water quality would have to sit on a different row from its flood risk, because the "dataset" column differs.

cypressf commented 3 years ago

moving from mysql to postgres

I moved from mysql to postgres so we can use PostGIS (for storing GIS data) in the future.
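
On the Rust side this is mostly a matter of swapping the pool type and connection string, plus moving query placeholders from ? to $1 (a sketch, assuming sqlx 0.5 with the postgres feature):

use sqlx::postgres::{PgPool, PgPoolOptions};

async fn connect() -> Result<PgPool, sqlx::Error> {
    // DATABASE_URL now looks like postgres://username:password@localhost/crm_db
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    PgPoolOptions::new()
        .max_connections(5)
        .connect(&database_url)
        .await
}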

https://github.com/ncrmro/planet-express is an example app running on almost the same tech stack. They use docker instead of podman, but I think I should be able to adapt that to work with podman. They claim the code is easily deployable to heroku, so I'm gonna see exactly how they do that.

cypressf commented 3 years ago

setup vagrant and podman

I installed the thin podman client on my local macOS machine, and podman on a CentOS 8 Linux vagrant box:

brew install podman # install podman and vagrant
brew install vagrant
vagrant init generic/centos8 # setup vagrant box
vagrant up # boot the box
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 2200 vagrant@127.0.0.1 # the default password is vagrant. if not, run vagrant ssh, then manually copy the id_rsa.pub to ~/.ssh/authorized_keys
ssh -p 2200 vagrant@127.0.0.1

now, in the vagrant ssh session, install podman and enable the remote socket

# install the latest version of podman, to be compatible with the client installed by Homebrew
sudo dnf -y module disable container-tools
sudo dnf -y install 'dnf-command(copr)'
sudo dnf -y copr enable rhcontainerbot/container-selinux
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_8/devel:kubic:libcontainers:stable.repo
sudo dnf -y --refresh install podman
# run the podman remote
systemctl --user enable --now podman.socket 
sudo loginctl enable-linger $USER # keep it running on logout
exit # back to macOS host

and finally, set up the podman thin client on macOS

podman system connection add vagrant ssh://vagrant@127.0.0.1:2200
podman info # this should work! now you can use any podman commands

sources:

how to use podman remote client
how to install podman on centos
how to get started with vagrant

cypressf commented 3 years ago

http in podman container on vagrant

Here's a test to run an http server inside a podman container inside a vagrant box, and curl it from the host.

forward port 8080 on the host to port 8080 in the vagrant box by adding the following to Vagrantfile

config.vm.network "forwarded_port", guest: 8080, host: 8080, host_ip: "127.0.0.1"

open port 8080 on the vagrant box

ssh -p 2200 vagrant@127.0.0.1
sudo firewall-cmd --zone=public --permanent --add-port 8080/tcp
sudo firewall-cmd --reload
exit

run a simple http server in a podman container in the vagrant box via the podman thin client on the host, and GET it

podman run -dt -p 8080:80/tcp docker.io/library/httpd
curl localhost:8080
<html><body><h1>It works!</h1></body></html>
cypressf commented 3 years ago

actix web framework in podman container on vagrant

using https://hub.docker.com/r/htli/actix for now

podman pull docker.io/htli/actix
cypressf commented 3 years ago

I met with @mjbludwig and he's going to make a podman pod with a few containers (for the web service, postgres, and the frontend static files).

cypressf commented 3 years ago

from @mjbludwig:

I got a postgres container working alongside a container with your app in it, with the database mounted on the host! I could curl the state bits from inside a container in the pod; next is to be able to curl from the host.

cypressf commented 3 years ago

from @mjbludwig:

Here is the process to build the pod and containers

  1. podman pod create --name crm_pod -p 8000:8000 # This makes a pod called "crm_pod" and maps port 8000 inside the pod to 8000 outside

  2. podman run --pod=crm_pod -v /home/mludwig/Work/JP/climate_risk_map/crm_db/:/var/lib/postgresql/data:Z -e POSTGRES_PASSWORD=password -e POSTGRES_USER=username --name crm_db -d postgres (you will need to change the -v bit to a directory where the database will live on your machine, so change /home/mludwig/Work/JP/climate_risk_map/crm_db/ to somewhere local)

  3. save this as a script:

#!/bin/bash

ctr=climate_risk_map_service

buildah from --name $ctr rust
buildah run $ctr cargo install sqlx-cli
buildah run $ctr /bin/bash -c 'cd /opt; git clone https://github.com/cypressf/climate-risk-map-service.git'

buildah run $ctr /bin/bash -c 'echo APP_URL=127.0.0.1 >> /opt/climate-risk-map-service/.env'
buildah run $ctr /bin/bash -c 'echo APP_PORT=8000 >> /opt/climate-risk-map-service/.env'
buildah run $ctr /bin/bash -c 'echo DATABASE_URL=postgres://username:password@localhost/crm_db >> /opt/climate-risk-map-service/.env'
buildah config --entrypoint "cd /opt/climate-risk-map-service; sqlx database create && sqlx migrate run && cargo run" $ctr
buildah commit $ctr $ctr
  4. run the script.
  5. podman run -d --pod=crm_pod --name crm_ctr climate_risk_map_service, then wait about a minute. Finally:
  6. podman exec -it crm_ctr bash, then you can curl localhost:8000/state
cypressf commented 3 years ago

In order to run the script in step 3 of my last comment, I need to install buildah. I'm unsure whether I can run a thin client on macOS, like I can with podman, to modify the container on my vagrant box. Checking now.

cypressf commented 3 years ago

buildah is not available on macOS

cypressf commented 3 years ago

I'm installing buildah on my CentOS box using the instructions from their website.

# CentOS 8
sudo dnf -y module disable container-tools
sudo dnf -y install 'dnf-command(copr)'
sudo dnf -y copr enable rhcontainerbot/container-selinux
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/CentOS_8/devel:kubic:libcontainers:stable.repo
# OPTIONAL FOR RUNC USERS: crun will be installed by default. Install runc first if you prefer runc
sudo dnf -y --refresh install runc
# Install Buildah
sudo dnf -y --refresh install buildah
cypressf commented 3 years ago

In the script above, I would modify it to use a fully-qualified image name, docker.io/library/rust, instead of just rust in the line

buildah from --name $ctr rust

cypressf commented 3 years ago

After I run podman run -d --pod=crm_pod --name crm_ctr climate_risk_map_service and wait about a minute, running the final step gives me an error:

podman exec -it crm_ctr bash
Error: can only create exec sessions on running containers: container state improper

I get the following with podman ps

d5e0f1c0a470  docker.io/htli/actix:latest  bash     7 days ago  Up 7 days ago  0.0.0.0:8080->80/tcp    priceless_gould
c760645b5fa8  k8s.gcr.io/pause:3.5                  6 days ago  Up 6 days ago  0.0.0.0:8000->8000/tcp  d1d6ccd75110-infra

(I think those are some old containers I was messing around with)

cypressf commented 3 years ago

I deleted all my pods and containers and tried again, and got it to work this time. I also had to change curl localhost:8000/state to curl 0.0.0.0:8000/state, probably because the name wasn't mapped in /etc/hosts on my CentOS install or something.

cypressf commented 3 years ago

I could successfully curl from a bash session running in the container, but I can't curl from the CentOS host:

curl 0.0.0.0:8000/state
curl: (56) Recv failure: Connection reset by peer
cypressf commented 3 years ago

Running containers look good to me.

podman ps -p
CONTAINER ID  IMAGE                                      COMMAND   CREATED         STATUS             PORTS                   NAMES               POD ID        PODNAME
0f161fcbc250  k8s.gcr.io/pause:3.5                                 33 minutes ago  Up 30 minutes ago  0.0.0.0:8000->8000/tcp  42239f089eae-infra  42239f089eae  crm_pod
ca7c7d713e6f  docker.io/library/postgres:latest          postgres  30 minutes ago  Up 30 minutes ago  0.0.0.0:8000->8000/tcp  crm_db              42239f089eae  crm_pod
0e86d40e022b  localhost/climate_risk_map_service:latest  bash      11 minutes ago  Up 11 minutes ago  0.0.0.0:8000->8000/tcp  crm_ctr             42239f089eae  crm_pod
cypressf commented 3 years ago

running with APP_URL=0.0.0.0 instead of 127.0.0.1 fixed it!! I got a successful curl from the CentOS host

podman run -d --pod=crm_pod --name crm_ctr -e APP_URL=0.0.0.0 climate_risk_map_service
# wait about a minute... and then
curl 0.0.0.0:8000/state
[{"id":1,"name":"Alabama"},{"id":2,"name":"Alaska"},{"id":4,"name":"Arizona"},{"id":5,"name":"Arkansas"},{"id":6,"name":"California"},{"id":8,"name":"Colorado"},{"id":9,"name":"Connecticut"},{"id":10,"name":"Delaware"},{"id":11,"name":"D.C."},{"id":12,"name":"Florida"},{"id":13,"name":"Georgia"},{"id":15,"name":"Hawaii"},{"id":16,"name":"Idaho"},{"id":17,"name":"Illinois"},{"id":18,"name":"Indiana"},{"id":19,"name":"Iowa"},{"id":20,"name":"Kansas"},{"id":21,"name":"Kentucky"},{"id":22,"name":"Louisiana"},{"id":23,"name":"Maine"},{"id":24,"name":"Maryland"},{"id":25,"name":"Massachusetts"},{"id":26,"name":"Michigan"},{"id":27,"name":"Minnesota"},{"id":28,"name":"Mississippi"},{"id":29,"name":"Missouri"},{"id":30,"name":"Montana"},{"id":31,"name":"Nebraska"},{"id":32,"name":"Nevada"},{"id":33,"name":"New Hampshire"},{"id":34,"name":"New Jersey"},{"id":35,"name":"New Mexico"},{"id":36,"name":"New York"},{"id":37,"name":"North Carolina"},{"id":38,"name":"North Dakota"},{"id":39,"name":"Ohio"},{"id":40,"name":"Oklahoma"},{"id":41,"name":"Oregon"},{"id":42,"name":"Pennsylvania"},{"id":44,"name":"Rhode Island"},{"id":45,"name":"South Carolina"},{"id":46,"name":"South Dakota"},{"id":47,"name":"Tennessee"},{"id":48,"name":"Texas"},{"id":49,"name":"Utah"},{"id":50,"name":"Vermont"},{"id":51,"name":"Virginia"},{"id":53,"name":"Washington"},{"id":54,"name":"West Virginia"},{"id":55,"name":"Wisconsin"},{"id":56,"name":"Wyoming"},{"id":60,"name":"American Samoa"},{"id":66,"name":"Guam"},{"id":69,"name":"Northern Mariana Islands"},{"id":72,"name":"Puerto Rico"},{"id":74,"name":"U.S. Minor Outlying Islands"},{"id":78,"name":"U.S. Virgin Islands"}]
cypressf commented 3 years ago

Also, when testing curl within the container, I simplified

podman exec -it crm_ctr bash
curl 0.0.0.0:8000/state
exit

to

podman exec -it crm_ctr curl 0.0.0.0:8000/state
cypressf commented 3 years ago

@mjbludwig I moved the backend and the frontend to the same repo: https://github.com/cypressf/climate-risk-map

All frontend code is in the frontend directory, and all backend code is in backend.

cypressf commented 3 years ago

The container should run yarn build in the /frontend directory, which will create an optimized production build in the /frontend/build directory

cypressf commented 3 years ago

Install and build the frontend

@mjbludwig, here are the full steps for installing the needed dependencies on CentOS and building the frontend. I tested them in a fresh CentOS 8 box. This uses the Node binaries from the NodeSource repo.

curl -fsSL https://rpm.nodesource.com/setup_16.x | sudo bash -
curl -sL https://dl.yarnpkg.com/rpm/yarn.repo | sudo tee /etc/yum.repos.d/yarn.repo
sudo yum install -y yarn
sudo yum install -y git
git clone git@github.com:cypressf/climate-risk-map.git
cd climate-risk-map/frontend
yarn install
yarn build

After yarn build is complete, you'll have a climate-risk-map/frontend/build directory, which contains the index.html and all the other static resources.

cypressf commented 3 years ago

Revisiting https://github.com/cypressf/mit-climate-data-viz/issues/179#issuecomment-864287181, I'm going to start out by using tidy data, returning start_date, end_date, source, and value columns. If the file size is too large, I might reconsider.
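
A sketch of what one row of that tidy format could look like as a struct (the field set matches what's tested below; the types are guesses, and the dates could just as well be chrono types):

use serde::Serialize;

// One observation per row: which county, which source, what period, what value.
#[derive(Debug, Serialize)]
struct DataRow {
    county_id: i32,
    state_id: i32,
    source: i32,
    start_date: String,
    end_date: String,
    value: f64,
}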

cypressf commented 3 years ago

Returning data in an efficient format

Test 1: csv::Writer.serialize(), tidy data with columns for county_id, state_id, source, start_date, end_date, value

GET 127.0.0.1:8080/data/1

format   size      time
json     1780 KB   1151 ms
csv      608 KB    1009 ms
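
For reference, the csv branch of that test is roughly this shape (a sketch, not the exact handler; it works for any Serialize row type, e.g. the DataRow sketched above):

use actix_web::HttpResponse;
use serde::Serialize;

// Serialize a batch of rows into an in-memory csv buffer and return it as text/csv.
fn csv_response<T: Serialize>(rows: &[T]) -> HttpResponse {
    let mut writer = csv::Writer::from_writer(vec![]);
    for row in rows {
        if writer.serialize(row).is_err() {
            return HttpResponse::InternalServerError().finish();
        }
    }
    match writer.into_inner() {
        Ok(bytes) => HttpResponse::Ok().content_type("text/csv").body(bytes),
        Err(_) => HttpResponse::InternalServerError().finish(),
    }
}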
cypressf commented 3 years ago

@mjbludwig mentioned via email that the website is now deployed in a container at https://svante3.mit.edu. @mjbludwig, do you know if the actix service is accessible on https://svante3.mit.edu as well? https://svante3.mit.edu/api/[endpoint] would be a good endpoint for it.

cypressf commented 3 years ago

Once the service is accessible on the same subdomain, it should be straightforward to switch the frontend over from reading the csv files to reading from the database.