I've added PG studio to the catalog. Given the discussion today, it sounds like the second requirement will require me to come up with a strategy to sync the BETY data to a local instance on the Workbench server.
Strategy for syncing BETYdb: @robkooper is the architect of the system, but there should be two steps:
load.bety.sh -c -r 6
Below is a draft workflow that will likely require feedback from @robkooper.
Currently there are two cron jobs that dump records from bety6, but neither is sufficient on its own.
dump.bety.sh only dumps records with ids between 6 and 7 billion, because the machine id of bety6 is 6. From the bety user's crontab (crontab -l -u bety):
/home/bety/dump.bety.sh -q -l 2 -u YES -m 6 -o /home/bety/bety/public/dump
pg_dumpall dumps the entire database but does not anonymize users. From the postgres user's crontab (crontab -l -u postgres):
/usr/bin/pg_dumpall | gzip --rsyncable > /home/backup/postgresql/dumpall.sql.gz
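For reference, these might appear in the respective crontabs roughly as follows (the schedules are assumptions; the commands are the ones above):
# crontab -l -u bety (schedule assumed)
0 * * * * /home/bety/dump.bety.sh -q -l 2 -u YES -m 6 -o /home/bety/bety/public/dump
# crontab -l -u postgres (schedule assumed)
0 2 * * * /usr/bin/pg_dumpall | gzip --rsyncable > /home/backup/postgresql/dumpall.sql.gz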
load.bety.sh -c -r 6 # to create the database
load.bety.sh -r 6 # to update the database in a cron job
should work, but they don't. One reason is that the result is missing records from remote sites 0 and 8. We do not want to run load.bety.sh -r 0 because we don't want to bloat bety6 with all of the public data and metadata from that system; the subset of records that I have copied from bety0 to bety6 (e.g. variables, pfts, priors) should be kept. A rough draft of the Workbench-side workflow:
cd /tmp
rsync bety6.ncsa.illinois.edu:/home/backup/postgresql/dumpall.sql.gz .
gunzip dumpall.sql.gz
psql -d bety < dumpall.sql
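After a test load, a quick sanity check of which machines' records made it in, using the id convention above (machine N owns ids between N billion and (N+1) billion; traits is just an example table):
select count(*) from traits where id between 6000000000 and 6999999999; -- created on bety6
select count(*) from traits where id < 6000000000 or id > 6999999999;   -- synced from other machines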
The pg_dumpall script could be modified to also provide a dump with anonymized users, largely reusing code from line 194 of dump.bety.sh: https://github.com/PecanProject/pecan/blob/master/scripts/dump.bety.sh#L194
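If anonymization instead happens after import, a minimal sketch of the kind of UPDATE involved (column names are assumptions based on the BETYdb users table; the actual code at dump.bety.sh#L194 is the reference):
-- hypothetical post-import anonymization; verify column names against the schema
update users
set login = concat('user', id),
    name = concat('user ', id),
    email = concat('user', id, '@example.com'),
    crypted_password = 'not-a-real-password';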
Related: https://github.com/terraref/reference-data/issues/111, https://github.com/PecanProject/pecan/issues/1355
@robkooper Looking again at how best to do this. David mentioned that he's removed viewer access from the users table (https://github.com/terraref/computing-pipeline/issues/262), so direct connection may be an option again. As discussed last week, we have a few options:
If we can't get access to 5432 directly from workbench via internal network, my inclination is to scp or rsync the dumpall output. I guess I could set up authorized keys between Workbench and ROGER (or bety6). In this case, I know we'd need to anonymize after import.
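For concreteness, a rough sketch of the rsync option (the key path, destination, and anonymization step are placeholders/assumptions):
# on Workbench: generate a key whose public half gets added to authorized_keys on bety6
ssh-keygen -t ed25519 -f ~/.ssh/bety_sync -N ''
# pull the dump, unpack, load, then anonymize as discussed above
rsync -avz -e "ssh -i ~/.ssh/bety_sync" bety6.ncsa.illinois.edu:/home/backup/postgresql/dumpall.sql.gz /tmp/
gunzip -f /tmp/dumpall.sql.gz
psql -d bety -f /tmp/dumpall.sql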
Thoughts?
You can use the updated load.bety.sh script from https://raw.githubusercontent.com/PecanProject/pecan/357b70e21592375827e9c55bc5e1ce8e3426bd78/scripts/load.bety.sh
This allows you to specify the URL to get the data from:
load.bety.sh -c -u -m 99 -r 0 -w https://terraref.ncsa.illinois.edu/bety/dump/bety0/bety.tar.gz
load.bety.sh -m 99 -r 6 -w https://terraref.ncsa.illinois.edu/bety/dump/bety6/bety.tar.gz
This will reset the database (-c) and create some default users (-u).
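For recurring updates, presumably the same script would then run without -c/-u so that existing data is updated rather than reset; a hypothetical pair of cron entries (schedule and install path are assumptions):
# hourly refresh of the local copy; flags mirror the commands above minus -c/-u
0 * * * * /usr/local/bin/load.bety.sh -m 99 -r 0 -w https://terraref.ncsa.illinois.edu/bety/dump/bety0/bety.tar.gz
30 * * * * /usr/local/bin/load.bety.sh -m 99 -r 6 -w https://terraref.ncsa.illinois.edu/bety/dump/bety6/bety.tar.gz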
Thanks, Rob. I have an almost-working PostGIS container that uses the above load process. While doing some sanity checking, I noticed the traits table has no rows?
@robkooper did you set up a cron job to dump and load the data? If so could you paste it here so it will be added to the documentation? https://github.com/terraref/documentation/issues/153
@robkooper PS thanks for such a quick turnaround!
@robkooper regarding the traits table not having any rows ... did you use -l 2 in the dump.bety.sh script to dump data with access_level = 2?
The following runs every hour on the hour:
#!/bin/bash
# fetch the latest dump.bety.sh and make it executable
wget -q -O /home/bety/dump.bety.sh https://raw.githubusercontent.com/PecanProject/pecan/master/scripts/dump.bety.sh
chmod 755 /home/bety/dump.bety.sh
# dump access_level 2 data for machine ids 0 (bety0) and 6 (bety6) into separate directories
/home/bety/dump.bety.sh -q -l 2 -u YES -m 0 -o /home/bety/bety/public/dump/bety0
/home/bety/dump.bety.sh -q -l 2 -u YES -m 6 -o /home/bety/bety/public/dump/bety6
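i.e., roughly this entry in the bety user's crontab, assuming the script above is saved as /home/bety/cron-dump.sh (the path is an assumption):
0 * * * * /home/bety/cron-dump.sh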
@craig-willis can you check on the import and see if there are any errors? Looking at the dump, there should be traits data:
-rw-r--r-- bety/users 159266947 2017-05-19 09:34 ./traits.csv
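(That listing looks like tar -tv output; to reproduce the check against the published dump, using the URL from the earlier comment, something like:)
wget -q https://terraref.ncsa.illinois.edu/bety/dump/bety6/bety.tar.gz
tar -tzvf bety.tar.gz | grep traits.csv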
Thanks, @robkooper . My mistake, it looks like I was querying the table before the traits data had completed loading.
@dlebauer @robkooper
I have a first pass at this running on terraref.ndslabs.org. The Dockerfile is on GitHub:
https://github.com/terraref/toolserver-dockerfiles/tree/kitchensink/bety
You can connect to the DB via Postgres Studio:
IP: 10.0.0.183
Database: bety
User: bety
Password: bety
This is only accessible from within Workbench. You should also be able to connect via any postgres client.
Feedback welcome. In the end I'll provide a hostname instead of an IP, but for today you just have the IP.
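For example, from a shell inside Workbench (assuming a psql client is available), the credentials above can be tested with:
PGPASSWORD=bety psql -h 10.0.0.183 -U bety -d bety -c 'select count(*) from traits;'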
@craig-willis thanks! Could you run commands to restrict read access to tables from the bety user? I think it would help to clear the clutter: https://gist.github.com/dlebauer/4aae063ac38d1dba9859da1ddfb5d2c5#file-revoke_select_from_bety_user-sql
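(A hypothetical illustration of the kind of statement in that gist; the gist itself is the authoritative list of tables:)
revoke select on table users from bety;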
Also, when I load the SQL Worksheet and try to run
select count(*) from traits;
I get this error message as a popup:
Server returned TRANSPORT_ERROR with no error message
Actually, that only happens if I hit enter in the window and then try to run it. Somehow hitting enter causes it to log out: when I reload the page, I am redirected to the login screen. If I just click the green 'play' button, it returns the expected result. Perhaps a bug in the interface; not a big deal.
I've run your SQL statements and will add to the container.
@dlebauer I still need to implement the cronjob on this, but is it otherwise working as hoped?
Yep it is awesome. Thanks!
It would be great to have access to BETYdb using PostgreSQL Studio with either a) read-only access to bety6 or b) a local 'sandbox' copy.
For a read-only connection, this is how we have provided support for GIS users (as described in the documentation).
Users can request ssh login access, but it would be ideal if we could provide a generic, already-established tunnel.
Then the postgres server is exposed (mounted?) in a way that lets me treat it as a local database.
Completion Criteria
ssh tunnel to bety6 (the terraref trait database)
ability for users with permission to connect to bety6 (with a viewer account) via the ssh tunnel
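For reference, a typical per-user tunnel for option (a) might look like this (the local port, login, and database user are placeholders):
# forward a local port to postgres on bety6, then point any client at localhost:5433
ssh -N -L 5433:localhost:5432 <username>@bety6.ncsa.illinois.edu
psql -h localhost -p 5433 -U <dbuser> -d bety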