greenelab / connectivity-search-backend

Django backend for hetnet connectivity search
https://search-api.het.io
BSD 3-Clause "New" or "Revised" License
6 stars 2 forks source link

Improvements to the database accessibility #75

Closed dhimmel closed 3 years ago

dhimmel commented 4 years ago

User feedback in https://github.com/greenelab/connectivity-search-backend/issues/74 highlighted some issues with the accessibility of our database. I updated the README in https://github.com/greenelab/connectivity-search-backend/commit/a6d558825af1bff9dd114cfff543cb220c38464e, but there are still several improvements I'd like to make.

  1. publicly archiving hetmech-pg_dump.sql.gz. I will look into whether we can upload this to Zenodo.

  2. adding instructions to the readme on how to load hetmech-pg_dump.sql.gz. @dongbohu can you do this?

  3. creating a read-only database user and making the database URL and this user and password public knowledge. This will make it much easier for us to share queries since we can give anyone read access to the database. Also there is code in Hetmech that directly queries the db, and these notebooks are not reproducible without access.

    The database doesn't contain any sensitive information, so I think the risk of abuse is low. We can always disable the read-only user if there is some unintended consequence like excessive cloud costs. @dongbohu, we might already have a read-only user, but can you look into this and provide the details?

dongbohu commented 4 years ago

@dhimmel Do you already have hetmech-pg_dump.sql.gz available somewhere? If you do, I can modify the readme file.

I forgot whether the read-only user account is still valid. I will create one if not. Do you think we should put this account credential to readme file too?

dongbohu commented 4 years ago

@dhimmel: Currently the size of the hetmech Postgres database is ~55 GB. Instead of generating a pg_dump file, do you think it will be better to generate a Postgres docker image and publish it on docker hub? It will be probably easier for user to set up, and we don't have to make the read-only account public any more.

By the way, I did find the read-only-user account and it still works.

dhimmel commented 4 years ago

Do you already have hetmech-pg_dump.sql.gz available somewhere?

It was on my Greene Lab workstation. I was able to use this script to upload the file to our Zenodo deposit. So the file is now available at https://zenodo.org/record/3978766. I renamed it to connectivity-search-pg_dump.sql.gz so it matches this repository's name.

Do you think we should put this account credential to readme file too?

Yes the README or in this file, perhaps under public_db:

https://github.com/greenelab/connectivity-search-backend/blob/a6d558825af1bff9dd114cfff543cb220c38464e/dj_hetmech/secrets-template.yml#L1-L6

Let's make sure the username is self-documenting (if I remember it contains read_only right?)

Instead of generating a pg_dump file, do you think it will be better to generate a Postgres docker image and publish it on docker hub?

And include the entire database in the image or have the container load the database upon launch? I think there's a big benefit to the public-read-only user because it has the least overhead and requires the least infrastructure for users.

dhimmel commented 4 years ago

We'll probably want to make it so we can use search-db.het.io as the database URL by making an alias record to the AWS address. This way if we ever change the AWS address, we can just remap that URL. And it's easier to remember.

dhimmel commented 4 years ago

@dongbohu can you comment here with the read-only user access details. Basically what values to fill in for:

https://github.com/greenelab/connectivity-search-backend/blob/a6d558825af1bff9dd114cfff543cb220c38464e/dj_hetmech/secrets-template.yml#L2-L6

And regarding the database host URL is that stable such that we could make search-db.het.io alias it?

dongbohu commented 4 years ago

@dhimmel: I have been working on something else and totally forgot this issue. Will do it sometime next week.

dongbohu commented 4 years ago

@dhimmel:

name: dj_hetmech
user: read_only_user
password: tm8ut9uzqx7628swwkb9
host: hetmech-db.cobepk65dd7j.us-east-1.rds.amazonaws.com
port: 5432

The database host URL is stable now. Feel free to create alias.

dhimmel commented 3 years ago

Added a DNS record for

search-db 1800 IN CNAME hetmech-db.cobepk65dd7j.us-east-1.rds.amazonaws.com.

Now if I run nslookup search-db.het.io, I get:

Server:     127.0.0.53
Address:    127.0.0.53#53

Non-authoritative answer:
search-db.het.io    canonical name = hetmech-db.cobepk65dd7j.us-east-1.rds.amazonaws.com.
hetmech-db.cobepk65dd7j.us-east-1.rds.amazonaws.com canonical name = ec2-3-208-187-112.compute-1.amazonaws.com.
Name:   ec2-3-208-187-112.compute-1.amazonaws.com
Address: 3.208.187.112