nmdp-bioinformatics / gfe-db

Graph database representing IPD-IMGT/HLA sequence data as GFE
https://gfe-db.readthedocs.io
GNU General Public License v3.0

Private subnet configuration for Neo4j 5 and Docker for running gfe-db locally #101

Closed: chrisammon3000 closed this 7 months ago

chrisammon3000 commented 10 months ago

Description

Major update that adds support for private deployments of Neo4j 5. The VPC, VPC endpoints, and NAT Gateway can either be created by the deployment or reused from existing infrastructure.

Summary of Changes

Changes affect all services and configuration, including:

Environment Variables

| Variable | Data Type | Example Value | Required | Notes |
|---|---|---|---|---|
| AWS_PROFILE | string | user_profile | Yes | AWS account profile name |
| APP_NAME | string | myapp | Yes | Application name |
| AWS_REGION | string | us-east-1 | Yes | AWS region |
| CREATE_VPC | bool | true/false | Yes | Whether to create a new VPC |
| USE_PRIVATE_SUBNET | bool | true/false | Yes | Deploy into a private subnet if true |
| ADMIN_EMAIL | string | admin@example.com | Yes | Administrator's email |
| SUBSCRIBE_EMAILS | string | notify@example.com | Yes | Email addresses for subscription |
| GITHUB_REPOSITORY_OWNER | string | ANHIG | Yes | Owner of the GitHub repository |
| GITHUB_REPOSITORY_NAME | string | IMGTHLA | Yes | Name of the GitHub repository |
| NEO4J_AMI_ID | string | ami-xxxxxxx | Yes | AMI ID for Neo4j |
| APOC_VERSION | string | 4.4.0.3 | Yes | Version of APOC; must match the deployed Neo4j version |
| GDS_VERSION | string | 2.0.1 | Yes | Version of GDS; must be compatible with the deployed Neo4j version |
| GITHUB_PERSONAL_ACCESS_TOKEN | string | ghp_xxxxxxxxxxxxxx | Yes | GitHub personal access token |
| FEATURE_SERVICE_URL | string | https://api.example.com | Yes | URL of the Feature service |
| DOCKER_USERNAME | string | my_username | Conditional | Required for building local gfe-db with Docker |
| DOCKER_PASSWORD | string | password | Conditional | Required for building local gfe-db with Docker |
| HOST_DOMAIN | string | example.com | Conditional | Required if USE_PRIVATE_SUBNET=false |
| SUBDOMAIN | string | sub.example.com | Conditional | Required if USE_PRIVATE_SUBNET=false |
| HOSTED_ZONE_ID | string | ZXXXXXXXXXXXXX | Conditional | Required if USE_PRIVATE_SUBNET=false |
| VPC_ID | string | vpc-xxxxxxxx | Conditional | Required if CREATE_VPC=false |
| PUBLIC_SUBNET_ID | string | subnet-xxxxxxxx | Conditional | Required if CREATE_VPC=false and USE_PRIVATE_SUBNET=false |
| PRIVATE_SUBNET_ID | string | subnet-xxxxxxxx | Conditional | Required if CREATE_VPC=false and USE_PRIVATE_SUBNET=true |
| CREATE_SSM_VPC_ENDPOINT | bool | true/false | Conditional | Required if USE_PRIVATE_SUBNET=true |
| SSM_VPC_ENDPOINT_ID | string | vpce-xxxxxxxx | Conditional | Required if CREATE_SSM_VPC_ENDPOINT=false |
| CREATE_SECRETSMANAGER_VPC_ENDPOINT | bool | true/false | Conditional | Required if USE_PRIVATE_SUBNET=true |
| SECRETSMANAGER_VPC_ENDPOINT_ID | string | vpce-xxxxxxxx | Conditional | Required if CREATE_SECRETSMANAGER_VPC_ENDPOINT=false |
| CREATE_S3_VPC_ENDPOINT | bool | true/false | Conditional | Required if USE_PRIVATE_SUBNET=true |
| S3_VPC_ENDPOINT_ID | string | vpce-xxxxxxxx | Conditional | Required if CREATE_S3_VPC_ENDPOINT=false |
| DEPLOY_NAT_GATEWAY | bool | true/false | Conditional | Required if USE_PRIVATE_SUBNET=true |
| EXTERNAL_NAT_GATEWAY_ID | string | nat-xxxxxxxx | Conditional | Required if DEPLOY_NAT_GATEWAY=false |
| DEPLOY_BASTION_SERVER | bool | true/false | Conditional | Optional if USE_PRIVATE_SUBNET=true |
| ADMIN_IP | string | 192.168.1.1/32 | Conditional | Required if DEPLOY_BASTION_SERVER=true |

Note: "Conditional" in the Required column means the variable is required only under the condition stated in the Notes column.
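The conditional rules in the table can be checked before a deployment is started. The function below is a minimal sketch, not part of the gfe-db Makefile: the names `check_env`, `require_vars`, and `ALWAYS_REQUIRED` are hypothetical, it assumes the variables are already exported into the shell (for example from `.env.<stage>`), and it encodes only a subset of the rules (the VPC endpoint ID rules are left out).

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check (not part of the gfe-db Makefile) for a subset
# of the rules in the environment variable table.

# Variables marked "Yes" in the Required column.
ALWAYS_REQUIRED=(AWS_PROFILE APP_NAME AWS_REGION CREATE_VPC USE_PRIVATE_SUBNET
  ADMIN_EMAIL SUBSCRIBE_EMAILS GITHUB_REPOSITORY_OWNER GITHUB_REPOSITORY_NAME
  NEO4J_AMI_ID APOC_VERSION GDS_VERSION GITHUB_PERSONAL_ACCESS_TOKEN
  FEATURE_SERVICE_URL)

MISSING_VARS=()

# Record each named variable that is unset or empty.
require_vars() {
  local var
  for var in "$@"; do
    [[ -n "${!var:-}" ]] || MISSING_VARS+=("$var")
  done
}

# Print the names of missing variables, one per line; return 1 if any.
check_env() {
  MISSING_VARS=()
  require_vars "${ALWAYS_REQUIRED[@]}"

  # Public deployments need DNS settings.
  if [[ "${USE_PRIVATE_SUBNET:-}" == "false" ]]; then
    require_vars HOST_DOMAIN SUBDOMAIN HOSTED_ZONE_ID
  fi

  # Reusing an existing VPC requires its ID and the matching subnet.
  if [[ "${CREATE_VPC:-}" == "false" ]]; then
    require_vars VPC_ID
    if [[ "${USE_PRIVATE_SUBNET:-}" == "true" ]]; then
      require_vars PRIVATE_SUBNET_ID
    elif [[ "${USE_PRIVATE_SUBNET:-}" == "false" ]]; then
      require_vars PUBLIC_SUBNET_ID
    fi
  fi

  # Private deployments need the endpoint and NAT Gateway flags.
  if [[ "${USE_PRIVATE_SUBNET:-}" == "true" ]]; then
    require_vars CREATE_SSM_VPC_ENDPOINT CREATE_SECRETSMANAGER_VPC_ENDPOINT \
      CREATE_S3_VPC_ENDPOINT DEPLOY_NAT_GATEWAY
    if [[ "${DEPLOY_NAT_GATEWAY:-}" == "false" ]]; then
      require_vars EXTERNAL_NAT_GATEWAY_ID
    fi
  fi

  if [[ "${DEPLOY_BASTION_SERVER:-}" == "true" ]]; then
    require_vars ADMIN_IP
  fi

  if (( ${#MISSING_VARS[@]} > 0 )); then
    printf '%s\n' "${MISSING_VARS[@]}"
    return 1
  fi
  return 0
}
```

Running `check_env` after sourcing the stage's env file prints any missing variable names and returns non-zero, so it can guard a deploy command such as `check_env && STAGE=<stage> make deploy` (the `deploy` target name is an assumption).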

Infrastructure

Database

Pipeline

New Usage

Connecting to Neo4j Browser running in a private subnet

For deployments that use a private subnet, Neo4j Browser can be accessed through port forwarding. The first time you connect, follow these steps:

  1. Restrict the permissions of the SSH key (SSH refuses to use a private key that other users can read).
    chmod 400 <stage>-gfe-db-us-east-1-neo4j-key.pem
  2. Connect to the database server once so that its host key is added to your known_hosts file, and accept the prompt.
    STAGE=<stage> make database.connect
    > Are you sure you want to continue connecting (yes/no)? yes
  3. In a new shell, connect to Neo4j Browser and accept any additional prompts.
    STAGE=<stage> make database.ui.connect
    > Neo4j Browser is available at: http://localhost:7474/browser/

These steps are only needed the first time you connect to Neo4j Browser. After that, run STAGE=<stage> make database.ui.connect and open http://localhost:7474/browser/ to use the graph.
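Neo4j Browser is served over HTTP on port 7474, but it talks to the database over Bolt on port 7687, so both ports have to be reachable from your machine. The function below is a generic sketch of SSH local port forwarding through a bastion host and is not how `make database.ui.connect` is implemented. It assumes `DEPLOY_BASTION_SERVER=true`, that `ADMIN_IP` admits your address, and that the key is accepted by the bastion. The function name `neo4j_tunnel`, the user name, and the host addresses are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: forward Neo4j Browser (HTTP, 7474) and Bolt (7687)
# from a database server in a private subnet through a bastion host.
# Not the implementation of `make database.ui.connect`.
neo4j_tunnel() {
  local key_file="$1"      # SSH private key accepted by the bastion (placeholder)
  local ssh_user="$2"      # login user on the bastion (placeholder)
  local bastion_host="$3"  # public address of the bastion (placeholder)
  local db_private_ip="$4" # private IP of the Neo4j server (placeholder)

  # -N: forward ports only, do not run a remote command
  # -L: bind each local port to the same port on the database server,
  #     which the bastion reaches over the VPC network
  local cmd=(ssh -i "$key_file" -N
    -L "7474:${db_private_ip}:7474"
    -L "7687:${db_private_ip}:7687"
    "${ssh_user}@${bastion_host}")

  # Set DRY_RUN=1 to print the command instead of running it.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (all arguments are placeholders):
#   neo4j_tunnel <key.pem> <user> <bastion-host> <db-private-ip>
# then open http://localhost:7474/browser/
```

Forwarding to the database server's private IP from the bastion keeps the database itself unreachable from the internet; only the bastion needs an inbound SSH rule.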

Run gfe-db Locally Using Docker

Once the application has been deployed and the database is loaded, you can build and run the latest version of gfe-db locally using Docker.

Build Environment

Make sure you have added your Docker Hub credentials to your .env.<stage> file.

# .env.<stage>
DOCKER_USERNAME=<username>
DOCKER_PASSWORD=<password>
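To use these credentials in a shell, for example to log in to Docker Hub or to expand $DOCKER_USERNAME in the docker run command, the env file has to be exported into that shell. The helper below is a minimal sketch: the name `load_stage_env` is hypothetical, and it assumes `.env.<stage>` sits in the current directory and contains plain `KEY=value` lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export the variables in .env.<stage> into the current
# shell and confirm the Docker Hub credentials are present.
load_stage_env() {
  local stage="$1"
  local env_file=".env.${stage}"

  if [[ ! -f "$env_file" ]]; then
    echo "Env file not found: $env_file" >&2
    return 1
  fi

  # set -a marks every variable assigned while sourcing for export.
  set -a
  # shellcheck source=/dev/null
  source "$env_file"
  set +a

  if [[ -z "${DOCKER_USERNAME:-}" || -z "${DOCKER_PASSWORD:-}" ]]; then
    echo "DOCKER_USERNAME and DOCKER_PASSWORD must be set in $env_file" >&2
    return 1
  fi
}

# Usage (run from the directory containing .env.<stage>):
#   load_stage_env <stage>
#   printf '%s' "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
```

Piping the password to `docker login --password-stdin` keeps it out of the process list and shell history.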

Usage

Build the image and push it to Docker Hub. The Makefile automatically fetches the most recent backup data from S3 and uses it to build the image. Logs are written to ./gfe-db/local/neo4j/logs.

STAGE=<stage> make local.build
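If you want to see which backup a build is likely to pick up, one way to approximate "most recent" is to sort an `aws s3 ls --recursive` listing by its timestamp columns. This is a hypothetical sketch, not the Makefile's selection logic: the function name `latest_s3_key` and the bucket and prefix are placeholders, and it assumes object keys contain no spaces.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the key of the newest object in an
# `aws s3 ls --recursive` listing read from stdin.
# Each listing line looks like: "2023-10-01 12:34:56   123456 path/to/key"
latest_s3_key() {
  # The date and time columns are fixed-width, so a byte-wise (LC_ALL=C)
  # sort orders them chronologically; the last line is the newest object
  # and column 4 is its key.
  LC_ALL=C sort -k1,1 -k2,2 | tail -n 1 | awk '{print $4}'
}

# Usage (bucket and prefix are placeholders):
#   aws s3 ls "s3://<data-bucket>/<backup-prefix>/" --recursive | latest_s3_key
```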

Once the image is built and pushed to Docker Hub, run the following command to start the most recent version of gfe-db locally.

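# Run from the root directory of gfe-db.
# DOCKER_USERNAME must be set in the current shell (for example, exported from .env.<stage>).
docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --volume="$(pwd)/gfe-db/local/neo4j/logs:/logs" \
    "${DOCKER_USERNAME}/gfe-db:latest"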
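Because this command runs the container in the foreground, a second shell can be used to check when Neo4j is accepting connections before opening http://localhost:7474/browser/. The sketch below polls a TCP port with bash's `/dev/tcp` redirection; the function name `wait_for_port` and the 60-second timeout in the usage line are arbitrary choices, not gfe-db defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port accepts connections.
# Returns 0 once the port is reachable, 1 if the timeout (seconds) expires.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$((SECONDS + timeout))

  while (( SECONDS < deadline )); do
    # Opening /dev/tcp/<host>/<port> attempts a TCP connection (bash only);
    # the subshell closes the descriptor again as soon as it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage: wait for both the HTTP (7474) and Bolt (7687) ports.
#   wait_for_port localhost 7474 60 && wait_for_port localhost 7687 60 \
#     && echo "Neo4j Browser is available at http://localhost:7474/browser/"
```

Checking 7687 as well matters because the Browser page can load on 7474 before the Bolt connector is ready.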

Next Steps

chrisammon3000 commented 9 months ago

@pbashyal-nmdp The conflict in requirements.txt is in the py-gfe entry. It can be switched back, but I think you may need to publish an updated py-gfe release on PyPI (probably 1.1.6 for the patch).