nmdp-bioinformatics / gfe-db

Graph database representing IPD-IMGT/HLA sequence data as GFE
https://gfe-db.readthedocs.io
GNU General Public License v3.0

Validate CSV files for the release are available in S3 before continuing load process #73

Closed by chrisammon3000 4 months ago

chrisammon3000 commented 1 year ago

Description

Currently, if a build script succeeds but fails to store its output (the CSV files for the release) in S3, the load process still starts and then fails because the data is missing.

Fix

Validate that data is present before beginning the load process → load_db.sh:63

# Download data to NEO4J_HOME/import
echo "$(date -u +'%Y-%m-%d %H:%M:%S.%3N') - Downloading CSV data for release $RELEASE"
aws s3 cp --recursive s3://$DATA_BUCKET_NAME/$S3_CSV_PATH/ $NEO4J_IMPORT_PATH/

# TODO make sure the CSV files are present and/or not empty before continuing
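
One possible way to address the TODO above is a small check that runs right after the download. This is a minimal sketch, not the project's implementation. The function name `validate_csv_dir` is hypothetical, and the check assumes the CSV files land directly in the import directory rather than in nested subdirectories.

```shell
# Hypothetical helper (not part of load_db.sh): succeeds only if the given
# directory contains at least one CSV file and none of them is empty.
validate_csv_dir() {
    dir="$1"
    found=0

    if [ ! -d "$dir" ]; then
        echo "ERROR: directory not found: $dir" >&2
        return 1
    fi

    for f in "$dir"/*.csv; do
        # If the glob matched nothing, the literal pattern remains; skip it
        [ -e "$f" ] || continue
        found=1
        # -s is true only when the file exists and its size is greater than zero
        if [ ! -s "$f" ]; then
            echo "ERROR: empty CSV file: $f" >&2
            return 1
        fi
    done

    if [ "$found" -eq 0 ]; then
        echo "ERROR: no CSV files found in $dir" >&2
        return 1
    fi
    return 0
}
```

In the load script this could be called as `validate_csv_dir "$NEO4J_IMPORT_PATH" || exit 1` immediately after the `aws s3 cp` step, so the load stops with a clear message instead of failing later on missing data.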

chrisammon3000 commented 1 year ago

Tasks