Toggles automated backups on and off so that scheduled backups no longer shut down Neo4j and interrupt a load in progress.
Backups are performed at the beginning and end of state machine executions.
The build output is validated with several checks to make sure it can be loaded. If build validation fails, the execution stops.
Validation queries are executed against Neo4j pre and post-execution for monitoring purposes.
The state machine output payload contains new fields holding the results of the backup and validation steps.
Toggle Automated Backups on/off
When a Step Functions execution starts, the alarm status is ALARM for the duration of the execution, until it succeeds, fails, or aborts. When the execution finishes, the alarm status returns to OK. When the alarm status becomes ALARM, a notification is sent to SNS, which triggers the DisableBackup Lambda function to disable the Neo4j Backup Maintenance Window using the SSM API. When the alarm status changes to OK, another SNS notification is sent and the DisableBackup function re-enables the maintenance window.
The logic used by the alarm evaluates several Step Functions execution metrics:
ExecutionsStarted
ExecutionsSucceeded
ExecutionsFailed
ExecutionsAborted
If the expression:
... evaluates to true, the alarm status changes to ALARM. It returns to OK when the expression evaluates to false. This is how the alarm status is toggled, based on the current number of executions in progress and the finished or failed executions in the sampled time period.
A notification is sent to the DataPipelineExecution topic for each status change and is available to all subscribers.
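The toggle described above can be sketched roughly as follows. This is a minimal illustration, not the actual DisableBackup function: the names `desired_window_state`, `toggle_backup_window`, and `window_id` are hypothetical, and the SSM client is passed in as an argument (in a real Lambda it would typically be `boto3.client("ssm")`). It assumes the standard CloudWatch alarm notification format, where the SNS message body is a JSON string containing `NewStateValue`.

```python
import json


def desired_window_state(sns_event: dict) -> bool:
    """Return True if the backup maintenance window should be enabled."""
    # CloudWatch alarm notifications arrive as a JSON string in the SNS message body.
    message = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    state = message["NewStateValue"]
    if state == "ALARM":
        return False  # an execution is in progress: disable scheduled backups
    if state == "OK":
        return True  # no execution in progress: re-enable scheduled backups
    raise ValueError(f"Unhandled alarm state: {state}")


def toggle_backup_window(sns_event: dict, ssm_client, window_id: str) -> dict:
    """Enable or disable the maintenance window to match the alarm state."""
    enabled = desired_window_state(sns_event)
    # update_maintenance_window is the SSM API call that sets a window's Enabled flag.
    ssm_client.update_maintenance_window(WindowId=window_id, Enabled=enabled)
    return {"WindowId": window_id, "Enabled": enabled}
```

Passing the client in keeps the decision logic testable without AWS credentials. How the real function handles other alarm states, such as INSUFFICIENT_DATA, is not stated here, so this sketch simply raises.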
Pre/Post Execution Backups
States are added before the build process and after the load process to run the backup script using Lambda and the Systems Manager Run Command API. The DocumentName and CommandId are returned to the state machine.
The results are stored under $.backups.pre and $.backups.post in the state machine output.
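As a rough sketch of the backup step, a Lambda function could start the backup through Run Command and hand the two identifiers back to the state machine. The function name `run_backup` and its parameters are hypothetical, and the SSM client is injected (in practice it would likely be `boto3.client("ssm")`). The actual backup document and target instance are not given here.

```python
def run_backup(ssm_client, instance_id: str, document_name: str) -> dict:
    """Start the backup via SSM Run Command and return its identifiers."""
    # send_command starts the SSM document on the target instance and
    # returns immediately with metadata about the queued command.
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName=document_name,
    )
    command = response["Command"]
    # Only these two fields are passed back to the state machine.
    return {
        "DocumentName": command["DocumentName"],
        "CommandId": command["CommandId"],
    }
```

The returned dictionary is the kind of payload that would end up under `$.backups.pre` or `$.backups.post`, depending on which state invoked the function.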
Build Output Validation
A Lambda function uses the Polars library to run validation checks on the CSV files output by the previous build step for each release. It checks S3 assets at the file level and at the file-contents level. If all checks pass, the execution proceeds to the load stage. If any check fails, the input payload is marked as invalid and the execution stops.
File checks:
S3 prefix exists and contains objects
The object names under the prefix correspond to the expected artifact names
File contents checks:
The file's timestamp is after the execution start time
The file name is correct
The CSV headers are correct
Rows are present (the CSV is not empty)
Validation details are stored under $.validations.build in the state machine output.
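The checks listed above could look roughly like the sketch below. To stay dependency-free it uses the standard library `csv` module instead of Polars, so it illustrates the logic rather than the actual implementation. All function names, and the idea of passing the object keys and CSV text in as plain values, are assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def check_object_names(object_keys, prefix, expected_names):
    """File-level checks: the prefix contains objects and their names match."""
    names = {key[len(prefix):].lstrip("/") for key in object_keys if key.startswith(prefix)}
    return {
        "prefix_has_objects": bool(names),
        "names_match": names == set(expected_names),
    }


def check_csv(csv_text, expected_headers, last_modified: datetime, execution_start: datetime):
    """Content-level checks for a single CSV file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers = rows[0] if rows else []
    return {
        "is_fresh": last_modified > execution_start,  # written during this execution
        "headers_match": headers == list(expected_headers),
        "has_rows": len(rows) > 1,  # at least one data row after the header
    }


def is_valid(*results):
    """The build output is valid only if every individual check passed."""
    return all(all(result.values()) for result in results)
```

Returning each check as a named boolean keeps the per-check details available for the state machine output, while `is_valid` gives the single pass/fail decision that gates the load stage.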
Pre/Post Execution Validation Queries
A Lambda function uses the Neo4j Python driver to run Cypher statements against Neo4j before and after the execution. The results of these steps are intended for monitoring only and do not affect the execution's logic.
Cypher Statements
Node counts are returned for each label in the database.
MATCH (n:GFE) RETURN count(n) as count;
MATCH (n:IPD_Accession) RETURN count(n) as count;
MATCH (n:IPD_Allele) RETURN count(n) as count;
MATCH (n:Sequence) RETURN count(n) as count;
MATCH (n:Feature) RETURN count(n) as count;
MATCH (n:Submitter) RETURN count(n) as count;
Number of HAS_IPD_ALLELE edges for each distinct release version stored on the edge.
MATCH (:GFE)-[r:HAS_IPD_ALLELE]->(:IPD_Allele)
WITH r, apoc.coll.toSet(r.releases) as releases
UNWIND releases as release_version
RETURN DISTINCT release_version, count(release_version) as count
ORDER BY release_version;
Number of HAS_IPD_ACCESSION edges for each distinct release version stored on the edge.
MATCH ()-[r:HAS_IPD_ACCESSION]->() RETURN DISTINCT r.release as release_version, count(r.release) as count;
Validation details are stored under $.validations.queries in the state machine output.
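A minimal sketch of how the node count queries above might be run and collected is shown below. The function name `count_nodes_by_label` and the returned dictionary shape are assumptions. The `session` argument is expected to behave like a Neo4j Python driver session, that is, `session.run(query)` returns a result whose `single()` method yields a record indexable by column name.

```python
NODE_LABELS = ["GFE", "IPD_Accession", "IPD_Allele", "Sequence", "Feature", "Submitter"]


def count_nodes_by_label(session, labels=NODE_LABELS):
    """Run one count query per label and return a {label: count} mapping."""
    counts = {}
    for label in labels:
        # The label is interpolated from a fixed list, never from user input.
        result = session.run(f"MATCH (n:{label}) RETURN count(n) as count")
        counts[label] = result.single()["count"]
    return counts
```

With the real driver, the session would typically come from `GraphDatabase.driver(uri, auth=(user, password)).session()`. Running the function before and after the load gives two mappings that can be compared for monitoring.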
State Machine Output