nhsuk / sexual-health-service-data-combiner

DEPRECATED - no longer actively maintained
MIT License


Sexual Health Service Data Combiner

Generates a single JSON file from multiple sources. The resulting file is used by an instance of elasticsearch-updater to load the data into Elasticsearch.

The data sources

The data sources are listed in config.js. Each data source has two properties: the filename and the url (pointing to an accessible JSON file containing the data). Each file must be a JSON array of objects. The files are concatenated (along with some additional processing to ensure integrity) to produce a merged data set, sexual-health-service-data-merged.json.

The merged file is uploaded to the Azure Storage account specified in AZURE_STORAGE_CONNECTION_STRING, into the container named in AZURE_BLOB_CONTAINER_NAME, or etl-output if the default is used. For example, if the storage account is primarycare and the defaults are used, the merged data set will be available to download from https://primarycare.blob.core.windows.net/etl-output/sexual-health-service-data-merged.json.
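The merge step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the example source entries, the `mergeDataSets` and `mergedFileUrl` helpers, and the shape of the validation are all assumptions, and the real "additional processing to ensure integrity" is not shown.

```javascript
// Hypothetical sketch: the real config.js and merge logic may differ.
// Each source has the two properties described above: a filename and a url.
const sources = [
  { filename: 'source-a.json', url: 'https://example.com/source-a.json' },
  { filename: 'source-b.json', url: 'https://example.com/source-b.json' },
];

// Concatenate the parsed contents of every source into one array.
// Throws if any source is not a JSON array of objects.
function mergeDataSets(dataSets) {
  return dataSets.reduce((merged, { filename, data }) => {
    if (!Array.isArray(data)) {
      throw new Error(`${filename} must contain a JSON array`);
    }
    data.forEach((item) => {
      if (item === null || typeof item !== 'object' || Array.isArray(item)) {
        throw new Error(`${filename} must contain only objects`);
      }
    });
    return merged.concat(data);
  }, []);
}

// Build the download URL of the merged file in Azure Blob Storage.
// The container defaults to etl-output, as in the README.
function mergedFileUrl(account, container = 'etl-output') {
  return `https://${account}.blob.core.windows.net/${container}/sexual-health-service-data-merged.json`;
}
```

Here `dataSets` is assumed to be the already-downloaded and parsed contents of each entry in `sources`, paired with its filename so that errors can name the offending file.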

Running the application and scheduling

The application runs at startup and then once a day for as long as the container keeps running. The time of day defaults to 7:15 am and can be changed via the UPDATE_SCHEDULE environment variable, which takes a cron expression (the default is 15 7 * * *).

The scheduler can be disabled entirely by setting the DISABLE_SCHEDULER variable to true. Rather than removing the job, this schedules a single run far in the future, on Jan 1st, 2100.
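The scheduling rules above could be resolved along these lines. This is only a sketch under assumptions: `resolveSchedule` and `DEFAULT_SCHEDULE` are hypothetical names, and how the returned value is handed to the actual scheduling library is not shown.

```javascript
// Hypothetical helper, not the project's actual code.
const DEFAULT_SCHEDULE = '15 7 * * *'; // 7:15 am daily, in cron syntax

// Returns either a cron expression or a single far-future Date.
function resolveSchedule(env) {
  if (env.DISABLE_SCHEDULER === 'true') {
    // A single run on Jan 1st, 2100 effectively disables the scheduler.
    return new Date(2100, 0, 1);
  }
  return env.UPDATE_SCHEDULE || DEFAULT_SCHEDULE;
}
```

Passing `process.env` as the `env` argument would give the behaviour described above, while passing a plain object keeps the function easy to test.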

Environment variables

Environment variables are expected to be managed by the environment in which the application is run. This is best practice, as described by the twelve-factor methodology.

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| AZURE_BLOB_CONTAINER_NAME | Azure storage container name | etl-output | |
| AZURE_STORAGE_CONNECTION_STRING | Azure storage connection string | | yes |
| AZURE_TIMEOUT_MINUTES | Timeout in minutes before file upload errors | 5 | |
| DISABLE_SCHEDULER | Set to 'true' to disable the scheduler | false | |
| LOG_LEVEL | Log level | Depends on NODE_ENV | |
| NODE_ENV | Node environment | development | |
| UPDATE_SCHEDULE | Time of day to run the update | 15 7 * * * (7:15 am) | |
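As an illustration of how these variables and their defaults might be read at startup, here is a minimal sketch. The `loadConfig` function and the property names it returns are hypothetical and are not taken from the project. The LOG_LEVEL default is left unset because its mapping from NODE_ENV is not documented here.

```javascript
// Hypothetical config loader applying the documented defaults.
function loadConfig(env) {
  // AZURE_STORAGE_CONNECTION_STRING is the only required variable.
  if (!env.AZURE_STORAGE_CONNECTION_STRING) {
    throw new Error('AZURE_STORAGE_CONNECTION_STRING is required');
  }
  return {
    containerName: env.AZURE_BLOB_CONTAINER_NAME || 'etl-output',
    connectionString: env.AZURE_STORAGE_CONNECTION_STRING,
    // Falls back to 5 when the variable is unset or not a number.
    timeoutMinutes: Number(env.AZURE_TIMEOUT_MINUTES) || 5,
    disableScheduler: env.DISABLE_SCHEDULER === 'true',
    // Undefined when unset; the NODE_ENV-dependent default is not modelled.
    logLevel: env.LOG_LEVEL,
    nodeEnv: env.NODE_ENV || 'development',
    updateSchedule: env.UPDATE_SCHEDULE || '15 7 * * *',
  };
}
```

Calling `loadConfig(process.env)` once at startup would fail fast when the connection string is missing, instead of failing later during the upload.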

Architecture Decision Records

This repo uses Architecture Decision Records to record architectural decisions for this project. They are stored in doc/adr.