Wellcome Reach is an open source service for discovering how research publications are cited in global policy documents, including those produced by policy organizations such as the WHO, MSF, and the UK government. Key parts of it include:
Wellcome Reach is written in Python and developed using docker-compose. It is deployed to Kubernetes.
Although parts of Wellcome Reach have been in use at Wellcome since mid-2018, the project has only been open source since March 2019. Given these early days, please be patient as various parts of it are made accessible to external users. All issues and pull requests are welcome. Contributing guidelines can be found in CONTRIBUTING.md.
To develop for this project, you will need:

- pip
- virtualenv
To bring up the development environment using docker, export `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` into your environment, then run:

```
make docker-build
docker-compose up -d
docker-compose ps
```
Once up, you'll be able to access:
For local development outside of airflow or other services, use the project's virtualenv:

```
make virtualenv
source build/virtualenv/bin/activate
```
To run all tests for the project using the official Python version and other dependencies, run:

```
make docker-test
```
You can also run tests locally using the project's virtualenv, with `make test`, or by using the appropriate pytest command, as documented in the Makefile.
Wellcome Reach uses Apache Airflow to automate running its data pipelines. Specifically, we've broken down the batch pipeline into a series of dependent steps, all part of a Directed Acyclic Graph (DAG).
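As a rough illustration of what "a series of dependent steps in a DAG" means in Airflow terms, here is a minimal sketch; the DAG id, task names, and callables below are hypothetical and are not Reach's actual pipeline code:

```python
# Hypothetical sketch of an Airflow DAG with dependent steps.
# Reach's real pipeline lives in its own DAG definitions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def extract(**kwargs):
    # e.g. fetch policy documents from a scraped source
    pass


def parse(**kwargs):
    # e.g. extract references from the fetched documents
    pass


def match(**kwargs):
    # e.g. match extracted references against known publications
    pass


dag = DAG(
    dag_id='example_policy_pipeline',
    start_date=datetime(2018, 11, 2),
    schedule_interval=None,
)

extract_task = PythonOperator(task_id='extract', python_callable=extract,
                              provide_context=True, dag=dag)
parse_task = PythonOperator(task_id='parse', python_callable=parse,
                            provide_context=True, dag=dag)
match_task = PythonOperator(task_id='match', python_callable=match,
                            provide_context=True, dag=dag)

# Each step depends on the previous one, forming a simple linear DAG.
extract_task >> parse_task >> match_task
```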
It's quite common to want to run a single task in Airflow without having to click through the UI, not least because all logging messages then appear on the console. To do this, from the top of the project directory, with the development environment running via `docker-compose` as shown above, and with `DAG_NAME`, `TASK_NAME`, and `JSON_PARAMS` set appropriately:

```
./docker_exec.sh airflow test \
    ${DAG_NAME} ${TASK_NAME} \
    2018-11-02 -tp '${JSON_PARAMS}'
```
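For example, with hypothetical DAG and task names substituted in (these are illustrative only, not necessarily names used in Reach):

```
./docker_exec.sh airflow test \
    policy_pipeline extract \
    2018-11-02 -tp '{"limit": 10}'
```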
Although not required, you can add Sentry reporting from your local dev environment to a localdev project inside Wellcome's Sentry account by running:
```
eval $(./export_wellcome_env.py)
```
before running `docker-compose up -d` above.
For production, a typical deployment uses:
The evaluation results are stored as an output here. Broadly, the evaluation works by comparing a gold set of results (a manually annotated dataset of all the publications that should be found in a sample of policy documents) against the publications Reach identified in the same sample of policy documents. The evaluation script is held in another, private repo.
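Since that script is not public, the following is only a conceptual sketch of the comparison it describes: set overlap between the gold annotations and Reach's output for the same documents. The function and identifier names are assumptions, not the private script's API:

```python
# Illustrative only: compare a manually annotated gold set against the
# publications Reach found for the same sample of policy documents.
def evaluate(gold_publications, found_publications):
    """Both arguments are sets of publication identifiers (e.g. DOIs)."""
    true_positives = gold_publications & found_publications
    precision = len(true_positives) / len(found_publications) if found_publications else 0.0
    recall = len(true_positives) / len(gold_publications) if gold_publications else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {'precision': precision, 'recall': recall, 'f1': f1}


if __name__ == '__main__':
    gold = {'10.1000/a', '10.1000/b', '10.1000/c'}
    found = {'10.1000/a', '10.1000/b', '10.1000/d'}
    print(evaluate(gold, found))  # precision ~0.67, recall ~0.67
```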
See the Contributing guidelines in CONTRIBUTING.md.