The objective of this project is to collect, organize and mine information from scientific papers discussing the Zika virus. This is an open science project. This means everybody is welcome (and encouraged) to contribute however they can, using whichever skills they have. Your contributions to the project will be recorded as comments in the issue tracker and pull requests to our repo. We will use those to automatically build a list of participants to give credit to everyone that helped.
You can choose to help with whatever you're comfortable with:
Many of us can only work on this project in our spare time, so we will take baby steps. The first MVP we are targeting is the collection and annotation of all scientific articles discussing Zika. First, we will automatically extract entities and concepts mentioned in those articles. Subsequently, we will curate the extracted data to remove unrelated content and add missing entities/concepts.
If you want to run or develop with us at this stage of the project, we assume you are running a Linux distribution (or at least macOS with add-ons). Let us know if that's not the case.
make install # to install the prereqs
make all K=10 # to see the code running through the whole workflow with only 10 documents at first
make download # to download all of the content (900 abstracts)
make annotate # to run the workflow over the currently downloaded documents
I describe below a few design decisions to help you understand how we got to where we are, but we're also open to constructive criticism and suggestions on how we can improve our project.
Strong universal identifiers:
Reproducibility
Using GNU Make as workflow manager:
simple/12345.json -- running make simple/12345.json should call the right scripts with the right inputs to generate that file;
running make again should pick up where it left off;
make -j <number of threads> runs jobs in parallel.
GitHub issue tracker to coordinate development, issues, bugs, etc.
Pull requests with diff patches to provide reproducible fixes to the data
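To make the GNU Make design decision above more concrete, here is a minimal sketch of how a file-per-document workflow can behave. It is not the project's actual Makefile: the raw/ directory, the sample document ids and contents, and the recipe that writes a tiny JSON file are all hypothetical stand-ins for the real download and annotation scripts. It assumes GNU Make is installed.

```shell
#!/bin/sh
# Sketch only: demonstrates file targets, resumability and -j with a toy Makefile.
set -eu
LC_ALL=C; export LC_ALL

# Work in a throwaway directory so nothing in the real repo is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: one text file per downloaded abstract.
mkdir raw
echo "Zika virus abstract one" > raw/12345.txt
echo "Zika virus abstract two" > raw/67890.txt

# Toy Makefile. The recipe follows ';' on the rule line, so the sketch
# does not depend on literal tab characters surviving copy and paste.
cat > Makefile <<'EOF'
# Pattern rule: any simple/<id>.json is built from raw/<id>.txt.
# $* is the matched <id>, $@ is the target file.
simple/%.json: raw/%.txt ; @mkdir -p simple && echo "annotating $*" && printf '{"id": "%s"}\n' "$*" > $@
EOF

# First run: the target does not exist yet, so the recipe is executed.
first=$(make simple/12345.json)
echo "$first"

# Second run: the target is newer than its input, so make does no work
# and only reports that the target is up to date.
second=$(make simple/12345.json)
echo "$second"

# Parallel run: only the missing target is built; the finished one is skipped.
make -j 2 simple/12345.json simple/67890.json
```

Because every output is a file with declared inputs, an interrupted run can be resumed simply by invoking make again, and independent documents can be processed concurrently without extra scheduling code.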