Make the rule-based pipeline end-to-end

Previously, the raw DCC texts were preprocessed in the emc-dcc-preprocessing repo using a spaCy pipeline, and that pipeline and the preprocessed texts then had to be loaded into this repo.

I have now written a separate script (build-pipeline.py) that builds the pipeline, from tokenisation up to and including the context algorithm. The pipeline can therefore be run directly on the raw text.
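The contents of build-pipeline.py are not shown here. As a rough illustration of the idea (one pipeline that goes from raw text all the way to context/negation labels, with no separate preprocessing step), below is a minimal pure-Python sketch. Everything in it is hypothetical: the stage functions, the toy English trigger and target lists, the termination terms and the 5-token window stand in for the real spaCy components and ConText rules. It is not the implementation in this repo.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical toy rule lists, for illustration only.
NEGATION_TRIGGERS = {"no", "not", "without"}
TERMINATION_TERMS = {".", ";", "but"}  # end the scope of a trigger
TARGET_TERMS = {"fever", "cough", "pain"}
WINDOW = 5  # hypothetical: how many tokens back we look for a trigger

Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]


def tokenise(doc: Doc) -> Doc:
    # Lowercase and split into word tokens and punctuation tokens.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"].lower())
    return doc


def find_targets(doc: Doc) -> Doc:
    # Record the token positions of the terms we want to classify.
    doc["targets"] = [i for i, t in enumerate(doc["tokens"]) if t in TARGET_TERMS]
    return doc


def context(doc: Doc) -> Doc:
    # ConText-style rule: a target is negated if a negation trigger precedes it
    # within WINDOW tokens and no termination term lies in between.
    tokens = doc["tokens"]
    results = []
    for i in doc["targets"]:
        negated = False
        for j in range(i - 1, max(-1, i - WINDOW - 1), -1):
            if tokens[j] in TERMINATION_TERMS:
                break
            if tokens[j] in NEGATION_TRIGGERS:
                negated = True
                break
        results.append((tokens[i], negated))
    doc["results"] = results
    return doc


# The whole pipeline, from tokenisation up to and including context.
PIPELINE: List[Stage] = [tokenise, find_targets, context]


def run_pipeline(text: str) -> Doc:
    # Run every stage in order, directly on the raw text.
    doc: Doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    print(run_pipeline("Patient reports cough but no fever.")["results"])
    # [('cough', False), ('fever', True)]
```

Because each stage takes and returns the same document object, the pipeline can be built once and applied to raw text, which is the property this PR is after; in spaCy the equivalent would be adding components to a single `nlp` object.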

I also rewrote the context.ipynb notebook to use this new pipeline. This should not affect the actual predictions, and I confirmed that the performance scores are indeed unchanged.

Closes #5.

umcu / negation-detection

Make the rule-based pipeline end-to-end #33