umcu / negation-detection

Negation detection in Dutch clinical text.
GNU General Public License v3.0

Make the rule-based pipeline end-to-end #33

Closed lcreteig closed 2 years ago

lcreteig commented 2 years ago

Previously, the raw DCC texts were preprocessed in the emc-dcc-preprocessing repo using a spaCy pipeline. Both that pipeline and the preprocessed texts then had to be loaded into this repo.

I have now written a separate script (build-pipeline.py) that builds the whole pipeline, from tokenisation up to and including the context algorithm. The pipeline can therefore be run directly on the raw text.
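To illustrate what "end-to-end" means here, below is a minimal sketch of a pipeline that takes raw text, tokenises it, and then applies a much-simplified negation rule. This is not the code in build-pipeline.py: it uses only the Python standard library rather than spaCy, and the trigger list, the `SCOPE` window, and all function names are hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger list; a real context rule set is far larger.
NEGATION_TRIGGERS = {"geen", "niet"}
# Hypothetical scope: a trigger negates targets up to this many tokens after it.
SCOPE = 4
SENTENCE_END = {".", "!", "?"}


@dataclass
class Entity:
    text: str
    negated: bool


def tokenise(text):
    """Split raw text into lowercase word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def detect_negation(tokens, targets):
    """Flag a target as negated if a trigger precedes it within SCOPE tokens
    and no sentence boundary lies between the trigger and the target."""
    entities = []
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        window = tokens[max(0, i - SCOPE):i]
        # Keep only the part of the window after the last sentence boundary.
        for j in range(len(window) - 1, -1, -1):
            if window[j] in SENTENCE_END:
                window = window[j + 1:]
                break
        negated = any(t in NEGATION_TRIGGERS for t in window)
        entities.append(Entity(tok, negated))
    return entities


def run_pipeline(raw_text, targets):
    """Run every step on raw text: tokenisation, then negation detection."""
    return detect_negation(tokenise(raw_text), targets)


for entity in run_pipeline("Patiënt heeft geen koorts. Wel hoesten.",
                           {"koorts", "hoesten"}):
    print(entity.text, entity.negated)
```

The point of the sketch is the shape of `run_pipeline`: because every step lives in one callable, nothing has to be preprocessed elsewhere and loaded in separately.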

I also rewrote the context.ipynb notebook to use this new pipeline. This should not affect the actual predictions, and I confirmed that the performance scores are indeed unchanged.

Closes #5.