umcu / clinlp

A Python library for performing NLP on clinical text written in Dutch
GNU General Public License v3.0

Improve tokenizer handling of special characters #70

Open vmenger opened 2 months ago

vmenger commented 2 months ago

The current tokenizer has only basic logic for handling special characters and could be improved. Please add examples of problematic input below, in case someone wants to pick this up.
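One way to report examples is as pairs of input text and expected tokens, which can be turned directly into regression tests. The sketch below assumes a simple regex-based approach and is not clinlp's actual tokenizer. The names `tokenize`, `_TOKEN_RE` and `EXAMPLES` are hypothetical, and so are the example sentences. It keeps word-like runs together, including decimals with a comma or period such as `38,5`. Every other non-space character becomes its own token.

```python
import re

# Hypothetical pattern, for illustration only (not clinlp's own tokenizer):
# - \w+(?:[.,]\w+)* : word-like runs, keeping decimals such as "8,1" or "38.5" together
# - \S             : any other single non-space character becomes its own token
_TOKEN_RE = re.compile(r"\w+(?:[.,]\w+)*|\S")


def tokenize(text: str) -> list[str]:
    """Split text into tokens, separating special characters from words."""
    return _TOKEN_RE.findall(text)


# Hypothetical examples in "input -> expected tokens" form, as they might be
# reported in this issue and reused as test cases.
EXAMPLES = {
    "RR 120/80 mmHg": ["RR", "120", "/", "80", "mmHg"],
    "Hb 8,1 (laag)": ["Hb", "8,1", "(", "laag", ")"],
    "pijn+koorts": ["pijn", "+", "koorts"],
}

for text, expected in EXAMPLES.items():
    assert tokenize(text) == expected, (text, tokenize(text))
```

In this sketch, an abbreviation such as `e.c.i.` is split into `e.c.i` and `.`. Whether that is the desired behaviour is exactly the kind of case worth adding as an example.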