umcu / clinlp

A Python library for performing NLP on clinical text written in Dutch
GNU General Public License v3.0

Term proximity matching with exclusive list of allowable terms #96

Open bramiozo opened 5 months ago

bramiozo commented 5 months ago

Proximity matching is very useful when the expressions between the START and END tokens are specific in scope. In practice, however, several types of expression can occur between the START and END tokens. To keep the scope specific, it would be useful to have an exclusive list of allowable terms for the proximity match.

vmenger commented 5 months ago

Not totally sure if I get this one, have an example?

bramiozo commented 5 months ago

Suppose I am interested in:

"No, red, green or blue sheep" and similar statements indicating the absence of any type of sheep.

So, I use

sheep_term = Term("No sheep", proximity=K, fuzzy=L..)

But now "No, there are plenty of sheep" is also flagged by the matcher.

In practice this list might be limited to e.g. specific descriptors, plus some connecting words.
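
To make the request concrete, here is a minimal sketch of the behaviour in plain Python. It does not use clinlp and is not how the library implements proximity matching; the function name `match_with_allowed_gap` and its parameters are hypothetical, and tokenization is simplified to whitespace splitting.

```python
def match_with_allowed_gap(tokens, start, end, proximity, allowed):
    """Find spans where `start` is followed by `end` with at most `proximity`
    tokens in between, all of which must appear in `allowed`.

    Hypothetical helper for illustration only, not part of clinlp.
    Returns a list of (start_index, end_index) tuples.
    """
    tokens = [t.lower() for t in tokens]
    matches = []
    for i, tok in enumerate(tokens):
        if tok != start:
            continue
        # Inspect at most `proximity` intervening tokens, then the end token.
        for j in range(i + 1, min(i + proximity + 2, len(tokens))):
            if tokens[j] == end:
                matches.append((i, j))
                break
            if tokens[j] not in allowed:
                break  # a disallowed token invalidates this candidate
    return matches


# Assumed allow list: descriptors plus a few connecting tokens.
allowed = {"red", "green", "blue", "or", ","}

absent = "No , red , green or blue sheep".split()
present = "No , there are plenty of sheep".split()

absent_matches = match_with_allowed_gap(absent, "no", "sheep", 6, allowed)
present_matches = match_with_allowed_gap(present, "no", "sheep", 6, allowed)
```

With the allow list in place, only the first sentence yields a match; the second is rejected at "there", even though "sheep" is within the proximity window.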