NLP docfix and tidy up regex autotesting

ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.

GNU General Public License v3.0

NLP docfix and tidy up regex autotesting #74

Closed RudolfCardinal closed 2 years ago

RudolfCardinal commented 2 years ago

Improve the NLP docs, including how to discover the names of remote (cloud) NLP processors, and the descriptions/formatting of the local Python ones.
Remove the "crate_nlp --showinfo" option for NLP processors (it gave confusing, excessive detail and could only be used on one processor at a time; "--describeprocessors" is better).
Move unit tests for NLP regexes into their own files and use the pytest framework consistently.
Make some slight tweaks to improve recognition of some units.
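
For illustration only, a standalone regex test file in the pytest style might look roughly like the sketch below. The pattern `WEIGHT_KG`, the helper `extract_kg`, and the test names are hypothetical and are not taken from CRATE's own regex parsers or test suite; they just show the general shape of a unit-recognition test that pytest can collect.

```python
import re
from typing import List

# Hypothetical pattern (NOT from CRATE): a number followed by a weight
# unit written as "kg", "kgs", "kilogram(s)" or "kilogramme(s)".
WEIGHT_KG = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>kg|kgs|kilograms?|kilogrammes?)\b",
    re.IGNORECASE,
)


def extract_kg(text: str) -> List[float]:
    """Return every weight in kilograms found in the text."""
    return [float(m.group("value")) for m in WEIGHT_KG.finditer(text)]


# pytest collects plain functions whose names start with "test_".
def test_recognises_unit_variants() -> None:
    assert extract_kg("Weight 70 kg today") == [70.0]
    assert extract_kg("weighs 82.5kg") == [82.5]
    assert extract_kg("65 Kilograms on admission") == [65.0]
    assert extract_kg("approx 90 kilogrammes") == [90.0]


def test_rejects_non_units() -> None:
    assert extract_kg("no weight recorded") == []
    # The trailing word boundary stops "kg" matching inside "kgx".
    assert extract_kg("70 kgx") == []
```

Saved in a file of its own (for example a hypothetical `test_weight_regex.py`), this would be picked up by running `pytest` on that file, keeping the regex tests separate from the parser code they exercise.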

martinburchell commented 2 years ago

Looks good. I've run Black on this branch as well.