ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

Autodetecting output structure from cloud NLP? #91

Open RudolfCardinal opened 2 years ago

RudolfCardinal commented 2 years ago

When running a "cloud" (remote) natural language processing (NLP) tool, the NLPRP protocol allows the remote processor to describe its table structure to the caller (https://crateanon.readthedocs.io/en/latest/nlp/nlprp.html#list-processors). However, this is optional, and in our setup some GATE processors don't do it. Would it be sensible to try to autodiscover the structure, using a combination of user input and scanning the results? It would be imperfect. The main challenge is GATE processors, which can produce several types of response and are not (at the server end) tied to SQL. Configuring the structure by hand is a bit of a pain, and it may be too much to ask the source team to define it. For an example (@martinburchell), try the r14_nlp_test_clour_nlp.bat script in the CPFT setup and enter some text about having a nightmare (it's currently set up for the SLaM "nightmare" detector). I'm not sure if this is a good idea!