jhu-bids / fhir-zulip-nlp-analysis

Ad hoc NLP (Natural Language Processing) analysis of HL7 FHIR's online Zulip chat streams.
MIT License

Keyword context: spelling variations #29

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Overview

A keyword's context may appear under more than one spelling, e.g. what if CodeSystem happens to be spelled Code System?

Possible solutions:

i. Inputs and outputs should include a new context_spelling column. How should that column be formatted? Currently, context is a ;-delimited list. Options:

(a) Make context_spelling a ;-delimited list as well, and add a second delimiter for spelling variations, e.g. context=CodeSystem;ValueSet, context_spelling=Code System|Code Sys;. Here the first item has a |-delimited list of spelling variations we want to track, and the empty entry after the ; indicates that ValueSet needs no spelling variations.
(b) Include the variations in the context column itself, e.g. CodeSystem|Code System|Code Sys;ValueSet.
(c) A simplification of (b): put all spelling variations of every context in the same flat list, e.g. CodeSystem;Code System;Code Sys;ValueSet.
(d) Store JSON in the context_spelling column, e.g. {"CodeSystem": ["Code System", "Code Sys"]}.
(e) Tidy-ify our keywords Google Sheet. That is, each keyword would span multiple rows, e.g. one row for every keyword / keyword spelling / context / context spelling combination, or something like that. This could wind up with a lot of rows, though.
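To compare the formats, here is a minimal Python sketch of how options (a) and (b) might be parsed into the mapping shape used by option (d). The function names `parse_context_spellings` and `parse_inline_variations` are hypothetical and do not exist in this repository. The sketch also assumes that spelling variations in option (a) are matched to contexts by position.

```python
import json
from typing import Dict, List


def parse_context_spellings(context: str, context_spelling: str = "") -> Dict[str, List[str]]:
    """Option (a): pair each ;-delimited context with its |-delimited spellings.

    Assumes positional alignment: the n-th slot of context_spelling belongs
    to the n-th entry of context. An empty slot means "no variations".
    """
    contexts = [c.strip() for c in context.split(";") if c.strip()]
    # Keep empty slots so positions stay aligned with the contexts.
    slots = context_spelling.split(";") if context_spelling else []
    result: Dict[str, List[str]] = {}
    for i, ctx in enumerate(contexts):
        slot = slots[i] if i < len(slots) else ""
        result[ctx] = [v.strip() for v in slot.split("|") if v.strip()]
    return result


def parse_inline_variations(context: str) -> Dict[str, List[str]]:
    """Option (b): the first name in each |-delimited group is the canonical context."""
    result: Dict[str, List[str]] = {}
    for group in context.split(";"):
        names = [n.strip() for n in group.split("|") if n.strip()]
        if names:
            result[names[0]] = names[1:]
    return result


option_a = parse_context_spellings("CodeSystem;ValueSet", "Code System|Code Sys;")
option_b = parse_inline_variations("CodeSystem|Code System|Code Sys;ValueSet")

# Option (d) would store only the contexts that actually have variations.
option_d = json.dumps({k: v for k, v in option_a.items() if v})
```

With the example values from this issue, both parsers yield the same mapping, so the choice between (a), (b), and (d) is mostly about which format is easiest to edit by hand in the spreadsheet. Option (c) cannot be parsed back into this shape, because the flat list no longer records which spelling belongs to which context.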

Additional info

This is an expansion of the work done in #11.