Context: Disambiguation of certain keywords (e.g. `$validate-code`)

Summary

The keyword $validate-code exists under two different FHIR resources, CodeSystem and ValueSet (more cases like this may come up later). We want to disambiguate which resource a given mention refers to.

Possible solutions

Keep a list of "disambiguation words" for each affected keyword, stored somewhere (see "Considerations" below). First, we get all the messages for the keyword as before. Then, we filter these down to the messages that actually contain the keyword.
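A minimal sketch of the lookup step, assuming plain case-insensitive substring matching. The mapping `DISAMBIGUATION_WORDS` and the function `find_disambiguation_words` are hypothetical names for illustration, not existing code in the repo.

```python
# Hypothetical mapping; the real list could live in Python or in the Google Sheet.
DISAMBIGUATION_WORDS = {"$validate-code": ["CodeSystem", "ValueSet"]}


def find_disambiguation_words(text, keyword, mapping=DISAMBIGUATION_WORDS):
    """Return the disambiguation words for `keyword` that occur in `text`.

    Matching is a case-insensitive substring check (an assumption).
    """
    lowered = text.lower()
    return [w for w in mapping.get(keyword, []) if w.lower() in lowered]
```

For example, `find_disambiguation_words("Use $validate-code on a ValueSet", "$validate-code")` would pick out only `ValueSet`.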

Counts report

Then, for each of these "disambiguation words", we disambiguate one message at a time: a message counts toward a word only if that message itself contains the word.
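As a rough illustration of the per-message approach, assuming messages are plain strings and matching is a case-insensitive substring check (`count_by_disambiguation` is a hypothetical helper):

```python
from collections import Counter


def count_by_disambiguation(messages, keyword, words):
    """Count messages that contain `keyword` and each disambiguation word.

    Each message is judged on its own; other messages in the same
    thread are not consulted.
    """
    counts = Counter()
    for msg in messages:
        text = msg.lower()
        if keyword.lower() not in text:
            continue  # only messages that mention the keyword are counted
        for word in words:
            if word.lower() in text:
                counts[word] += 1
    return counts
```

A message that mentions both words is counted once under each.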

Threads report

Then, for each of these "disambiguation words", we run the NLP over all the messages of the thread, not only the messages that contain the keyword.
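A sketch of the thread-level variant, with simple substring matching standing in for the NLP step (an assumption). The `(thread_id, text)` message shape and the name `threads_by_disambiguation` are hypothetical.

```python
from collections import defaultdict


def threads_by_disambiguation(messages, keyword, words):
    """Map each disambiguation word to the set of thread ids it applies to.

    `messages` is an iterable of (thread_id, text) pairs. A thread
    qualifies for a word if the keyword appears in any of its messages
    and the word appears in any of its messages.
    """
    thread_texts = defaultdict(list)
    for thread_id, text in messages:
        thread_texts[thread_id].append(text.lower())

    result = {word: set() for word in words}
    for thread_id, texts in thread_texts.items():
        if not any(keyword.lower() in t for t in texts):
            continue  # thread never mentions the keyword
        for word in words:
            if any(word.lower() in t for t in texts):
                result[word].add(thread_id)
    return result
```

Unlike the per-message count, the keyword and the disambiguation word may sit in different messages of the same thread.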

Questions

How should this be represented in the report? Where before we would have a single row with $validate-code in the keyword column, there would now be two or possibly more rows: $validate-code (CodeSystem) and $validate-code (ValueSet). What if one or both "disambiguation words" do not actually appear? Possible additional rows: $validate-code (or call it $validate-code (all <threads||messages>)); $validate-code (no disambiguation keywords found); and $validate-code (CodeSystem, ValueSet) when both words are found.
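One possible way to build these row labels, shown only as a sketch; `row_label` is a hypothetical helper, and the label for the "all" row is left to the caller since its wording is still undecided.

```python
def row_label(keyword, found_words):
    """Build the keyword-column label for one report row.

    `found_words` is the list of disambiguation words found in the
    message or thread; an empty list means none were found.
    """
    if not found_words:
        return f"{keyword} (no disambiguation keywords found)"
    return f"{keyword} ({', '.join(found_words)})"
```

With this shape, a single found word gives `$validate-code (ValueSet)` and both words give `$validate-code (CodeSystem, ValueSet)`.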

Considerations

Should I (a) add all of the logic for this in Python, or (b) add a new column to the Google Sheet, e.g. disambiguation, which contains a list (e.g. ;-delimited) of disambiguation words (e.g. CodeSystem and ValueSet)?
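If option (b) is chosen, reading the column could look roughly like this. It assumes the cell arrives as a string (or None when empty) with words separated by semicolons; `parse_disambiguation_cell` is a hypothetical name.

```python
def parse_disambiguation_cell(cell):
    """Split a ;-delimited sheet cell into a list of disambiguation words.

    Empty cells, None, and stray whitespace or trailing semicolons
    yield no empty entries.
    """
    return [w.strip() for w in (cell or "").split(";") if w.strip()]
```

Keywords without a disambiguation entry would then simply get an empty list and be processed as before.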

jhu-bids / fhir-zulip-nlp-analysis