suzytamang / clever-rockies

clinical event recognizer for TIU

Terms at beginning or end of a snippet are cut off #104

Open vilijajoyce opened 1 month ago

vilijajoyce commented 1 month ago

Original Request

I just noticed an issue with snippets returned by the NLP pipeline where the tagged term gets cut from the snippet. This happens when the term is right at the beginning or the end of a snippet. Take a look at the following snippets for file date 2024-10-08:

  • PPAIN-361873: term is 'bereavement'. Note starts as "Bereavement: Session 5: Finding Your Guides ...", but the snippet starts as "Session 5: ..."
  • PPAIN-367227: term is 'insomnia' but snippet ends with "... patient complains of:" instead of "... patient complains of: Insomnia".

If the tagged term is a phrase, only part of the term makes it into the snippet. For example:

  • PPAIN-362292: term is 'doing good'. Note starts as "Doing good, so you look ...", but the snippet starts as "good, so you look ...".
  • PPAIN-348703: term is 'in good spirits' but snippet ends with "... in good", instead of "... in good spirits".

It happens across concepts; I just happened to be working with PPAIN. For 2024-10-08 alone I'm counting 1498 affected snippets for HOUSING, 804 for CGS, and >237 for FALL, to name a few.

I will write test cases to check for this issue.
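
A minimal sketch of what such a check could look like, assuming each pipeline result can be reduced to a (term, snippet) pair. The helper name `term_is_intact` is hypothetical and not part of the pipeline; the snippet fragments are the ones quoted above.

```python
import re


def term_is_intact(snippet: str, term: str) -> bool:
    """Return True if the whole tagged term appears in the snippet.

    Hypothetical helper: matching is case-insensitive, and any run of
    whitespace is allowed between the words of a multi-word term.
    """
    words = [re.escape(word) for word in term.split()]
    # Lookarounds keep the term from matching inside a longer word.
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return re.search(pattern, snippet, flags=re.IGNORECASE) is not None


# (term, snippet fragment, whether the term should be found whole)
CASES = [
    ("bereavement", "Session 5: Finding Your Guides", False),
    ("bereavement", "Bereavement: Session 5: Finding Your Guides", True),
    ("insomnia", "patient complains of:", False),
    ("doing good", "good, so you look", False),
    ("doing good", "Doing good, so you look", True),
    ("in good spirits", "in good", False),
    ("in good spirits", "in good spirits", True),
]

for term, snippet, expected in CASES:
    assert term_is_intact(snippet, term) is expected, (term, snippet)
```

Run over a full file date, a check like this would flag every snippet where the term is missing or only partly present, regardless of concept.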

Action Requested TBD

Requestor 3ST Team (Esther Meerwijk)

Additional Actions

dax-westerman commented 1 month ago

A question, please, @vilijajoyce: what's the lift, and what are the dependencies, involved in the following?

Foremost, I want to understand how hard it is to establish success criteria for these cases. I'm thinking that if we can move some part of the "review" into the development cycle, with a sample of expected outcomes, then we can shorten the overall dev cycle, improve accuracy, and potentially start building unit test cases using non-PII/PHI data.
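
To make the idea concrete, here is a rough sketch of expected-outcome test cases built from made-up notes only (no PII/PHI). Everything in it is an assumption: `extract_snippet` is a hypothetical stand-in for whatever the pipeline actually does, and the notes, window size, and case layout are invented for illustration.

```python
def extract_snippet(note: str, start: int, end: int, window: int = 10) -> str:
    """Hypothetical stand-in for the pipeline's snippet step.

    Returns the term at note[start:end] plus up to `window` characters of
    context on each side, clamped to the note boundaries.
    """
    left = max(0, start - window)
    right = min(len(note), end + window)
    return note[left:right]


# Synthetic notes: (note text, tagged term, where the term sits in the note)
SYNTHETIC_CASES = [
    ("Doing good, so you look rested.", "Doing good", "start"),
    ("Mood today: in good spirits", "in good spirits", "end"),
    ("Patient reports insomnia since last week.", "insomnia", "middle"),
]


def run_cases() -> list:
    """Return a description of every case whose snippet loses the term."""
    failures = []
    for note, term, position in SYNTHETIC_CASES:
        start = note.find(term)
        snippet = extract_snippet(note, start, start + len(term))
        if term not in snippet:
            failures.append((term, "term missing or cut off"))
        elif position == "start" and not snippet.startswith(term):
            failures.append((term, "term not at start of snippet"))
        elif position == "end" and not snippet.endswith(term):
            failures.append((term, "term not at end of snippet"))
    return failures


print(run_cases())
```

The success criterion in this sketch is simply that the full term survives at the edge of the snippet; swapping the stand-in for the real snippet function would show whether the boundary cases from this issue reproduce on synthetic data.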