suzytamang / clever-rockies

clinical event recognizer for TIU

Non-3ST duplicate found in dict.txt #24

Open dax-westerman opened 1 week ago

dax-westerman commented 1 week ago

As part of issue #19, I am including a basic validation framework for the dict.txt file. One of the checks I've included looks for duplicate entries on the combination of term string, class name, and subclass name. I found one duplicate, though it appears to be outside 3ST.

Note: ignore the columns is_valid_term and has_duplciate_id as well as my own naming conventions in the dataframe. These are for convenience, and they will not be used in the dict.txt file.

image
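For reference, the duplicate check described above can be sketched roughly as follows. This is a minimal illustration, not the validation code from issue #19: the function name `find_duplicate_entries` and the sample rows are hypothetical, each row is assumed to have already been parsed into a (term string, class name, subclass name) triple, and duplicates are assumed to be exact matches with no case or whitespace normalization.

```python
from collections import defaultdict


def find_duplicate_entries(rows):
    """Map each (term, class_name, subclass_name) triple that occurs more
    than once to the list of row indices where it appears."""
    seen = defaultdict(list)
    for index, (term, class_name, subclass_name) in enumerate(rows):
        seen[(term, class_name, subclass_name)].append(index)
    # Keep only the triples that were seen at more than one row.
    return {key: indices for key, indices in seen.items() if len(indices) > 1}


# Made-up rows for illustration only; these are not taken from dict.txt.
example_rows = [
    ("term a", "CLASS_X", "SUB_1"),
    ("term b", "CLASS_X", "SUB_1"),
    ("term a", "CLASS_X", "SUB_1"),  # same triple as row 0
    ("term a", "CLASS_Y", "SUB_1"),  # same term, different class: not a duplicate
]

print(find_duplicate_entries(example_rows))
# → {('term a', 'CLASS_X', 'SUB_1'): [0, 2]}
```

Returning the row indices rather than a simple yes/no makes it easier to point at the exact entries when reporting a suspected duplicate.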

While this does not appear to be within the 3ST scope, I wanted to document the duplicate so that we can determine the appropriate way to report and document such cases in the future.

@vilijajoyce, do you have any insight into who should be notified of this, both to determine whether it is in fact a duplicate and, if so, how it should be handled?