nlpsandbox / nlpsandbox-schemas

OpenAPI specifications of the NLP Sandbox services
https://nlpsandbox.io
Apache License 2.0

Identify date formats that are HIPAA PHI #40

Closed: tschaffter closed this issue 3 years ago

tschaffter commented 4 years ago

Background

On dates, the HIPAA specification says:

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older

Source
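As a minimal illustration of the age clause in the rule quoted above, the sketch below maps an age to a value that may be retained, collapsing every age over 89 into a single "90 or older" category. The function name `hipaa_age_category` and the exact label string are hypothetical choices for this example, not part of the NLP Sandbox schemas.

```python
def hipaa_age_category(age: int) -> str:
    """Return a HIPAA-safe representation of an age (hypothetical helper).

    Ages of 89 or below are kept as-is; all ages over 89 are aggregated
    into a single "90 or older" category, as the quoted rule allows.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age > 89:
        return "90 or older"
    return str(age)


if __name__ == "__main__":
    for age in (45, 89, 90, 103):
        print(age, "->", hipaa_age_category(age))
```

Note that the rule also covers date elements (including the year) that reveal such an age, e.g. a birth year; that part depends on the note's context and is not handled by this helper.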

The representation of date formats that we use is defined here.

@tschaffter is extending the 2014 i2b2 dataset to include the date format of each Date annotation. The goal is to encourage developers to predict the format of each date string they detect, which in turn makes it possible to programmatically convert the date string into a standard date object.
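To show why knowing the format helps, here is a minimal sketch that converts a detected date string into a standard `date` object once its format is known. It assumes the format is expressed with Python `strptime` directives (e.g. `%m/%d/%Y`); the date-format representation actually defined for the project may differ and would first need translating. The function name `to_standard_date` is hypothetical.

```python
from datetime import date, datetime


def to_standard_date(text: str, date_format: str) -> date:
    """Convert a detected date string into a date object using its format.

    Assumption: `date_format` uses Python strptime directives, which may
    not match the NLP Sandbox date-format representation.
    """
    return datetime.strptime(text, date_format).date()


if __name__ == "__main__":
    print(to_standard_date("03/14/2019", "%m/%d/%Y"))
    print(to_standard_date("14 March 2019", "%d %B %Y"))
```

Both calls yield March 14, 2019. Keep in mind that `strptime` fills in missing fields with defaults, so a year-only string parsed with `%Y` becomes January 1 of that year. A partial date therefore loses the information that it was partial unless the format is kept alongside it.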

We are considering reporting the performance of the date annotators and other annotators on their ability to detect PHI (as defined by HIPAA or another standard). Currently, our Date detection task is relatively generic and is intended to be reused for other, more complex NLP tasks that do not necessarily need to know whether a date string is PHI (that distinction is mainly relevant for de-identification).

Task

Find a set of regular expressions that we can apply to the date format to identify whether the date string is PHI.
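A minimal sketch of what such a regular expression could look like, assuming date formats are written with `strptime`-style directives (an assumption; the project's own representation may differ). Under the HIPAA rule, any element finer than the year (month or day) is potentially PHI, while a year-only format is not, judging by format alone. The pattern and the function name `format_may_be_phi` are hypothetical.

```python
import re

# Hypothetical pattern over strptime-style format strings: directives for
# day of month (%d), month (%m, %b, %B) and day of year (%j).
NON_YEAR_DIRECTIVE = re.compile(r"%[dmbBj]")


def format_may_be_phi(date_format: str) -> bool:
    """Return True if the format contains a date element other than the year."""
    # Drop escaped literal percent signs ("%%") so that "%%d" is not
    # mistaken for the day directive "%d".
    unescaped = date_format.replace("%%", "")
    return NON_YEAR_DIRECTIVE.search(unescaped) is not None


if __name__ == "__main__":
    for fmt in ("%Y", "%m/%d/%Y", "%B %Y", "%m/%y"):
        print(fmt, format_may_be_phi(fmt))
```

A format flagged here only marks a candidate: the rule applies to dates directly related to an individual, so whether a flagged date is actually PHI can still depend on the context of the note.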

tschaffter commented 4 years ago

A central question is whether the date format alone can be used to determine whether a date annotation is PHI. Maybe the context of the clinical note can help. In practice, for the 2014 i2b2 dataset, I don't see us going over all the notes to examine the context, but we could use the date format inferred by @tschaffter to decide whether a date string is PHI.

tschaffter commented 3 years ago

Update

We have decided to keep this property and make it optional. Following a discussion we had in December, this field could be useful if we one day need to generate anonymized clinical notes that maintain the date formats of the original clinical notes.
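One way the optional format field could serve that purpose is date shifting: parse the original date with its format, move it by an offset, and render the surrogate back in the same format. This is a minimal sketch of a common de-identification technique, not something the project has confirmed it uses. It assumes `strptime`-style format strings and a single per-note offset; the function name `shift_date_string` is hypothetical.

```python
from datetime import datetime, timedelta


def shift_date_string(text: str, date_format: str, offset_days: int) -> str:
    """Replace a date string with a shifted surrogate in the same format.

    Assumptions: `date_format` uses strptime directives, and the same
    offset is applied to every date in a note so intervals are preserved.
    """
    original = datetime.strptime(text, date_format)
    shifted = original + timedelta(days=offset_days)
    return shifted.strftime(date_format)


if __name__ == "__main__":
    print(shift_date_string("03/14/2019", "%m/%d/%Y", 30))
    print(shift_date_string("14 March 2019", "%d %B %Y", -20))
```

The two calls produce "04/13/2019" and "22 February 2019". The round trip is only approximate: `strftime` zero-pads `%m` and `%d`, so an input such as "3/4/2019" comes back as a padded string. Preserving padding exactly would require a richer format representation than plain directives.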

tschaffter commented 3 years ago

The current implementation has the property TextDateAnnotation.dateFormat, so we are good.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.