Closed tschaffter closed 3 years ago
A central question is whether the date format alone is enough to decide whether a date annotation is PHI. The context of the clinical note may help. In practice, for the 2014 i2b2 dataset, I don't see us going over all the notes to look at the context, but we could use the date format inferred by @tschaffter to decide whether a date string is PHI.
We have decided to keep this property and make it optional. Following a discussion we had in December, this field could be useful if we one day need to generate anonymized clinical notes that preserve the date format of the original clinical note.
The current implementation has the property TextDateAnnotation.dateFormat so we are good.
Background
On dates, the HIPAA specification says:
Source
The representation of date format that we use is defined here.
@tschaffter is extending the 2014 i2b2 dataset to include information about the date format of each Date annotation. We want to encourage developers to predict the format of the date strings they detect, which in turn makes it possible to convert a date string programmatically into a standard date object.
We are considering reporting the performance of the date annotators and other annotators on their ability to detect PHI (HIPAA or another standard). Currently our Date detection task is relatively generic and is meant to be reused for other, more complex NLP tasks that do not necessarily require knowing whether a date string is PHI (this is mainly relevant for de-identification).
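To illustrate why a predicted format is useful, here is a minimal sketch of converting a detected date string into a date object. It assumes format strings built from the tokens YYYY, YY, MM and DD (for example MM/DD/YYYY); that token set is only an assumption for illustration and may differ from the representation we actually use. The names TOKEN_TO_DIRECTIVE, to_strptime_format and parse_date are hypothetical.

```python
from datetime import date, datetime

# Hypothetical mapping from format tokens to strptime directives.
# The real date format representation may use different tokens.
TOKEN_TO_DIRECTIVE = {
    "YYYY": "%Y",  # four-digit year
    "YY": "%y",    # two-digit year
    "MM": "%m",    # month as a number
    "DD": "%d",    # day of the month
}


def to_strptime_format(date_format: str) -> str:
    """Translate a token-based format such as 'MM/DD/YYYY' into '%m/%d/%Y'."""
    # Replace longer tokens first so 'YYYY' is not consumed as two 'YY' tokens.
    for token in sorted(TOKEN_TO_DIRECTIVE, key=len, reverse=True):
        date_format = date_format.replace(token, TOKEN_TO_DIRECTIVE[token])
    return date_format


def parse_date(text: str, date_format: str) -> date:
    """Convert a detected date string to a date object using its predicted format."""
    return datetime.strptime(text, to_strptime_format(date_format)).date()
```

With these assumptions, parse_date("03/15/2014", "MM/DD/YYYY") and parse_date("2014-03-15", "YYYY-MM-DD") both yield the same date object, which is the point of predicting the format alongside the date string.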
Task
Find a set of regular expressions that we can apply to the date format to identify whether the date string is PHI or not.
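As a starting point, the task could be sketched as follows. This assumes the HIPAA Safe Harbor reading that every element of a date other than the year counts as PHI, and that formats are written with uppercase M and D tokens for month and day (for example MM/DD/YYYY); both are assumptions to be checked against our actual format representation. The names MONTH_OR_DAY_TOKEN and is_phi_format are hypothetical.

```python
import re

# Assumption: month and day appear in the format as runs of 'M' or 'D'
# (e.g. 'MM', 'MMMM', 'D', 'DD'). Year tokens ('YY', 'YYYY') do not match.
MONTH_OR_DAY_TOKEN = re.compile(r"M+|D+")


def is_phi_format(date_format: str) -> bool:
    """Return True if the format carries more than a year.

    Under the Safe Harbor reading assumed here, any date element other
    than the year makes the date string PHI.
    """
    return MONTH_OR_DAY_TOKEN.search(date_format) is not None


# Example formats (hypothetical token style).
for fmt in ["YYYY", "MM/YYYY", "MM/DD/YYYY"]:
    print(fmt, is_phi_format(fmt))
```

Under these assumptions only a year-only format such as YYYY is classified as non-PHI. Note that the format alone cannot tell whether a date relates to an individual, so a rule like this would flag every date with a month or day regardless of context.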