healtex / texscrubber

Personal information de-identification tool
Apache License 2.0

Implement patient ID / document ID parsing in FlatFileSingleItemReader #21

Open hkkenneth opened 7 years ago

hkkenneth commented 7 years ago

Update

                doc.setPerPersonDocumentId("placeholder");
                doc.setPersonId("placeholder");
                // TODO: parsing of person ID from file name

For now there is no customization: just assume the file name has the form patID-docID.txt.
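A minimal sketch of what that parsing could look like, assuming the patient ID contains no hyphen, so the name is split on the first `-`. The class `FileNameIds` and its fields are hypothetical names used for illustration and are not part of texscrubber.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of texscrubber.
final class FileNameIds {
    // Assumed convention from this issue: <patID>-<docID>.txt
    // Assumption: the patient ID has no hyphen, so everything before the
    // first '-' is the patient ID and the rest is the document ID.
    private static final Pattern NAME = Pattern.compile("^([^-]+)-(.+)\\.txt$");

    public final String personId;
    public final String documentId;

    private FileNameIds(String personId, String documentId) {
        this.personId = personId;
        this.documentId = documentId;
    }

    // Parses a bare file name (no directory part) into its two IDs.
    public static FileNameIds parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Expected patID-docID.txt but got: " + fileName);
        }
        return new FileNameIds(m.group(1), m.group(2));
    }
}
```

The reader could then pass `ids.personId` to `doc.setPersonId(...)` and `ids.documentId` to `doc.setPerPersonDocumentId(...)` in place of the `"placeholder"` values.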

mbelousov commented 7 years ago

We need to implement a simple rule: if no file pattern is given, treat each file as a separate patient file. Also, using patient subfolders (i.e. {pat-id}/file.txt) instead of {pat-id}-{doc-id} might be a better option.
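Both ideas can be sketched roughly as follows. The class `PatientFileIds` and its method names are hypothetical. Reusing the file's base name as both patient ID and document ID in the fallback case is an assumption, since the thread does not say what the document ID should be there.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of texscrubber.
final class PatientFileIds {
    public final String personId;
    public final String documentId;

    private PatientFileIds(String personId, String documentId) {
        this.personId = personId;
        this.documentId = documentId;
    }

    // File name without its last extension, e.g. "notes.txt" -> "notes".
    static String baseName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Fallback rule: no file pattern given, so each file is its own patient.
    // Assumption: the base name serves as both patient ID and document ID.
    public static PatientFileIds perFile(Path file) {
        String base = baseName(file);
        return new PatientFileIds(base, base);
    }

    // Subfolder layout: {pat-id}/file.txt
    // The parent directory name is the patient ID, the base name the document ID.
    public static PatientFileIds fromSubfolder(Path file) {
        Path parent = file.getParent();
        if (parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException(
                "Expected {pat-id}/file.txt but got: " + file);
        }
        return new PatientFileIds(parent.getFileName().toString(), baseName(file));
    }
}
```

For example, `PatientFileIds.fromSubfolder(Paths.get("pat42", "discharge.txt"))` yields patient ID `pat42` and document ID `discharge`. With the subfolder layout, hyphens in either ID need no special handling.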