Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
Updates to Bayesian fuzzy linkage system.
Main things are:
Support comparisons with missing DOBs in full. (Probands with no DOB are compared to all candidates. Candidates with no DOB are compared to all probands.)
OrderedSet used in the creation of shortlists, rather than set, to guarantee order relative to the input (for the first-of-a-tie-wins log odds comparison system).
Also:
The --extra_validation_output option now includes the ID of the second-best candidate (not just the second-best log odds), so it's easier to see the effects of changes.
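
To illustrate why these changes interact, here is a minimal sketch, not the linkage code itself. The names Person, shortlist and best_two are hypothetical, exact DOB equality stands in for whatever DOB comparison the real system uses, and a plain dict (which preserves insertion order) stands in for OrderedSet. It shows a shortlist that keeps input order and always includes pairs with a missing DOB, and a first-of-a-tie-wins pick that also reports the second-best candidate.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Hypothetical minimal record: an ID and an optional date of birth."""
    pid: str
    dob: Optional[str] = None  # e.g. "1980-01-31"; None if missing


def shortlist(proband: Person, candidates: Iterable[Person]) -> List[Person]:
    """
    Return the candidates to compare against the proband, de-duplicated
    and in input order. A missing DOB on either side never excludes a pair.
    """
    # dict keys keep insertion order, so this acts as an ordered set.
    # A plain set would de-duplicate too, but with no guaranteed order.
    ordered: Dict[Person, None] = {}
    for c in candidates:
        if proband.dob is None or c.dob is None or c.dob == proband.dob:
            ordered.setdefault(c, None)
    return list(ordered)


Scored = Tuple[str, float]  # (candidate ID, log odds); assumed shape


def best_two(
    scored: Iterable[Scored],
) -> Tuple[Optional[Scored], Optional[Scored]]:
    """
    Return (best, second_best) by log odds. Comparisons are strict, so
    the earliest of several tied candidates wins; this is why the order
    of the shortlist has to be deterministic.
    """
    best: Optional[Scored] = None
    second: Optional[Scored] = None
    for item in scored:
        if best is None or item[1] > best[1]:
            best, second = item, best
        elif second is None or item[1] > second[1]:
            second = item
    return best, second
```

With candidates c1 (DOB 1980-01-31), c2 (no DOB) and c3 (DOB 1975-06-15), a proband born 1980-01-31 is shortlisted against c1 and c2, while a proband with no DOB is shortlisted against all three. If two candidates tie on log odds, best_two returns the one that came first in the input as the winner and the other as second best, which is the kind of detail the second-best candidate ID makes visible.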