ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

PersonalizedScrubber defaults appear to be wrong #67

Closed by martinburchell 2 years ago

martinburchell commented 2 years ago

https://github.com/RudolfCardinal/crate/blob/master/crate_anon/anonymise/scrub.py#L422

    anonymise_numbers_at_word_boundaries_only: bool = True,
    anonymise_numbers_at_numeric_boundaries_only: bool = True,

I think one of these should default to False because if anonymise_numbers_at_word_boundaries_only is True, anonymise_numbers_at_numeric_boundaries_only is ignored.

The defaults are different for get_code_regex_elements() (word True, numeric False). The config file defaults are reversed (word False, numeric True).
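To illustrate why the numeric flag is ignored when the word flag is True, here is a minimal sketch. It is not CRATE's actual code: the function name `build_number_regex` and its short parameter names are hypothetical, and it only assumes that the word-boundary check takes precedence, as described above.

```python
import re


def build_number_regex(
    number: str,
    at_word_boundaries_only: bool = False,
    at_numeric_boundaries_only: bool = True,
) -> str:
    """Hypothetical helper: build a regex that finds `number` in text."""
    core = re.escape(number)
    if at_word_boundaries_only:
        # Word boundaries are the stricter condition (digits are word
        # characters), so the numeric flag has no effect on this branch.
        return r"\b" + core + r"\b"
    if at_numeric_boundaries_only:
        # Only require that the number is not flanked by other digits.
        return r"(?<!\d)" + core + r"(?!\d)"
    # No boundary restriction at all.
    return core


def count_matches(text: str, **flags: bool) -> int:
    """Count matches of the illustrative number 123456 in `text`."""
    return len(re.findall(build_number_regex("123456", **flags), text))


text = "ID123456X, 91234567, 123456"
print(count_matches(text, at_word_boundaries_only=True,
                    at_numeric_boundaries_only=True))
print(count_matches(text, at_word_boundaries_only=True,
                    at_numeric_boundaries_only=False))
print(count_matches(text, at_word_boundaries_only=False,
                    at_numeric_boundaries_only=True))
```

In this sketch the first two calls give the same count, so with the word flag set to True the numeric flag makes no observable difference. Only with the word flag set to False does the numeric flag change what is matched (here, the number embedded in ID123456X).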

RudolfCardinal commented 2 years ago

Fixed (I think) by https://github.com/RudolfCardinal/crate/commit/3967eaa38d7010926ef5dd3605f058d4a8b4b473