ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

anonymise_strings_at_word_boundaries_only config option not used by FlashText #68

martinburchell closed this issue 2 years ago

martinburchell commented 2 years ago

The anonymise_strings_at_word_boundaries_only config option is passed to WordList, but when the default FlashText processor is used, the setting is ignored.

https://github.com/RudolfCardinal/crate/blob/master/crate_anon/anonymise/scrub.py#L283

If this is intentional, perhaps it's worth documenting.
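To illustrate what the option is expected to change, here is a minimal sketch using only Python's standard re module. It is not CRATE's WordList code: the scrub function, its parameters and the [XXX] replacement string are hypothetical names made up for this example. As far as I understand it, FlashText only matches whole keywords delimited by non-word characters, so its behaviour would correspond to the at_word_boundaries_only=True case whatever the config says.

```python
import re


def scrub(text, words, at_word_boundaries_only=True, replacement="[XXX]"):
    """Hypothetical regex scrubber for illustration; not CRATE's WordList."""
    if not words:
        return text
    # Try longer words first so they win over shorter words they contain.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    if at_word_boundaries_only:
        # Match only whole words.
        pattern = rf"\b(?:{alternatives})\b"
    else:
        # Match anywhere, including inside longer words.
        pattern = rf"(?:{alternatives})"
    return re.sub(pattern, replacement, text, flags=re.IGNORECASE)


text = "Ann saw Joanne at the annex."
whole_words = scrub(text, ["ann"], at_word_boundaries_only=True)
substrings = scrub(text, ["ann"], at_word_boundaries_only=False)
print(whole_words)  # [XXX] saw Joanne at the annex.
print(substrings)   # [XXX] saw Jo[XXX]e at the [XXX]ex.
```

With the option set to False, a user would expect the second output, but a processor that only matches whole words would produce the first.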

RudolfCardinal commented 2 years ago

Fixed by https://github.com/RudolfCardinal/crate/commit/3967eaa38d7010926ef5dd3605f058d4a8b4b473