ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

Fix TypeError for boolean opt-out fields when optout_col_values contains non-boolean values #142

Closed: martinburchell closed this pull request 4 months ago.

martinburchell commented 5 months ago

Fixes #140: for boolean opt-out fields, any non-boolean values in optout_col_values (in the anonymisation config file) are now dropped instead of causing a TypeError.
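The filtering step can be sketched roughly as follows. This is a minimal sketch, not the code from the PR: the helper name drop_non_boolean_values is hypothetical, and it assumes that only Python bool instances are valid for a boolean column, so 1 is dropped even though 1 == True in Python (the PR doesn't say whether the real fix keeps 0 and 1). The example values are illustrative.

```python
from typing import Any, List


def drop_non_boolean_values(values: List[Any]) -> List[bool]:
    """Keep only genuine booleans from the configured opt-out values.

    Hypothetical helper: assumes only Python ``bool`` instances are
    acceptable for a boolean opt-out column.
    """
    # isinstance(v, bool) rejects plain ints such as 1, even though 1 == True
    return [v for v in values if isinstance(v, bool)]


# Illustrative values, of the kind that suit a string or integer opt-out column
example_values = [True, 1, "1", "Yes", "y"]
print(drop_non_boolean_values(example_values))  # [True]
```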

@RudolfCardinal A side-effect of this is that if there are no valid values for optout_col_values, all patient IDs will be treated as opted out. Is this OK?

While adding some tests, I've also added support for database-level testing from pytest, initially with SQLite and MySQL. Passing a non-MySQL database URL to pytest may well work, but I've only tested with MySQL. This will change when I add this functionality to #141 and we support more backends.
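One common way to pass a database URL to pytest is a command-line option registered in a conftest.py, feeding a session-scoped SQLAlchemy engine fixture. This is a sketch under assumptions: the option name --database-url, the in-memory SQLite default and the fixture name engine are hypothetical and may not match what the PR actually adds.

```python
# conftest.py (hypothetical sketch)
from __future__ import annotations

from typing import Iterator

import pytest
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


def pytest_addoption(parser: pytest.Parser) -> None:
    # Hypothetical option; defaults to an in-memory SQLite database
    parser.addoption(
        "--database-url",
        default="sqlite://",
        help="SQLAlchemy URL of the database used for database-level tests",
    )


@pytest.fixture(scope="session")
def engine(request: pytest.FixtureRequest) -> Iterator[Engine]:
    # Build one engine per test session from the URL given on the command line
    url = request.config.getoption("--database-url")
    db_engine = create_engine(url)
    yield db_engine
    db_engine.dispose()
```

A run against MySQL might then look like: pytest --database-url="mysql+mysqldb://user:password@localhost/crate_test", where the credentials and database name are placeholders.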

I've also changed the names of some other test classes, which were previously being ignored by pytest.
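For context, with its default settings pytest only collects test classes whose names start with Test (the python_classes setting), in files named test_*.py or *_test.py; a plain class named otherwise is skipped without any warning. The PR doesn't say which rule the old names broke, so the class names below are purely illustrative.

```python
# Illustrative test module, e.g. test_optout.py (hypothetical names)


class OptOutFilterTests:  # not collected by default: name lacks the "Test" prefix
    def test_drops_strings(self) -> None:
        assert [v for v in [True, "1"] if isinstance(v, bool)] == [True]


class TestOptOutFilter:  # collected: name starts with "Test"
    def test_drops_strings(self) -> None:
        assert [v for v in [True, "1"] if isinstance(v, bool)] == [True]
```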

martinburchell commented 4 months ago

@RudolfCardinal As discussed, the script now aborts with a ValueError (as with the other configuration checks) if there are no valid values for optout_col_values. I suggest we merge #143 and then #141 (with any fixups) before this one, once they have your approval.
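The abort can be sketched as a check that runs after the non-boolean values have been dropped. This is a hypothetical sketch: the function name check_optout_values, its field_name argument and the message wording are assumptions, not CRATE's actual code. Raising here, rather than carrying on with an empty list, avoids the side-effect raised earlier in the thread, where every patient ID would be treated as opted out.

```python
from typing import List


def check_optout_values(field_name: str, valid_values: List[bool]) -> List[bool]:
    """Abort configuration loading if no usable opt-out values remain.

    Hypothetical helper: ``valid_values`` is assumed to be the list left
    after non-boolean values have been dropped for a boolean field.
    """
    if not valid_values:
        # Fail loudly, like the other configuration checks
        raise ValueError(
            f"No valid values in optout_col_values for boolean "
            f"opt-out field {field_name!r}"
        )
    return valid_values


try:
    check_optout_values("patient.optout", [])
except ValueError as exc:
    print(exc)
```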