martinburchell closed this issue 4 months ago.
It seems a bit over-conservative on SQLAlchemy's part. Presumably it reflects the column type from the database before this point? By contrast, the database itself is less strict: it accepts e.g.
SELECT * FROM crp WHERE _pk IN (1, 2, 'hello');
and it also fetches the record with
SELECT * FROM crp WHERE _pk IN ('1');
doing the conversion automatically.

There's some utility in being able to specify the opt-outs quite broadly in the config file, I think, as in the example you suggest with multiple tables operating in different ways, although maybe it's not huge. Is it possible to disable the SQLAlchemy check for this specific command?
In SQLAlchemy, I think this is in sql/sqltypes.py, specifically in class Boolean. So is that from reflection?
The rationale behind this behaviour is explained in https://docs.sqlalchemy.org/en/14/changelog/migration_12.html#boolean-datatype-now-enforces-strict-true-false-none-values.
To disable the checks we would need something like the LiberalBoolean TypeDecorator example in that note. Alternatively, if there is a risk that '0' may be interpreted differently by different database backends, then maybe we should just filter out any string values from optout_col_values when the column is boolean.
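A minimal sketch of the LiberalBoolean approach, adapted from the TypeDecorator example in the SQLAlchemy 1.2 migration note and assuming SQLAlchemy 1.4 or later (for cache_ok and the select(column) calling style). The in-memory SQLite database and the table schema are hypothetical; only the table and column names are taken from the query in this issue. Note that bool(int(value)) still rejects strings such as 'yes', with a ValueError.

```python
from sqlalchemy import Boolean, Column, Integer, MetaData, Table, create_engine, select
from sqlalchemy.types import TypeDecorator


class LiberalBoolean(TypeDecorator):
    """A Boolean that coerces values such as 1, '1' and '0' to bool before binding."""

    impl = Boolean
    cache_ok = True  # no per-instance state, so safe for SQLAlchemy's statement cache

    def process_bind_param(self, value, dialect):
        if value is not None:
            # int('1') -> 1 -> True; int('yes') raises ValueError.
            value = bool(int(value))
        return value


# Hypothetical schema for illustration, in an in-memory SQLite database.
engine = create_engine("sqlite://")
metadata = MetaData()
rio_manual_opt_out = Table(
    "rio_manual_opt_out",
    metadata,
    Column("crate_rio_number", Integer),
    Column("opt_out", LiberalBoolean),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        rio_manual_opt_out.insert(),
        [
            {"crate_rio_number": 1, "opt_out": True},
            {"crate_rio_number": 2, "opt_out": False},
        ],
    )
    query = (
        select(rio_manual_opt_out.c.crate_rio_number)
        .distinct()
        .where(rio_manual_opt_out.c.opt_out.in_([True, 1, "1"]))
    )
    # Each element of the IN list goes through process_bind_param first,
    # so '1' becomes True before Boolean's strict check sees it.
    opted_out = [row.crate_rio_number for row in conn.execute(query)]
```

The coercion lives in the column type, so every query against that column is affected, not just the opt-out lookup.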
If

optout_col_values = [True, 1, '1']

is set in the anonymisation configuration and the opt-out field is a boolean field, then when executing the following:

SELECT DISTINCT crate_rio_number FROM rio_manual_opt_out WHERE opt_out IN (True, 1, '1')

SQLAlchemy will raise a TypeError when checking whether the string values in the IN clause can be considered boolean.
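The failure can be reproduced outside CRATE with a short script. This is a sketch assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the schema and the find_opted_out helper are hypothetical, and only the table and column names come from the query above. Because the check runs while the statement's parameters are being processed, SQLAlchemy reports the TypeError wrapped in a StatementError, so the script returns err.orig.

```python
from sqlalchemy import Boolean, Column, Integer, MetaData, Table, create_engine, select
from sqlalchemy.exc import StatementError

# Hypothetical schema for illustration, in an in-memory SQLite database.
engine = create_engine("sqlite://")
metadata = MetaData()
rio_manual_opt_out = Table(
    "rio_manual_opt_out",
    metadata,
    Column("crate_rio_number", Integer),
    Column("opt_out", Boolean),
)
metadata.create_all(engine)


def find_opted_out(optout_col_values):
    """Run the opt-out query; return the matching numbers or the underlying error."""
    query = (
        select(rio_manual_opt_out.c.crate_rio_number)
        .distinct()
        .where(rio_manual_opt_out.c.opt_out.in_(optout_col_values))
    )
    with engine.connect() as conn:
        try:
            return [row.crate_rio_number for row in conn.execute(query)]
        except StatementError as err:
            # Boolean's strict check accepts None, True and False (1 and 0
            # compare equal to those); the string '1' triggers a TypeError.
            return err.orig


print(find_opted_out([True, 1]))
print(repr(find_opted_out([True, 1, "1"])))
```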
There's a failing test on the fix-optout-col-values-typeerror branch, which, despite its name, doesn't actually fix anything.

Possible options:
optout_col_values are considered for boolean fields

I suppose it's conceivable that there may be a boolean opt-out field in one table, where True or 1 means opt-out, and a string opt-out field in another table, where '1' or 'yes' means opt-out.
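One way to reconcile those two cases is to keep a single broad optout_col_values list in the config file and filter it per column, dropping string values when the column is boolean, as suggested earlier in this thread. A minimal sketch in plain Python; the function name is hypothetical, and how the caller decides that a column is boolean (for example, isinstance(column.type, sqlalchemy.Boolean) on a reflected column) is an assumption.

```python
from typing import Any, List, Sequence


def optout_values_for_column(
    optout_col_values: Sequence[Any], column_is_boolean: bool
) -> List[Any]:
    """Return the configured opt-out values to compare against one column.

    Hypothetical helper: string values such as '1' or 'yes' are kept for
    non-boolean columns but dropped for boolean columns, where SQLAlchemy's
    strict Boolean type would reject them with a TypeError.
    """
    if not column_is_boolean:
        return list(optout_col_values)
    return [value for value in optout_col_values if not isinstance(value, str)]


config_values = [True, 1, "1", "yes"]
print(optout_values_for_column(config_values, column_is_boolean=True))  # → [True, 1]
print(optout_values_for_column(config_values, column_is_boolean=False))  # → [True, 1, '1', 'yes']
```

As I read Boolean's strict check, integers other than 0 and 1 (e.g. 2) would still be rejected, with a ValueError rather than a TypeError, so a stricter variant could keep only values equal to True or False.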
@RudolfCardinal any thoughts?