rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License
101 stars 38 forks source link

Error on executing command caused by 'ANSI_WARNINGS' for mssql database #141

Closed geek-monkey closed 4 months ago

geek-monkey commented 5 months ago

Describe the bug I'am executing the command directly from my local database server (mssql) and I'am working on a database that have multiple indexes. When I execute the command I get this error

UPDATE failed because the following SET options have incorrect settings: 'ANSI_WARNINGS'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

I saw that the error comes from this line that turns the ANSI_WARNING off and on https://github.com/rwnx/pynonymizer/blob/main/pynonymizer/database/mssql/__init__.py#L373C34-L373C102

I removed the behaviour directly from the local library directory, I left only UPDATE {}[{}] SET {}{}; and the command passed successfully

Any ideas on how to resolve the problem or is there a way to add an option to the command to not set ANSI_WARNING ?

rwnx commented 5 months ago

# Disable ANSI_WARNINGS to allow oversized fake data to be truncated without error what's your take on the comment above? do you have an idea of whether or not we could remove it entirely?

more command options are good, but I'm worried about more complexity!

geek-monkey commented 5 months ago

I'am no expert in the matter but after reviewing some discussions on whether it's a good idea to turn off ANSI_WARNINGS, my conclusion is that if you do so you risk setting data that is not compatible with the validations that are configured in your columns. For example if you have turned off ANSI_WARNINGS and you try to set a column with a string that have more than 50 characters while your column type is varchar(50) you rist of not getting an error message.

For me the goal is having an anonymized database that is usable for a developpement environment so I need the anonymized values to correspond to a "real life hydrated database", which means respecting the restrictions and validations that are configured on the tables. I think that you could remove disabling ANSI_WARNINGS entirely. We should adapt to our columns validations with the stategy file if needed. Plus in cases where the person need to turn off ANSI_WARNINGS, they can do so directly from the database server.

rwnx commented 5 months ago

this seems reasonable to me. you could do it using the before and after scripts in the strategy file.

rwnx commented 5 months ago

Hi, I found the original reason. it's because without it, TRUNCATE doesn't work.

edit: i misread this. it's column truncation that doesnt work.

from [1.18.0] 2021-04-11:

  • Changed MSSQL connection behaviour: provider will now allow truncation using ANSI_WARNINGS off; when updating tables
rwnx commented 5 months ago

This feature will be added in the v2.0.0 release this week!

geek-monkey commented 5 months ago

Hi, I found the original reason. it's because without it, TRUNCATE doesn't work.

from [1.18.0] 2021-04-11:

  • Changed MSSQL connection behaviour: provider will now allow truncation using ANSI_WARNINGS off; when updating tables

Hello, I see. I'am guessing it depends on whether the database has indexes or not. Thanks for the info !

rwnx commented 5 months ago

Hi @geek-monkey, I tried to work this into the next release but i'm going to have to drop it from v2 as I havent been able to get a working solution.

This feature was added to cover little incompatibilities between the new data and the columns, e.g. column text too long. It makes it much easier to generate new data in a similar way to other providers.

Without it, you get frequent "String or binary data would be truncated" errors when the faker data is too long, which happens a lot!

If we can come up with a solution that covers both use cases I'm happy to talk about this for v2.1. for now I'm going to keep investigating.

rwnx commented 4 months ago

I've added this to the v2.1 milestone, which will be released end of next week.

I'm going to add a flag --mssql-ansi-warnings to control this. if it's still no good we can visit in a future iteration!