Closed babariviere closed 3 years ago
I have a couple of things in the implementation to discuss:
- I want to be sure of the use-case. Can you think of any other cases(like this one) where you'd want a value from faker that comes out as a string, but you store it differently in a database? UUID is an easy one, but for others i'm coming up a bit short. I'd like to put these in the documentation so everyone understands what the new feature is all about!
I am not really sure for the other use cases. I haven't tested it but maybe these types can collide:
Also, as the type is chosen by pynonymizer, the user is not able to create BIGINT
for example.
But that depends on the use cases and I am not able to find others apart from mine. 😅
Maybe we could modify the
_get_column_subquery
to use sql_type to cast the text field to something else?Something along the lines of:
elif column_strategy.strategy_type == UpdateColumnStrategyTypes.FAKE_UPDATE: return f"( SELECT {column_strategy.qualifier}::{column_strategy.sql_type} FROM {seed_table_name} WHERE \"updatetarget\"=\"updatetarget\" ORDER BY RANDOM() LIMIT 1)"
Yeah sure, I will update my PR then 😄
Also, as the type is chosen by pynonymizer, the user is not able to create BIGINT for example.
This is legit. What i've tried to do in the seed table is pick types that meet or exceed what faker/python gives us, and then worry about the update statement - I think in the case of bigint we could just make that the default and then have it cast "down".
In the case of complex types like geometric/json, whos equivalent would be something like a class or a dict... I'm not sure, in short. I don't want to add a big serialization concern to the anonymization part just to support them, when the literal
type and this feature can get you most of the way there.
I've added on: pull_request
to the workflow on master, btw, I'm not sure if that will take effect on next commit or what. I really need to see the tests pinging off since there have been a few things here . You might need to merge/rebase from upstream.
hey @babariviere, I think there might have been a misunderstanding regarding what i was asking for in the last review. I think that testing for the casting behaviour working is a good idea(!) I don't want to revert the tests you've added - I think that the fixtures being used specifically should have get_data_type
return FakeDataType.STRING
instead of the string 'UUID'
.
Other than that It's looking great. I can't wait to merge it! ✨
@jerometwell Oh sorry ! I will put it back then ! 😄
New feature
Add a new field to the following column strategies:
It allows user to define it's own SQL type. This is to override a limitation of types.
Use case
With the following database schema in PostgreSQL:
If I write the following strategy file:
We got the following error:
Solution
To overcome this issue, we could fix the strategy file by writing this:
In the seed table, we will have the correct data type to update
foo
table.