Closed armorKing11 closed 2 years ago
Hi, here are some things i might be able to answer for you:
pynonymizer is only really a fancy harness for running sql statements. that error is coming from the database - pynonymizer does no 'check'
the fake_update
strategy does not provide unique values.
There is some more discussion on 'uniqueness' in this issue: https://github.com/rwnx/pynonymizer/issues/86
For uniqueness, you have two options:
use a unique_login column strategy (a random string) https://github.com/rwnx/pynonymizer/blob/master/doc/strategyfiles.md#column-strategy-unique_login
design your own random string using the appropriate literal for your db. e.g. in your example: (select CONCAT('test', RAND()) )
https://github.com/rwnx/pynonymizer/blob/master/doc/strategyfiles.md#column-strategy-literal
https://stackoverflow.com/questions/16737910/generating-a-random-unique-8-character-string-using-mysql
Thank you for the options @rwnx . Had 3 questions from the above options:
As per Alex's comment on seed rows , ie, setting the value of seed_rows
to a larger value , what is the default value for seed_rows
and for say a faker generator like word
, would an increase in seed_rows
prevent collisions ?
https://github.com/rwnx/pynonymizer/issues/86#issuecomment-920967092
When alex grover mentions about the random
literal in below comment is it in reference to ( RAND() )
and if so are the random numeric values like say 0.43233
generated by RAND()
converted to string before being written to the specified column field
https://github.com/rwnx/pynonymizer/issues/86#issuecomment-921076192
When generating a random string using your example (select CONCAT('test', RAND()) )
, how can i control the number of characters of the string ,ie, in the above case when using RAND()
, how can i round the number of decimal places ?
fake_update
type was not built for this use-case.literal
update strategy. this strategy just replaces the value part of an update statement. [...] so you must ensure that it is correct and executable by your provider
There is no "checking", or conversion or any other interaction by pynonymizer. the value is pasted in verbatim. If you need a string, you need to make sure a string is being outputted by the statement.
e.g.
tables:
system:
columns:
sys_id: ( 'test1' )
other_column: ( RAND() )
# this turns into this update sql:
# UPDATE system SET sys_id = ( 'test1' ), SET other_column = ( RAND() )
the values you put here have to be parsable and executable by your database provider - in your case, mysql.
I linked you a stackoverflow post in my last reply that had a lot of discussion on "Generating a random & unique 8 character string using MySQL", but the examples in that thread could easily be customized for your exact needs. This isn't really a pynonymizer thing, but instead a general sql problem.
Sometimes i use http://sqlfiddle.com/ to try out potential values. Hope that helps ✨
Thank you very much for your guidance @rwnx , appreciate it !!
I'm closing this issue now as it seems that we've resolved the uncertainty here.
Is your feature request related to a problem? Please describe. Currently when you try to anonymize a column say (sys_id) of type VARCHAR and if it has duplicate foreign keys , pynonymizer throws error
where for api_product_id = 0yb6h5jfs5d , api_product_id the foreign key and for sys_id = test1,
sys_id
is the column to be updatedwhen using a strategy file as below
How can i specify in the strategy file to generate random string values without using the faker lbrary generators ? The faker library does not guarantee unique values cos when i tried with say
word
faker generator , it still gave me an error and failedAnd for a large number of rows ,the faker library generators adds an overhead (since it needs to generate the value and insert it )slowing down the anonymization process . I know that there is a RAND() function , but that generates a value from 0 to 1 . Can that be used to say generate a value range lke
or
And why does pynonymizer check the combination of foreign key and the column to be updated instead of the primary key and the column to be updated ?
Describe the solution you'd like
Have either a way to combine a string value and the RAND() function possibly ,like
( 'test' + RAND() )
or a random string generator where you could specify the length of the string like( RAND_STR(5) )
where 5 is the length of the stringDescribe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Add any other context or screenshots about the feature request here.