rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License

help with dob #90

Closed amz4u2nv closed 2 years ago

amz4u2nv commented 2 years ago

Hi, I'm trying to create a db strategy yml file where the generated dob falls between certain dates, because the people need to be 18 or older. Any help would be much appreciated.

stoiven commented 2 years ago

You can probably do something with a literal type and then use some SQL to generate it, @amz4u2nv.

amz4u2nv commented 2 years ago

Thanks. Is there any example of using date_between_dates?

stoiven commented 2 years ago

If you're after age >18, can you filter based on the table? For example:

mysql> select * from person;
+--------+-------+--------+-----+
| number | name  | gender | age |
+--------+-------+--------+-----+
| 114    | kev   | m      | 18  |
| 115    | brian | m      | 18  |
+--------+-------+--------+-----+

select * from person where age >= 18;

Or something like this

amz4u2nv commented 2 years ago

My bad for not explaining it well. I'm trying to generate a fake dob, but the dob needs to be set so that everyone is over 18.

rwnx commented 2 years ago

You should be able to use the date_between provider in the way you mentioned, using the fake_args feature.

column_name: 
  type: fake_update
  fake_type: date_between
  fake_args:
    start_date: "-50y"
    end_date: "-18y"

You can read more about how passing args to generators works in the docs.

Let me know how you get on ✨
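To make the effect of the "-50y" / "-18y" window concrete, here is a minimal stdlib-only sketch of picking a date of birth between 50 and 18 years before a reference date. This is not Faker's or pynonymizer's implementation, and Faker's own arithmetic for relative strings may differ slightly; the helper names `years_ago`, `random_dob` and `age_on` are hypothetical and exist only for this illustration.

```python
import random
from datetime import date, timedelta


def years_ago(today: date, years: int) -> date:
    """Return the date `years` before `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def random_dob(today: date, min_age: int = 18, max_age: int = 50, rng=None) -> date:
    """Pick a uniformly random date between max_age and min_age years ago."""
    rng = rng or random.Random()
    earliest = years_ago(today, max_age)  # plays the role of start_date "-50y"
    latest = years_ago(today, min_age)    # plays the role of end_date "-18y"
    span_days = (latest - earliest).days
    return earliest + timedelta(days=rng.randint(0, span_days))


def age_on(today: date, dob: date) -> int:
    """Age in whole years on `today`."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


if __name__ == "__main__":
    today = date.today()
    dob = random_dob(today)
    print(dob, age_on(today, dob))
```

Because the upper bound of the window is exactly 18 years before the reference date, every generated person is at least 18 on that date.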

amz4u2nv commented 2 years ago

Thanks @rwnx, that did the trick. I was trying to use date_between_dates, like below:

  date_of_birth:
    type: fake_update
    fake_type: date_between_dates
    fake_args:
      start_date: '1975-01-01'
      end_date: '2001-01-01'

but got this error: Unsupported Fake type: date_between_dates: dict_keys(['start_date', 'end_date']). Not sure why this didn't work. Do you know the best place to check whether my keys are correct and whether the date inputs are in the correct format?

Thanks

rwnx commented 2 years ago

An Unsupported Fake type error is thrown when the generator, date_between_dates, can't be found in the faker object being used. Can you give me some more info, e.g. your pynonymizer version, the version of faker you have, and the locale being used in the strategy/config?

amz4u2nv commented 2 years ago

The version of pynonymizer is 1.21.3 and faker is 11.3.0. I didn't explicitly choose a locale, but tried it with and without.

pynonymizer -u root -p root -i test1.sql -s table_strategy.yml -o testdumpanon.sql

tables:
  applicant_applicant:
    columns:
      firstName: first_name
      lastName: last_name
      dateOfBirth:
        type: fake_update
        fake_type: date_between_dates
        fake_args:
          start_date: '1975-01-01'
          end_date: '2003-01-01'

Also, taking out the fake_args in the above example doesn't show any error, but the params are required to make it work correctly.

rwnx commented 2 years ago

Ahh, I think I have something for you. date_between_dates and date_between take different kwarg names 😅 (date_start vs start_date)
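The kwarg mismatch can be reproduced without Faker installed. The sketch below uses two stand-in functions that copy only the keyword names of Faker's `date_between` and `date_between_dates`; their bodies are placeholders, and the helper `unknown_kwargs` is hypothetical, not part of pynonymizer or Faker. It simply reports which fake_args keys a generator would reject.

```python
import inspect


# Stand-ins that mirror only the keyword names of the two Faker providers.
# The bodies are placeholders, not Faker's logic.
def date_between(start_date="-30y", end_date="today"):
    return ("date_between", start_date, end_date)


def date_between_dates(date_start=None, date_end=None):
    return ("date_between_dates", date_start, date_end)


def unknown_kwargs(func, fake_args):
    """Return the sorted keys in fake_args that func does not accept."""
    accepted = inspect.signature(func).parameters
    return sorted(key for key in fake_args if key not in accepted)


fake_args = {"start_date": "1975-01-01", "end_date": "2001-01-01"}

print(unknown_kwargs(date_between, fake_args))        # → []
print(unknown_kwargs(date_between_dates, fake_args))  # → ['end_date', 'start_date']
```

Calling `date_between_dates(**fake_args)` with these keys raises a `TypeError`, which is consistent with the strategy above failing until the keys are renamed to date_start and date_end or the generator is switched to date_between.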

⚠️ However, I don't think feeding date_between_dates strings will work either, unless the yaml parser supports datetime parsing natively. Can I ask why you want to use date_between_dates? Did you try using date_between? It seems like they'd have the same effect (pynonymizer will be using a string value in the insert/update statement anyway).

amz4u2nv commented 2 years ago

Yup, that did the trick... my bad, should have spotted that. date_between works absolutely fine; it was just that while I was trying to figure out how to resolve it, I came across date_between_dates and was wondering why it didn't work for me.

Thanks for all your help, much appreciated.