sunitparekh / data-anonymization

Want to use production data for testing? data-anonymization can help you.
MIT License

Bulk table updates #53

Open abrom opened 6 years ago

abrom commented 6 years ago

Not an issue per se. I've been adding a bulk table update method to a fork of your project and thought you might be interested. It's relatively simplistic at the moment, but the general gist is:

bulk_table 'my_table' do
  where "some_column != 'some value'"
  anonymize('pii_column') { 'xxxxxxxx' }
end

Since the where filter is passed straight through to ActiveRecord (AR), it can be a hash or can include a subquery. The anonymisation currently just passes a random string through to the strategy; it could be made a bit smarter, for example by looking at the column type. For my purposes I'm just using the Anonymous strategy with a block, as above.
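To make the shape of the DSL concrete, here is a minimal plain-Ruby sketch of how a `bulk_table` block could be collected into a filter plus a set of column assignments. This is not the fork's implementation: `BulkTableSketch` and its accessors are hypothetical names, the block is simply called once with no arguments, and nothing touches a database. With ActiveRecord, one plausible mapping of the collected data would be a single `where(...).update_all(...)` call, but whether the fork does that is an assumption.

```ruby
# Hypothetical sketch: records the DSL calls made inside a bulk_table block.
class BulkTableSketch
  attr_reader :table, :filters, :assignments

  def initialize(table)
    @table = table
    @filters = []
    @assignments = {}
  end

  # Record a filter as given (string or hash); a real implementation
  # would hand it to ActiveRecord unchanged.
  def where(condition)
    @filters << condition
  end

  # Evaluate the block once, so every matching row would get the same value.
  def anonymize(column, &block)
    @assignments[column] = block.call
  end
end

# Hypothetical entry point: runs the block with the sketch object as self.
def bulk_table(name, &block)
  spec = BulkTableSketch.new(name)
  spec.instance_eval(&block)
  spec
end

spec = bulk_table 'my_table' do
  where "some_column != 'some value'"
  anonymize('pii_column') { 'xxxxxxxx' }
end
```

Because the value is computed once rather than per row, an update built this way can be issued as one statement instead of loading and saving each record.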

Another thought might be to simplify things even further by passing the query itself through as a param. Something like:

bulk_table ... do
  with_query do |query|
    query
      .joins('join other_table... ')
      .where(other_table: { value: 'bar' })
  end
end
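As a rough illustration of the `with_query` idea, the sketch below uses a plain-Ruby stand-in for a query object so it runs without a database. `QuerySketch`, the `accounts`/`users` tables, and the join condition are all hypothetical; a real version would presumably yield an ActiveRecord relation instead.

```ruby
# Hypothetical stand-in for a relation: each call returns a new object,
# so the block's return value is the fully refined query.
class QuerySketch
  attr_reader :steps

  def initialize(steps = [])
    @steps = steps
  end

  def joins(clause)
    QuerySketch.new(steps + [[:joins, clause]])
  end

  def where(conditions)
    QuerySketch.new(steps + [[:where, conditions]])
  end
end

# Hypothetical helper: yields the base query and returns whatever the block builds.
def with_query(base = QuerySketch.new)
  yield base
end

refined = with_query do |query|
  query
    .joins('JOIN accounts ON accounts.user_id = users.id')
    .where(accounts: { status: 'closed' })
end
```

The caller keeps full control over joins and filters, and the library only needs the returned query to know which rows to update.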

I'm not sure there is a nice way to do cross-connection copies other than to dump and load. Doing that in memory seemed a bit crazy (and also didn't fit my use case), so for now it only supports anonymising the source DB:

https://github.com/Studiosity/data-anonymization/commit/cdfcfecb8b2cc129fa1f6933e58b9260bb765820

sunitparekh commented 6 years ago

I am working on porting this tool to Java/Kotlin for better performance. If you want to try the early version, you can find it here: https://github.com/dataanon/data-anon Sample project: https://github.com/dataanon/dataanon-kotlin-sample