sunitparekh / data-anonymization

Want to use production data for testing? data-anonymization can help you.
MIT License
459 stars, 92 forks

Support for paging to prevent issue with large tables #16

Closed: istipanov closed this issue 12 years ago

istipanov commented 12 years ago

Fetching all records at once becomes a problem with large tables. It would help if the utility supported some form of paging, or a similar way to work around this issue.

Has someone run into this already? If yes, how did you solve it?

Thanks.

sunitparekh commented 12 years ago

Look at the example here on adding batch_size to a table that needs batch processing. A primary_key is mandatory for batch processing. I am attaching the code snippet for you here. You choose the batch size; make it large enough to reduce the number of SELECT queries, around 1K to 5K, whatever works for you.

  table 'Customer' do
    primary_key 'CustomerId'
    batch_size 5  # batch_size works only if the primary_key is defined for the table

    whitelist 'CustomerId', 'SupportRepId', 'Company'
    anonymize('Phone').using FieldStrategy::RandomPhoneNumber.new
    anonymize('FirstName').using FieldStrategy::RandomFirstName.new
    # ....
  end

The above fix is currently in the master branch and will be released in v0.5.1.
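
For anyone wondering why batching needs a primary key, here is a minimal sketch of one common approach, keyset paging: each query fetches the next batch_size rows whose key is greater than the last key seen. This is an illustration only, not the gem's actual implementation, which is not shown in this thread. `ROWS`, `fetch_batch` and `each_batch` are hypothetical names, and an in-memory array stands in for the database table.

```ruby
# Hypothetical stand-in for a table with 12 rows; not part of data-anonymization.
ROWS = (1..12).map { |id| { 'CustomerId' => id, 'FirstName' => "Name#{id}" } }.freeze

# Stand-in for a query such as:
#   SELECT * FROM Customer WHERE CustomerId > ? ORDER BY CustomerId LIMIT ?
def fetch_batch(rows, primary_key, last_key, batch_size)
  rows.select { |row| last_key.nil? || row[primary_key] > last_key }
      .sort_by { |row| row[primary_key] }
      .first(batch_size)
end

# Yields one batch at a time, so only batch_size rows are held in memory.
def each_batch(rows, primary_key, batch_size)
  last_key = nil
  loop do
    batch = fetch_batch(rows, primary_key, last_key, batch_size)
    break if batch.empty?

    yield batch
    # Remember where this batch ended; the next query starts after this key.
    last_key = batch.last[primary_key]
  end
end

batch_sizes = []
each_batch(ROWS, 'CustomerId', 5) { |batch| batch_sizes << batch.size }
puts batch_sizes.inspect # => [5, 5, 2]
```

A larger batch size means fewer queries, with more rows held in memory per batch. That is the trade-off behind the 1K to 5K suggestion above.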