sunitparekh / data-anonymization

Want to use production data for testing? data-anonymization can help you.
MIT License

MongoDB anonymization case-sensitivity #25

Closed: wpodgorski closed this issue 9 years ago

wpodgorski commented 9 years ago

Hi, I recently used data-anonymization to anonymize some MongoDB data in a project I am currently working on, and I have discovered a bug. All the fields in all of my collections are upper-case (just like in relational databases), so I first wrote the anonymize calls with lower-case field names, expecting the script to handle the difference. I ran the script and nothing happened. Then I read that MongoDB queries are case-sensitive, so I changed the anonymize invocations to upper-case, matching the field names exactly. That did not work either. I tried other approaches, but none of them worked. My guess is that the anonymize method lower-cases its input, which makes it virtually impossible to anonymize fields whose names are not lower-case. As a workaround I had to write an ugly JS script that lower-cases the field names in the collections, run the anonymization, and then convert the field names back to upper-case.
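The workaround described above (lower-case the keys, anonymize, restore the original names) can be sketched in plain Ruby on a single document hash. This is a hypothetical illustration, not the JS script from this report and not part of data-anonymization: `with_lowercased_keys` is an invented helper that only handles top-level keys. It records each key's original spelling, so mixed-case names like `User_Id` come back exactly instead of being blindly upper-cased.

```ruby
# Hypothetical helper (not part of data-anonymization): temporarily
# lower-cases the top-level keys of a document hash, yields the renamed
# hash to a block, then restores the original key names on the result.
def with_lowercased_keys(doc)
  # Remember how each lower-cased key was originally spelled.
  original_names = doc.keys.map { |key| [key.downcase, key] }.to_h
  # Two keys such as 'EMAIL' and 'email' would collapse into one.
  raise ArgumentError, 'keys collide when lower-cased' if original_names.size != doc.size

  lowered = doc.map { |key, value| [key.downcase, value] }.to_h
  result = yield lowered
  # Map every key back to its original spelling (new keys are left as-is).
  result.map { |key, value| [original_names.fetch(key, key), value] }.to_h
end

doc = { 'USER_ID' => 'sunitparekh', 'EMAIL' => 'parekh.sunit@gmail.com' }
anonymized = with_lowercased_keys(doc) do |d|
  d.merge('user_id' => 'user-1', 'email' => 'xyz@mailinator.com')
end
puts anonymized['USER_ID'] # prints user-1
```

Applying this to a whole collection would mean running it once per document; nested fields would need a recursive version.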

Here is an example of the bug.

Ruby script for anonymization:

require 'data-anonymization'
require 'mongo'

# Recreate the 'test' database and load the sample documents into 'users'
Mongo::Connection.from_uri("mongodb://localhost/test").drop_database('test')
system "mongoimport -d test --drop -c users --jsonArray ./users.json"

DataAnon::Utils::Logging.logger.level = Logger::INFO

database 'test' do
  strategy DataAnon::Strategy::MongoDB::Blacklist

  source_db :mongodb_uri => "mongodb://localhost/test", :database => 'test'

  collection 'users' do
    # Upper-case field names (first document)
    anonymize('DATE_OF_BIRTH').using FieldStrategy::TimeDelta.new(5,30)
    anonymize('USER_ID').using FieldStrategy::StringTemplate.new('user-#{row_number}')
    anonymize('EMAIL').using FieldStrategy::RandomMailinatorEmail.new
    anonymize('PASSWORD') { |field| "password" }
    anonymize('FIRST_NAME').using FieldStrategy::RandomFirstName.new
    anonymize('LAST_NAME').using FieldStrategy::RandomLastName.new

    # Mixed-case field names (second document)
    anonymize('Date_Of_Birth').using FieldStrategy::TimeDelta.new(5,30)
    anonymize('User_Id').using FieldStrategy::StringTemplate.new('user-#{row_number}')
    anonymize('Email').using FieldStrategy::RandomMailinatorEmail.new
    anonymize('Password') { |field| "password" }
    anonymize('First_Name').using FieldStrategy::RandomFirstName.new
    anonymize('Last_Name').using FieldStrategy::RandomLastName.new

    # Lower-case field names (third document)
    anonymize('date_of_birth').using FieldStrategy::TimeDelta.new(5,30)
    anonymize('user_id').using FieldStrategy::StringTemplate.new('user-#{row_number}')
    anonymize('email').using FieldStrategy::RandomMailinatorEmail.new
    anonymize('password') { |field| "password" }
    anonymize('first_name').using FieldStrategy::RandomFirstName.new
    anonymize('last_name').using FieldStrategy::RandomLastName.new
  end
end

JSON input (./users.json):

[
    {
        "USER_ID": "sunitparekh",
        "DATE_OF_BIRTH": { "$date":1346740765000 },
        "EMAIL":"parekh.sunit@gmail.com",
        "PASSWORD":"TfqIK8Pd8GlbMDFZCX4l/5EtnOkfLCeynOL85tJQuxum&382knaflk@@",
        "FAILED_ATTEMPTS":0,
        "FIRST_NAME":"Sunit",
        "LAST_NAME":"Parekh",
        "PASSWORD_RESET_ANSWER":"manza",
        "PASSWORD_RESET_QUESTION":"My new car modal?",
        "NICK_NAMES" : ["sUnit","Mr S", "Parekh"],
        "UPDATED_AT":{ "$date":1346740767000 }
    },
    {
        "User_Id": "satyamag",
        "Date_Of_Birth":{ "$date":1346740765000 },
        "Email":"satyamag@gmail.com",
        "Password":"$2a$10$2YTfqIK8Pd8GlbMDFZCvGOcJYLkQs7Hlpal4YF99iSh9yhnWPggZG",
        "Failed_Attempts":1,
        "First_Name":"Satyam",
        "Last_Name":"Agarwal",
        "Password_Reset_Answer":"iphone",
        "Password_Reset_Question":"My phone?",
        "Updated_At":{ "$date":1346740767000 }
    },
    {
        "user_id": "anandagrawal",
        "date_of_birth":{ "$date":1346740765000 },
        "email":"anandagrawal84@gmail.com",
        "password":"Tz548O0RWusldVAWkwqfzO3jK/X4l/5EtnOkfLCeynOL85tJQuxum",
        "failed_attempts":0,
        "first_name":"Anand",
        "last_name":"Agrawal",
        "password_reset_answer":"android",
        "password_reset_question":"My phone?",
        "updated_at":{ "$date":1346740767000 }
    }
]

And the result in MongoDB after anonymization:

{ 
    "_id" : ObjectId("54fd7a32c39c4f3ea38968aa"), 
    "USER_ID" : "sunitparekh", 
    "DATE_OF_BIRTH" : ISODate("2012-09-04T06:39:25Z"), 
    "EMAIL" : "parekh.sunit@gmail.com", 
    "PASSWORD" : "TfqIK8Pd8GlbMDFZCX4l/5EtnOkfLCeynOL85tJQuxum&382knaflk@@", 
    "FAILED_ATTEMPTS" : 0, 
    "FIRST_NAME" : "Sunit", 
    "LAST_NAME" : "Parekh", 
    "PASSWORD_RESET_ANSWER" : "manza", 
    "PASSWORD_RESET_QUESTION" : "My new car modal?", 
    "NICK_NAMES" : [  "sUnit",  "Mr S",  "Parekh" ], 
    "UPDATED_AT" : ISODate("2012-09-04T06:39:27Z") 
}
{ 
    "_id" : ObjectId("54fd7a32c39c4f3ea38968ab"), 
    "User_Id" : "satyamag", 
    "Date_Of_Birth" : ISODate("2012-09-04T06:39:25Z"), 
    "Email" : "satyamag@gmail.com", 
    "Password" : "$2a$10$2YTfqIK8Pd8GlbMDFZCvGOcJYLkQs7Hlpal4YF99iSh9yhnWPggZG", 
    "Failed_Attempts" : 1, 
    "First_Name" : "Satyam", 
    "Last_Name" : "Agarwal", 
    "Password_Reset_Answer" : "iphone", 
    "Password_Reset_Question" : "My phone?", 
    "Updated_At" : ISODate("2012-09-04T06:39:27Z") 
}
{ 
    "_id" : ObjectId("54fd7a32c39c4f3ea38968ac"), 
    "user_id" : "user-3", 
    "date_of_birth" : ISODate("2012-09-07T06:29:25Z"), 
    "email" : "9YZsVZg6@mailinator.com", 
    "password" : "password", 
    "failed_attempts" : 0, 
    "first_name" : "Christa", 
    "last_name" : "Gwaltney", 
    "password_reset_answer" : "android", 
    "password_reset_question" : "My phone?", 
    "updated_at" : ISODate("2012-09-04T06:39:27Z") 
}

As you can see, only the lower-case fields were anonymized, even though I explicitly defined anonymize rules for the upper-case and mixed-case fields as well. Please fix this as soon as possible. Thank you.

My setup:

  • MongoDB: 2.4.9
  • Ruby: 2.2.0
  • OS: Ubuntu 14.04
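Silently skipped rules like the ones above are easy to miss. One way to catch them is to compare each document before and after the run. The sketch below is hypothetical (`unchanged_fields` is an illustrative name, not part of data-anonymization) and works on plain Ruby hashes, for example documents read from MongoDB before and after anonymization.

```ruby
# Hypothetical check: given one document before and after anonymization and
# the field names that were configured for anonymization, return the
# configured fields that exist in the document but still hold their
# original value, i.e. rules that appear to have been skipped.
def unchanged_fields(before, after, configured_fields)
  configured_fields.select do |field|
    before.key?(field) && before[field] == after[field]
  end
end

# First document from this report: its upper-case fields were left untouched.
before = { 'EMAIL' => 'parekh.sunit@gmail.com', 'FIRST_NAME' => 'Sunit' }
after  = { 'EMAIL' => 'parekh.sunit@gmail.com', 'FIRST_NAME' => 'Sunit' }
p unchanged_fields(before, after, %w[EMAIL FIRST_NAME email])
# prints ["EMAIL", "FIRST_NAME"]
```

A strategy can occasionally produce the original value by chance (the fixed "password" block on a document whose password was already "password", for instance), so a hit is a prompt to look closer rather than proof of a bug.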

sunitparekh commented 9 years ago

Yes, I agree with you: there is code that downcases the field names. I will look into it and fix it within a day, taking care not to break it for others.

thanks & regards, Sunit parekh.sunit@gmail.com

sunitparekh commented 9 years ago

I have published version 0.7.0 with the downcasing removed. Please try it and let me know whether the issue is fixed and everything works as expected for you.
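Why removing the downcase fixes the problem can be shown with a simplified, hypothetical model of a blacklist pass over one document. This is not data-anonymization's actual code; `apply_rules` and its `downcase:` option are illustrative only. Because hash keys, like MongoDB field names, are case-sensitive, lower-casing the configured name means only lower-case keys can ever match.

```ruby
# Hypothetical model of a blacklist pass over one document hash.
# rules maps a configured field name to a lambda producing the new value.
# With downcase: true the configured name is lower-cased before lookup,
# mimicking the behaviour reported in this issue.
def apply_rules(doc, rules, downcase:)
  result = doc.dup # shallow copy so the input document is not modified
  rules.each do |name, strategy|
    key = downcase ? name.downcase : name
    # Case-sensitive lookup: 'EMAIL' and 'email' are different keys.
    result[key] = strategy.call(result[key]) if result.key?(key)
  end
  result
end

rules = { 'EMAIL' => ->(_old) { 'anon@mailinator.com' } }
doc   = { 'EMAIL' => 'parekh.sunit@gmail.com' }

puts apply_rules(doc, rules, downcase: true)['EMAIL']  # prints parekh.sunit@gmail.com
puts apply_rules(doc, rules, downcase: false)['EMAIL'] # prints anon@mailinator.com
```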

wpodgorski commented 9 years ago

Hello, I've downloaded and installed 0.7.0 and tested the fix on the sample data I provided and on my project data; everything works as expected. Thank you, and thanks for a really great and useful piece of software!