onlinf / google-refine

Automatically exported from code.google.com/p/google-refine

Feature request: "Anonymise Names" text transformation #174

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What feature would you like?
I would like to be able to quickly anonymise text in cells. While this is 
outside the original scope of Refine (cleaning data and loading it into 
Freebase), it would be a valuable transformation for agencies that are 
preparing to release sensitive information.

Here is a naive approach:
 - split the string into individual characters
 - replace each letter with a randomly chosen letter, retaining the case
 - join characters together as a string
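
The three steps above could be sketched in Python roughly as follows. This is only an illustration of the naive approach, not an existing Refine function: the name `scramble_letters` and the optional `rng` parameter are made up for the example, and only ASCII letters are handled.

```python
import random
import string


def scramble_letters(text, rng=random):
    """Replace each ASCII letter with a random letter of the same case.

    `rng` is a hypothetical hook (any object with a `choice` method),
    so a seeded random.Random can be passed in for repeatable output.
    """
    out = []
    for ch in text:  # walk the string one character at a time
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Spaces, digits, punctuation and non-ASCII letters pass through.
            out.append(ch)
    return "".join(out)  # join the characters back into one string


if __name__ == "__main__":
    print(scramble_letters("Tim McNamara"))
```

The output keeps the length, capitalisation and spacing of the input, so it still looks vaguely name-like. Python's `random` module is not cryptographically secure, which is fine for a sketch but worth keeping in mind for real releases.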

Original issue reported on code.google.com by mcnamara.tim@gmail.com on 1 Nov 2010 at 9:22

GoogleCodeExporter commented 9 years ago
The algorithm you describe is easy to implement, but I don't understand how it 
is better than either a) deleting the text or b) replacing all characters with 
a constant value such as 'x' or '*'.  Note also that leaving the word lengths 
intact represents a leak which may allow some amount of information to be 
recovered.
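
For comparison, here is a minimal Python sketch of option b) and of a variant that also hides the length. The function names `mask_constant` and `mask_fixed` and the `[REDACTED]` placeholder are assumptions made for this example, not part of Refine.

```python
def mask_constant(text, mask_char="x"):
    """Replace every character with the same constant character.

    The content is hidden, but the overall length of the value is not.
    """
    return mask_char * len(text)


def mask_fixed(text, placeholder="[REDACTED]"):
    """Replace any non-empty value with one fixed placeholder.

    Nothing about the original, including its length, survives.
    """
    return placeholder if text else text


if __name__ == "__main__":
    print(mask_constant("Tim McNamara"))
    print(mask_fixed("Tim McNamara"))
```

The first function still reveals how long each value was; the second gives every non-empty cell the same replacement.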

Original comment by tfmorris on 1 Nov 2010 at 9:37

GoogleCodeExporter commented 9 years ago
Good point. I guess I wanted to create the impression that the text still 
looked like names.

Original comment by mcnamara.tim@gmail.com on 1 Nov 2010 at 9:50

GoogleCodeExporter commented 9 years ago
Do you want the same name appearing in several cells to be anonymized to the 
same random string? And different names to different random strings?

Original comment by dfhu...@gmail.com on 1 Nov 2010 at 10:01

GoogleCodeExporter commented 9 years ago
That's a good idea too, David. It would allow sorting to occur, and if 
something relies on a name field, it's less likely to break.
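
One possible way to get that behaviour (same name in, same replacement out) is a keyed hash. The Python sketch below is an assumption about how it might be done, not something Refine provides: the function name `pseudonym`, the `anon_` prefix and the `length` parameter are invented for the example, and the key must be kept secret by whoever runs the transformation.

```python
import hashlib
import hmac


def pseudonym(name, key, length=10):
    """Map a name to a stable pseudonym using HMAC-SHA256.

    `key` is a secret byte string. The same (name, key) pair always
    yields the same result; truncating the digest to `length` hex
    characters makes collisions unlikely but not impossible.
    """
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:length]


if __name__ == "__main__":
    secret = b"replace-with-a-real-secret"  # hypothetical key for the demo
    for cell in ["Alice Smith", "Bob Jones", "Alice Smith"]:
        print(cell, "->", pseudonym(cell, secret))
```

Rows that shared a name still group together after the transformation, and anything keyed on the name field keeps working. The sort order of the pseudonyms is unrelated to the order of the original names.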

Original comment by mcnamara.tim@gmail.com on 1 Nov 2010 at 10:13

GoogleCodeExporter commented 9 years ago

Original comment by tfmorris on 16 Nov 2010 at 6:15