labordynamicsinstitute / SynUSpopulation

Creating a synthetic US population dataset

Too many synthetic households are identical #5

Open simsong opened 7 years ago

simsong commented 7 years ago

The synthesis procedure takes random draws of households and replicates them, which produces large numbers of identical households. That is not realistic and may create issues in the development of privacy algorithms. Instead, random noise could be added to each replicated household. For example, each household could be perturbed to an adjacent household by randomly adding or removing an individual, or by modifying an attribute such as an age or a relationship to Person 1.
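To make the proposal concrete, here is a minimal sketch of one possible perturbation step. It is not code from this repository: the household layout (a list of dicts with `age` and `relationship`, Person 1 first), the `RELATIONSHIPS` codes, the function name `perturb_household`, and the choice of a one-year age shift are all assumptions made for illustration.

```python
import copy
import random

# Hypothetical relationship codes for everyone except Person 1.
RELATIONSHIPS = ["spouse", "child", "parent", "sibling", "other"]


def perturb_household(household, rng):
    """Return a copy of `household` moved to an 'adjacent' household.

    `household` is assumed to be a list of person dicts, with Person 1
    at index 0. Exactly one small change is applied per call.
    """
    new = copy.deepcopy(household)  # leave the drawn household untouched

    # Removing a member or changing a relationship needs someone
    # other than Person 1 to act on.
    moves = ["add", "modify_age"]
    if len(new) > 1:
        moves += ["remove", "modify_relationship"]
    move = rng.choice(moves)

    if move == "add":
        new.append({
            "age": rng.randint(0, 90),
            "relationship": rng.choice(RELATIONSHIPS),
        })
    elif move == "remove":
        # Never remove Person 1 (index 0).
        del new[rng.randrange(1, len(new))]
    elif move == "modify_age":
        person = rng.choice(new)
        # Shift the age by one year, keeping it non-negative.
        step = 1 if person["age"] == 0 else rng.choice([-1, 1])
        person["age"] += step
    else:  # modify_relationship
        person = new[rng.randrange(1, len(new))]
        options = [r for r in RELATIONSHIPS if r != person["relationship"]]
        person["relationship"] = rng.choice(options)
    return new


if __name__ == "__main__":
    rng = random.Random(2024)
    drawn = [
        {"age": 40, "relationship": "self"},
        {"age": 38, "relationship": "spouse"},
        {"age": 7, "relationship": "child"},
    ]
    # Each replica of the same drawn household gets its own perturbation.
    for replica in (perturb_household(drawn, rng) for _ in range(3)):
        print(replica)
```

Applying the step once per replica means copies of the same drawn household are no longer all identical to it. A real implementation would probably also need plausibility checks (for example, a child older than Person 1), which this sketch leaves out.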