opendp / smartnoise-sdk

Tools and service for differentially private processing of tabular and relational data

DPGAN + PATECTGAN: strange behaviour increasing epsilon budget #606

Open GiuliaGualtieri opened 1 day ago

GiuliaGualtieri commented 1 day ago

Issue Description

I ran a script performing a series of operations to test the accuracy and privacy of two of the available data synthesis methods: DPGAN vs. PATECTGAN (script: run_comparison.py, available in the attached MATERIAL-SMARTNOISE.zip).

Why is the RandomForest classifier still able to distinguish the private synthesized dataset from the original one when I increase the budget? I expected that, as epsilon converges to infinity, I would in effect obtain an "ideal" GAN that reproduces the original distribution of the PUMS dataset perfectly, so the classifier's accuracy would fall towards chance because it could no longer tell where the data came from. Why does this not happen? As you can see in the plot below, the accuracy rises to ~95%.

[Plot: Accuracy_DPGAN_PATEGAN_log(epsilon) — distinguisher accuracy for both synthesizers against log(epsilon)]

There is a value near epsilon = 5.0 where PATECTGAN drops to 62%; why does the accuracy get worse again after that point? I decided to write to you to shed some light on this behaviour, since I may be doing something wrong when training the neural networks or the RandomForest binary classifier.

Environment

Commands

You can find all the scripts for running and comparing the models in the attached MATERIAL-SMARTNOISE.zip.
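In case the attachment is not accessible, here is a minimal sketch of the distinguisher experiment the script performs. It assumes the snsynth Synthesizer.create factory API and an all-numeric PUMS sample; the file path, epsilon grid, and preprocessor_eps split are placeholders, not the exact contents of run_comparison.py:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from snsynth import Synthesizer

# Load the PUMS sample (path is a placeholder)
real = pd.read_csv("PUMS.csv")

def distinguisher_accuracy(real_df, synth_df):
    """Train a RandomForest to tell real rows (label 0) from synthetic rows
    (label 1). ~50% test accuracy means the two are indistinguishable;
    accuracy near 100% means they are trivially separable."""
    X = pd.concat([real_df, synth_df], ignore_index=True)
    y = [0] * len(real_df) + [1] * len(synth_df)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

for eps in [0.5, 1.0, 5.0, 10.0, 100.0]:
    for name in ["dpgan", "patectgan"]:
        synth = Synthesizer.create(name, epsilon=eps)
        # part of the budget is spent inferring column bounds/categories
        synth.fit(real, preprocessor_eps=0.5)
        fake = synth.sample(len(real))
        print(f"{name} eps={eps}: accuracy={distinguisher_accuracy(real, fake):.3f}")
```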

Results

You can find the synthetic private data in CSV format in the attached MATERIAL-SMARTNOISE.zip.

joshua-oss commented 1 day ago

There is a general issue of "mode collapse" with GAN-based synthesizers on tabular data, where infrequent combinations of attributes get suppressed in the output and the distribution is biased towards the most frequent categories. This happens even without differential privacy. The CT (conditional tabular) family of GANs attempts to fix this issue by oversampling rare categories, but the usual way of doing this unfortunately violates differential privacy. In cases where categories are fairly uniformly distributed, this might not be a major problem, but in general the GAN synthesizers will have a limited ability to model the data, even if no privacy is applied.
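One rough way to see this effect directly is to compare per-category frequencies between the real and synthetic marginals; a sketch (the DataFrames and column name are placeholders):

```python
import pandas as pd

def marginal_gap(real_df, synth_df, column):
    """Compare per-category frequencies of one column in the real vs. the
    synthetic data. Mode collapse shows up as rare categories shrinking
    towards zero while the most frequent categories are over-represented."""
    p = real_df[column].value_counts(normalize=True)
    q = synth_df[column].value_counts(normalize=True)
    idx = p.index.union(q.index)
    p, q = p.reindex(idx, fill_value=0.0), q.reindex(idx, fill_value=0.0)
    tvd = 0.5 * (p - q).abs().sum()  # total variation distance between marginals
    return pd.DataFrame({"real": p, "synthetic": q}), tvd

# e.g., for a categorical PUMS column (name is illustrative):
# table, tvd = marginal_gap(real, fake, "educ")
# print(table)
# print("TVD:", tvd)
```

If rare categories collapse towards zero frequency while the dominant ones grow, that is consistent with the mode-collapse behaviour described above, independent of the privacy budget.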