privacytrustlab / ml_privacy_meter

Privacy Meter: An open-source library to audit data privacy in statistical and machine learning algorithms.
MIT License

Issue about implementation of different datasets #31

Closed BoxiangW closed 3 years ago

BoxiangW commented 3 years ago

I have tried the default CIFAR-100 and Purchase100 datasets. For CIFAR-100 the attack reaches an accuracy of 75%, while for Purchase100 I could only get an accuracy of 52%, which is basically random guessing for the model. Are there any special settings needed to use non-image datasets? Besides, I also tried CIFAR-10 here, and the accuracy is around 63%. Have you tried this privacy meter with CIFAR-10, and are any adjustments needed? Thanks.

amad-person commented 3 years ago

Hi @BoxiangW, thanks for opening this issue.

These are a few things that have worked for me in the past for increasing the attack accuracy against a target model:

  1. Increasing the number of epochs while training the target model.
  2. Decreasing the size of the training dataset.

The intuition is that we want the target model to overfit on the training dataset, so that the membership inference attack is able to find a good separator between member and non-member features.
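The intuition above can be illustrated with a toy loss-threshold attack: the more the member and non-member loss distributions separate (as happens under overfitting), the higher the best achievable attack accuracy. This is a self-contained sketch with made-up loss distributions, not the library's actual attack; the two pairs of Gaussians are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold_attack_accuracy(member_losses, nonmember_losses):
    """Best accuracy of a simple attack that predicts 'member' when
    the per-example loss falls below a threshold, sweeping all thresholds."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate(
        [np.ones_like(member_losses), np.zeros_like(nonmember_losses)])
    best = 0.0
    for t in np.unique(losses):
        preds = (losses <= t).astype(float)
        best = max(best, float((preds == labels).mean()))
    return best

# Hypothetical loss distributions (illustrative numbers only):
# a well-generalised model has near-identical member/non-member losses,
# an overfit model has much lower losses on its training members.
generalised = threshold_attack_accuracy(
    rng.normal(1.0, 0.3, 1000), rng.normal(1.1, 0.3, 1000))
overfit = threshold_attack_accuracy(
    rng.normal(0.1, 0.1, 1000), rng.normal(1.5, 0.5, 1000))

print(f"attack accuracy, generalised model: {generalised:.2f}")
print(f"attack accuracy, overfit model:     {overfit:.2f}")
```

The overfit case gives near-perfect attack accuracy, while the well-generalised case stays close to 50%, which mirrors why steps 1 and 2 above raise attack accuracy.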

Non-image datasets like Purchase100 can be "easier" to learn (they have fewer features than image datasets). This means your target model might already be generalising well, so you can try overfitting the target model and check whether the attack accuracy increases.
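A quick way to check whether the target model is actually overfitting is to look at its train/test accuracy gap. A minimal sketch with a hand-rolled logistic regression on synthetic data (all sizes and the learning rate are hypothetical) shows how "fewer training points than features + many epochs" produces a large gap:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fewer training points than features, with random labels, so a linear
# model can memorise the training set -- mimicking the "shrink the
# dataset, train for many epochs" recipe.
d, n_train, n_test = 50, 30, 200
X_train = rng.normal(size=(n_train, d))
y_train = rng.integers(0, 2, n_train)
X_test = rng.normal(size=(n_test, d))
y_test = rng.integers(0, 2, n_test)

# Plain gradient descent on the logistic loss, run long enough to memorise.
w = np.zeros(d)
for epoch in range(2000):
    p = 1.0 / (1.0 + np.exp(-X_train @ w))
    w -= 0.1 * X_train.T @ (p - y_train) / n_train

def accuracy(X, y, w):
    return float(((X @ w > 0).astype(int) == y).mean())

gap = accuracy(X_train, y_train, w) - accuracy(X_test, y_test, w)
print(f"train/test accuracy gap: {gap:.2f}")
```

A large gap (train accuracy far above test accuracy) is the overfitting signal that makes a membership inference attack more effective; a gap near zero suggests the model is generalising and attack accuracy will stay close to chance.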

BoxiangW commented 3 years ago

Thanks for the information.