privacytrustlab / ml_privacy_meter

Privacy Meter: An open-source library to audit data privacy in statistical and machine learning algorithms.
MIT License

Enhanced MIA #99

Open · Ty0ng opened this issue 1 year ago

Ty0ng commented 1 year ago

Hi, what happened to the code folder for Enhanced MIA?

rzshokri commented 1 year ago

With this new version of Privacy Meter you can reproduce many of the results in the Enhanced MIA paper, as well as in other papers. We will soon add more information, plus access to the older paper's code.

Ty0ng commented 1 year ago

Okay, thanks. Does this new version support Python >= 3.6? I got the following error on Colab when following the installation instructions: `Package 'privacy-meter' requires a different Python: 3.8.10 not in '>=3.9.0'`

changhongyan123 commented 1 year ago

Hi @Ty0ng,

I wanted to follow up on the issues you reported. We have added a pointer to the previous Enhanced MIA implementation in the research/readme.md file; please refer to it for further details.

Regarding the issue on Colab, we have included a workaround in the tutorial/readme.md file. Could you please check whether it resolves your problem?
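
For reference, the root cause is the package's declared Python requirement. A quick sanity check before installing (a minimal sketch based on the pip error quoted above, not the tutorial's exact workaround):

```python
import sys

# privacy-meter declares `python_requires >= 3.9.0` (per the pip error quoted
# above), so fail fast if the runtime's interpreter is older than that.
if sys.version_info < (3, 9):
    raise RuntimeError(
        f"privacy-meter needs Python >= 3.9, found {sys.version.split()[0]}"
    )
```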

If you have any further questions or concerns, please let us know.

Ty0ng commented 1 year ago

Yes, the code worked. Thank you!

I have another question about the reference attack. According to the paper, the target dataset used to train the target model should be different from the reference dataset used to train the reference models. What happens if the reference dataset is a subset of the target dataset? For example, training the target model on the full CIFAR-10 dataset and the reference models on a subset of CIFAR-10 (5,000 points).

changhongyan123 commented 1 year ago

Hi @Ty0ng ,

To evaluate the privacy risk of a machine learning model, it's important to understand the security game being played, as outlined in Section 3.1 of the paper. The privacy loss of the target model with respect to its training dataset is the adversary's success in winning the security game over multiple repetitions. The attack error depends on the various factors listed in Section 3.2 of the paper.
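
To make this concrete, here is a toy sketch of estimating the adversary's success rate over repeated games. It is only an illustration, not the paper's exact protocol or the Privacy Meter API; `member_scores` and `non_member_scores` stand in for hypothetical per-point attack scores (e.g. the target model's loss) on training members and held-out non-members:

```python
import numpy as np

def attack_success_rate(member_scores, non_member_scores, threshold,
                        repetitions=10_000, seed=0):
    """Fraction of games the adversary wins by guessing 'member' whenever
    the attack score (e.g. loss) falls at or below the threshold."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(repetitions):
        is_member = bool(rng.integers(2))                # challenger's secret bit
        scores = member_scores if is_member else non_member_scores
        score = rng.choice(scores)                       # sample a challenge point
        wins += ((score <= threshold) == is_member)      # guess vs ground truth
    return wins / repetitions
```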

Regarding your question: if the reference dataset is a subset of the target dataset, you are providing the adversary with additional information and changing the security game. Specifically, in this scenario, the adversary's objective would be to infer membership information about the target point given knowledge of a subset of the target model's training dataset. The results of the attack would therefore have a different meaning than those of the reference attack evaluated in the paper. For a more thorough discussion of this topic, please refer to Section 4 of the paper.
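
As an illustration of the disjoint setup the paper evaluates, here is a minimal sketch (hypothetical sizes and index handling, not Privacy Meter code) of splitting one data pool so the target and reference datasets cannot overlap:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_points = 50_000                   # e.g. the CIFAR-10 training set size
indices = rng.permutation(num_points)

target_indices = indices[:25_000]     # used only to train the target model
reference_indices = indices[25_000:]  # used only to train reference models

# In the scenario you describe (reference data as a subset of the target
# data), this intersection would be non-empty, which changes the game.
assert len(np.intersect1d(target_indices, reference_indices)) == 0
```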

I hope this explanation helps.