privacytrustlab / ml_privacy_meter

Privacy Meter: An open-source library to audit data privacy in statistical and machine learning algorithms.
MIT License

fix stdev calc for fixed variance #121

Open rpreen opened 1 month ago

rpreen commented 1 month ago

This PR fixes the standard deviation calculation when using a fixed variance.

The existing code correctly mirrors the reference implementation at: https://github.com/carlini/privacy/blob/afe6ea7699a93899011d47aa13c4bf9a19c0c8ad/research/mi_lira_2021/plot.py#L82-L84

However, the @carlini paper seems to suggest that it should instead be calculated as in this PR?

With a small number of shadow models, we can improve the attack considerably by estimating the variances $\sigma_{\text{in}}^2$ and $\sigma_{\text{out}}^2$ of model confidences in Algorithm 1 globally rather than for each individual example. That is, we still estimate the means $\mu_{\text{in}}$ and $\mu_{\text{out}}$ separately for each example, but we estimate the variance $\sigma_{\text{in}}^2$ (respectively $\sigma_{\text{out}}^2$) over the shadow models' confidences on all training set members (respectively non-members).

For a small number of shadow models ($<64$), estimating a global variance outperforms our general attack that estimates the variance for each example separately. For a larger number of models, our full attack is stronger: with $1024$ shadow models for example, the TPR decreases from $8.4\%$ to $7.9\%$ by using a global variance.

And as further stated in the Appendix:

(2) estimate the means $\mu_{\text{in}}, \mu_{\text{out}}$ for each example, but estimate global variances $\sigma_{\text{in}}^2, \sigma_{\text{out}}^2$
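For concreteness, here is a minimal sketch of the difference, not the library's actual code: `confs_in`/`confs_out` are hypothetical arrays of shadow-model confidences, one row per shadow model and one column per target example.

```python
import numpy as np

# Hypothetical data: shadow-model confidences on members (in) and
# non-members (out), shape (n_shadow_models, n_examples).
rng = np.random.default_rng(0)
confs_in = rng.normal(size=(16, 1000))
confs_out = rng.normal(size=(16, 1000))

# Means are always estimated per example (one value per column),
# as in Algorithm 1 of the paper.
mu_in = confs_in.mean(axis=0)
mu_out = confs_out.mean(axis=0)

# Fixed-variance variant described in the quoted passage: a single
# global standard deviation estimated over ALL shadow-model confidences
# on members (respectively non-members).
sigma_in = confs_in.std()
sigma_out = confs_out.std()

# By contrast, the per-example variant of the full attack would be:
# sigma_in = confs_in.std(axis=0)
# sigma_out = confs_out.std(axis=0)
```

Under this reading, the fixed-variance path should collapse the standard deviation over both the shadow-model and example axes, rather than reusing a per-example estimate.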