microsoft / prv_accountant

A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy.
MIT License
65 stars 10 forks source link

PRV accountant breaks for small delta values #17

Open mshubhankar opened 2 years ago

mshubhankar commented 2 years ago

PRV accountant overestimates epsilon values for very small delta values. Below are two examples where all parameters are constant except the delta value.

There seems to be a huge gap between RDP and PRV for the delta = 1.13e-18 with sampling prob 250/750000:

compute-dp-epsilon --sampling-probability 0.00033 --noise-multiplier 4 --delta 1.1e-18 --num-compositions 10000
PRV Accountant:     eps_lower = -0.044 eps_estimate =  0.056, eps_upper =  0.156 
RDP Accountant:     eps_lower =    0.0 eps_estimate =  0.669, eps_upper =  0.669 

compute-dp-epsilon --sampling-probability 0.00033 --noise-multiplier 4 --delta 1.13e-18 --num-compositions 10000
PRV Accountant:     eps_lower =    3.7 eps_estimate =    3.8, eps_upper =    3.9 
RDP Accountant:     eps_lower =    0.0 eps_estimate =  0.669, eps_upper =  0.669 

For a slightly bigger sampling probability of 250/650000, the PRV accountant breaks for 1.1e-18. See example below:

compute-dp-epsilon --sampling-probability 0.00038 --noise-multiplier 4 --delta 1.13e-18 --num-compositions 10000       
PRV Accountant:     eps_lower =   3.83 eps_estimate =   3.93, eps_upper =   4.03 
RDP Accountant:     eps_lower =    0.0 eps_estimate =  0.669, eps_upper =  0.669 

compute-dp-epsilon --sampling-probability 0.00038 --noise-multiplier 4 --delta 1.11e-18 --num-compositions 10000
PRV Accountant:     eps_lower = -0.0336 eps_estimate = 0.0664, eps_upper =  0.166 
RDP Accountant:     eps_lower =    0.0 eps_estimate =   0.67, eps_upper =   0.67 
wulu473 commented 2 years ago

Thanks for providing feedback. Yes, for deltas in this range we're hitting floating point issues. We've discussed this in appendix A of the paper.

There might be some ways around that limitation with higher precision or arbitrary precision floating point representation however that will be at the cost of performance. We have made the decision to use the current floating point type since delta is typically set to be <<1/num_participants and for practical scenarios we don't expect there to be more than 10^9 participants. Do you have an application where you need deltas of the order of 10^{-18} or smaller?

wulu473 commented 2 years ago

For the time being, I'll add a check to make sure the delta provided is within the floating point resolution.

mshubhankar commented 2 years ago

Thanks for your quick reply. Our purposes are research specific for now. We observe that for some specific hyperparameter tuning problems, delta values may go to orders of 10^-18 even for datasets of length 10^6. Is there a simple way in the current implementation to make PRV work with smaller delta values by trading the cost of performance?

wulu473 commented 2 years ago

Currently there isn't a simple way that comes to mind. We need several not so standard maths functions like scipy.special.erfc and scipy doesn't support types other than the standard floating point types of numpy. We have supporting arbitrary precision types on our backlog though. I will increase the priority of that item and let you know here if there are any updates.

wulu473 commented 2 years ago

18 should at least check if deltas are within the floating point range before outputting values.