jeremy43 / Private_kNN

computing the privacy #8

Open adam-dziedzic opened 1 month ago

adam-dziedzic commented 1 month ago

Hello Authors,

In the example in ReadMe.md, shouldn't coeff be the number of teachers (the same as in the previous line)? https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/ReadMe.md?plain=1#L112

Intuitively, in the previous line: https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/ReadMe.md?plain=1#L111

the privacy budget expended on screening depends on the number of teachers. Shouldn't that be the case for the second line as well?

Finally, we would accumulate the privacy cost of answering many queries in this way:

    acct = rdp_acct.anaRDPacct() # declare the moments accountant
    gaussian = lambda x: rdp_bank.RDP_gaussian({'sigma': sigma1}, x) # noisy screening
    gaussian2 = lambda x: rdp_bank.RDP_inde_pate_gaussian({'sigma': sigma2}, x) # noisy aggregation
    epsilons = []
    for i in range(nr_answered_queries):
        # coeff param is for how many times we compose the mechanism
        acct.compose_poisson_subsampled_mechanisms(gaussian, sample_prob, coeff = nr_teachers)
        acct.compose_poisson_subsampled_mechanisms(gaussian2, sample_prob, coeff = nr_teachers)
        eps = acct.get_eps(delta=delta)
        epsilons.append(eps)

jeremy43 commented 1 month ago

Hi Adam,

Thanks for checking the details. The coeff is not the number of teachers. First of all, len(teachers_preds) denotes the number of queries in the screening process (see https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/digit_pytorch/knn.py#L176 ). It is not the number of teachers; apologies for the confusing name.

Secondly, in screening the global sensitivity is only 1 (adding or removing one private data point changes the vote count by only 1), and we conduct len(teachers_preds) queries. Therefore, we call acct.compose_poisson_subsampled_mechanisms(gaussian, prob, coeff = len(teachers_preds)), which is equivalent to:

    for i in range(screening_queries):
        # coeff param is for how many times we compose the mechanism
        acct.compose_poisson_subsampled_mechanisms(gaussian, sample_prob, coeff = 1)
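The equivalence between a single call with coeff = n and n calls with coeff = 1 follows from RDP composing additively at each order. A minimal sketch of that idea in plain Python is below. It does not use autodp: `SimpleRDPAccountant` and `rdp_gaussian` are hypothetical stand-ins written for illustration, the subsampling amplification is left out, and the values of `sigma`, `n_queries`, and `delta` are arbitrary.

```python
import math


def rdp_gaussian(sigma, alpha):
    """RDP of the Gaussian mechanism with L2 sensitivity 1 at order alpha."""
    return alpha / (2.0 * sigma ** 2)


class SimpleRDPAccountant:
    """Toy accountant (hypothetical, not autodp): sums RDP on a fixed grid of orders."""

    def __init__(self, orders=range(2, 129)):
        self.orders = list(orders)
        self.rdp = [0.0] * len(self.orders)

    def compose(self, rdp_func, coeff=1):
        # Composing a mechanism `coeff` times adds coeff * RDP(alpha) at every order.
        for i, alpha in enumerate(self.orders):
            self.rdp[i] += coeff * rdp_func(alpha)

    def get_eps(self, delta):
        # Standard RDP -> (eps, delta) conversion, minimized over the order grid.
        return min(
            r + math.log(1.0 / delta) / (alpha - 1)
            for r, alpha in zip(self.rdp, self.orders)
        )


sigma, n_queries, delta = 40.0, 200, 1e-5  # arbitrary illustrative values
gaussian = lambda alpha: rdp_gaussian(sigma, alpha)

# One call with coeff = n_queries ...
batched = SimpleRDPAccountant()
batched.compose(gaussian, coeff=n_queries)

# ... versus n_queries calls with coeff = 1.
looped = SimpleRDPAccountant()
for _ in range(n_queries):
    looped.compose(gaussian, coeff=1)

eps_batched = batched.get_eps(delta)
eps_looped = looped.get_eps(delta)
print(eps_batched, eps_looped)
```

The two values agree up to floating-point rounding, which is why coeff should count queries rather than teachers.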

Lastly, we only publish labels for those queries that passed screening. Therefore, we use coeff = len(stdnt_labels), where len(stdnt_labels) is the number of queries that passed screening.

To make it clear, we can accumulate the privacy loss in this way:

    acct = rdp_acct.anaRDPacct() # declare the moments accountant
    for i in range(screening_queries):
        # coeff param is for how many times we compose the mechanism
        acct.compose_poisson_subsampled_mechanisms(gaussian, sample_prob, coeff = 1)
    for i in range(passed_screening_queries):
        acct.compose_poisson_subsampled_mechanisms(gaussian2, sample_prob, coeff = 1)
    eps = acct.get_eps(delta=delta)
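
To illustrate how the two stages combine, here is a rough, self-contained sketch that adds the screening cost (once per screened query) and the aggregation cost (once per query that passed screening) at each RDP order. It is not the repository's code: `total_eps` is a hypothetical helper, the aggregation curve `alpha / sigma2**2` is only an assumed stand-in for whatever `RDP_inde_pate_gaussian` computes, subsampling amplification is omitted, and all numbers are arbitrary.

```python
import math


def total_eps(n_screening, n_passed, sigma1, sigma2, delta, orders=range(2, 257)):
    """Hypothetical helper: (eps, delta) cost of screening plus aggregation.

    Assumptions (not taken from the repository):
      * screening is a Gaussian mechanism with L2 sensitivity 1;
      * aggregation uses alpha / sigma2**2 as a stand-in RDP curve;
      * no amplification by subsampling is applied.
    """
    best = math.inf
    for alpha in orders:
        rdp_screen = alpha / (2.0 * sigma1 ** 2)  # cost of one screening query
        rdp_agg = alpha / sigma2 ** 2             # assumed cost of one released label
        total_rdp = n_screening * rdp_screen + n_passed * rdp_agg
        # Standard RDP -> (eps, delta) conversion at this order.
        best = min(best, total_rdp + math.log(1.0 / delta) / (alpha - 1))
    return best


# Arbitrary illustrative values: 1000 screened queries, 300 of which passed.
eps_actual = total_eps(1000, 300, sigma1=85.0, sigma2=25.0, delta=1e-5)

# Charging aggregation for every screened query instead overestimates the cost.
eps_overcounted = total_eps(1000, 1000, sigma1=85.0, sigma2=25.0, delta=1e-5)

print(eps_actual, eps_overcounted)
```

Under these assumptions, counting only the queries that passed screening in the second stage yields a smaller epsilon than charging both stages for every query.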