Open adam-dziedzic opened 1 month ago
Hi Adam,
Thanks for checking the details. The coeff is not the number of teachers. First of all, len(teachers_preds) denotes the number of queries in the screening process (see https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/digit_pytorch/knn.py#L176 ). It is not the number of teachers; we apologize for the confusing name.
Secondly, in screening, the global sensitivity is only 1 (adding or removing one private data point changes the vote count by at most 1), and we conduct len(teachers_preds) queries. Therefore, we call acct.compose_poisson_subsampled_mechanisms(gaussian, prob, coeff=len(teachers_preds)),
which is equivalent to
`for i in range(screening_queries):
    acct.compose_poisson_subsampled_mechanisms(gaussian, sample_prob, coeff=1)`.
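The equivalence holds because RDP composes additively: composing a mechanism n times adds n copies of its RDP curve. Here is a minimal sketch of that property. `ToyRDPAccountant`, `gaussian_rdp`, and the values of `sigma` and `num_queries` are hypothetical and exist only for illustration; this is not the autodp API. For simplicity it uses the plain Gaussian RDP curve alpha / (2 sigma^2) with sensitivity 1 and ignores Poisson subsampling, which changes the curve but not the additive composition rule.

```python
import math


def gaussian_rdp(alpha, sigma):
    """RDP of a Gaussian mechanism with L2 sensitivity 1 (no subsampling)."""
    return alpha / (2.0 * sigma ** 2)


class ToyRDPAccountant:
    """Hypothetical stand-in for an RDP accountant; NOT the autodp API."""

    def __init__(self, alphas):
        self.alphas = list(alphas)
        self.rdp = [0.0] * len(self.alphas)

    def compose(self, rdp_func, coeff=1):
        # RDP adds up under composition, so `coeff` copies of the same
        # mechanism contribute coeff * eps(alpha) at every order alpha.
        for i, alpha in enumerate(self.alphas):
            self.rdp[i] += coeff * rdp_func(alpha)


alphas = range(2, 33)
sigma = 25.0        # illustrative noise scale (assumption)
num_queries = 200   # illustrative number of screening queries (assumption)

# One call with coeff = number of queries.
batched = ToyRDPAccountant(alphas)
batched.compose(lambda a: gaussian_rdp(a, sigma), coeff=num_queries)

# One call per query with coeff = 1.
looped = ToyRDPAccountant(alphas)
for _ in range(num_queries):
    looped.compose(lambda a: gaussian_rdp(a, sigma), coeff=1)

# The two accountants agree up to floating-point rounding.
same = all(
    math.isclose(x, y, rel_tol=1e-9) for x, y in zip(batched.rdp, looped.rdp)
)
print(same)
```

In this toy model, coeff counts how many times the mechanism is composed, which is why it tracks the number of queries rather than the number of teachers.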
Lastly, we only publish labels for the queries that passed screening. Therefore, that step uses coeff=len(stdnt_labels), where len(stdnt_labels) is the number of queries that passed screening.
To make it clear, we can accumulate the privacy loss in this way:
`acct = rdp_acct.anaRDPacct()  # declare the moment accountant
for i in range(screening_queries):
    # the coeff param is how many times we compose the mechanism
    acct.compose_poisson_subsampled_mechanisms(gaussian, sample_prob, coeff=1)
for i in range(passed_screening_queries):
    acct.compose_poisson_subsampled_mechanisms(gaussian2, sample_prob, coeff=1)
eps = acct.get_eps(delta=delta)`
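To illustrate why the second count matters, here is a rough, self-contained sketch of the two-phase accounting. It assumes plain Gaussian mechanisms with sensitivity 1 in both phases, ignores Poisson subsampling, and uses the standard RDP-to-DP conversion eps = min over alpha of [rdp(alpha) + log(1/delta) / (alpha - 1)]. The function names and all numeric values are hypothetical, and the output is not meant to reproduce the repository's numbers.

```python
import math


def gaussian_rdp(alpha, sigma):
    """RDP of a Gaussian mechanism with L2 sensitivity 1 (no subsampling)."""
    return alpha / (2.0 * sigma ** 2)


def rdp_to_eps(alphas, rdp, delta):
    """Standard RDP -> (eps, delta)-DP conversion, minimized over alpha."""
    return min(
        r + math.log(1.0 / delta) / (a - 1.0) for a, r in zip(alphas, rdp)
    )


def two_phase_eps(n_screened, n_released, sigma_screen, sigma_release,
                  delta, alphas=range(2, 257)):
    """Total eps when every query is screened but only some are released."""
    alphas = list(alphas)
    rdp = [
        # Screening is paid once per query; release only per passed query.
        n_screened * gaussian_rdp(a, sigma_screen)
        + n_released * gaussian_rdp(a, sigma_release)
        for a in alphas
    ]
    return rdp_to_eps(alphas, rdp, delta)


# Illustrative values only (assumptions, not taken from the repository).
eps_all = two_phase_eps(1000, 1000, sigma_screen=80.0, sigma_release=40.0,
                        delta=1e-5)
eps_some = two_phase_eps(1000, 300, sigma_screen=80.0, sigma_release=40.0,
                         delta=1e-5)

# Releasing fewer labels costs less privacy for the same screening work.
print(eps_some < eps_all)
```

Under these assumptions, the release phase is charged only for the queries that passed screening, so its coeff is smaller than the screening coeff whenever some queries are rejected.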
Hello Authors,
In the example in ReadMe.md, shouldn't coeff be the number of teachers (the same as in the previous line)? See https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/ReadMe.md?plain=1#L112
Intuitively, in the previous line ( https://github.com/jeremy43/Private_kNN/blob/547a1abc0ec2daa5e069134687e15ad76dd0b408/ReadMe.md?plain=1#L111 ), the privacy budget expended on screening depends on the number of teachers. Shouldn't that be the case for the second line as well?
Finally, we'd accumulate the privacy cost by answering many queries, in this way: