Open MeganFantes opened 5 years ago
The `boot_mean` object is created in `dp-mean.Rmd`:
boot_mean <- dpMean$new(mechanism='mechanismBootstrap', var.type='numeric',
variable='income', n=10000, epsilon=0.1, rng=c(0, 750000),
n.boot=n.boot)
Then we have the call:
boot_mean$release(PUMS5extract10000)
which refers to the `release` method of a `dpMean` object called `boot_mean`. This throws the error above.
`release` method of the `dpMean` object in `statistic_mean.R`:
dpMean$methods(
release = function(data, ...) {
x <- data[, variable]
sens <- diff(rng) / n
.self$result <- export(mechanism)$evaluate(mean, x, sens, .self$postProcess, ...)
})
`export(mechanism)` exports all fields and methods of the mechanism passed into the object.
In this case, the mechanism is `mechanismBootstrap`, which has the method `evaluate`.
`evaluate` method of the `mechanismBootstrap` class in `mechanism-bootstrap.R`:
mechanismBootstrap$methods(
evaluate = function(fun, x, sens, postFun) {
x <- censordata(x, .self$var.type, .self$rng)
x <- fillMissing(x, .self$var.type, .self$impute.rng[0], .self$impute.rng[1])
epsilon.part <- epsilon / .self$n.boot
release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))
std.error <- .self$bootSE(release, .self$n.boot, sens)
out <- list('release' = release, 'std.error' = std.error)
out <- postFun(out)
return(out)
})
Interesting to note that the `...` operator is passed into `evaluate`, but `...` is not in the method signature.
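For reference, plain R functions reject extra arguments when their signature lacks `...`. The sketch below uses two hypothetical stand-in functions (`evaluate_no_dots` and `evaluate_with_dots`, not part of the library) to show the difference. Whether this matters for the vignette depends on whether `release` is ever called with extra arguments, which this thread does not show.

```r
# Hypothetical stand-ins for illustration; not the library's code.
evaluate_no_dots <- function(fun, x) fun(x)
evaluate_with_dots <- function(fun, x, ...) fun(x, ...)

# Extra arguments are rejected when the signature lacks `...`.
res <- tryCatch(
  evaluate_no_dots(mean, c(1, 2, NA), na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)  # an "unused argument" error message

# With `...` in the signature, the extra argument reaches `fun`.
ok <- evaluate_with_dots(mean, c(1, 2, NA), na.rm = TRUE)
print(ok)  # 1.5
```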
According to the stack trace from the error, the problem here is the `replicate()` call. This call repeats the `bootstrap.replication` function `n.boot` times. The problem is in `bootstrap.replication`.
`bootstrap.replication` function in `mechanism-bootstrap.R`:
bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {
partition <- rmultinom(n=1, size=n, prob=rep(1 / n, n))
max.appearances <- max(partition)
probs <- sapply(1:max.appearances, dbinom, size=n, prob=(1 / n))
stat.partitions <- vector('list', max.appearances)
for (i in 1:max.appearances) {
variance.i <- (i * probs[i] * (sensitivity^2)) / (2 * epsilon)
stat.i <- fun(x[partition == i])
noise.i <- dpNoise(n=length(stat.i), scale=sqrt(variance.i), dist='gaussian')
stat.partitions[[i]] <- i * stat.i + noise.i
}
stat.out <- do.call(rbind, stat.partitions)
return(apply(stat.out, 2, sum))
}
`fun(x[partition == i])` calls the function that was passed in, which was `bootStatEval`.
Here, I wanted to figure out what `x[partition == i]` actually means.
`x` is a vector of values indicating the income of each person in the original dataset.
`partition == i` is a vector of booleans.
`x[partition == i]` should be a subset of `x`, with the values at the indices with `TRUE` at the original value, and the rest at 0. This is mostly true, except the values at `FALSE` seem to be random values. I think this is because of a differentially private protocol?
(I figured this out using many print statements throughout the code.)
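As a point of reference, base R logical indexing keeps only the elements at `TRUE` positions and drops the rest; it does not set them to 0. So `x[partition == i]` is normally shorter than `x`. A toy check with made-up values (not the PUMS data):

```r
# Toy data; the values are made up for illustration.
x <- c(10, 20, 30, 40, 50)
partition <- c(0, 2, 1, 2, 0)  # how often each index was drawn

mask <- partition == 2
print(mask)             # FALSE  TRUE FALSE  TRUE FALSE
print(x[mask])          # 20 40
print(length(x[mask]))  # 2, not 5

# No index was drawn three times, so this subset is empty.
empty <- x[partition == 3]
print(empty)            # numeric(0)
```

Note that in `evaluate`, `x` has already passed through `censordata` and `fillMissing` before it is subset, so the values seen in print statements may differ from the raw column for that reason.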
`bootStatEval` method of the `mechanismBootstrap` class in `mechanism-bootstrap.R`:
mechanismBootstrap$methods(
bootStatEval = function(xi) {
fun.args <- getFuncArgs(fun, inputList=list(...), inputObject=.self)
input.vals = c(list(x=x), fun.args)
stat <- do.call(boot.fun, input.vals)
return(stat)
})
I think I found the problem:
The `mean` function is passed into `evaluate()` and then nothing is done with it. Instead, the function handed to `bootstrap.replication` inside the `replicate()` call is set to `fun=.self$bootStatEval`.
Then, in `bootstrap.replication`, the function applied to `x[partition == i]` is `bootStatEval`.
In the `replicate()` function call, we do not want to repeat `bootStatEval` `n` times; we want to calculate the mean `n` times, which is what bootstrapping is.
I think the call to `bootStatEval` should be a hard-coded call somewhere in `bootstrap.replication`, because `bootStatEval` is a sanity check (I think?), not a parameter that needs to be passed around. `mean`, as the function we are interested in bootstrapping, is a parameter we would want to pass around, because we will want to bootstrap different statistics eventually.
The error happens because `bootStatEval` expects a parameter called `fun`, but no such parameter is passed in. I think this `fun` is the `mean` function passed into `evaluate()`.
(Similarly, `...` is passed into `evaluate()` and then never used, since `evaluate` does not have `...` in its method signature. Eventually `bootStatEval` will look for `...` and will not find it, and I think the `...` from the `evaluate()` function call is the one it needs.)
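The failure mode described above can be reproduced outside the library. This is a minimal sketch with a hypothetical function (`stat_eval_broken`), assuming no object named `fun` exists in the calling environment; it is not the library's code.

```r
# Hypothetical simplification of the failing method: `fun` is used
# in the body but is not a parameter and is not defined anywhere.
stat_eval_broken <- function(xi) {
  formals(fun)  # `fun` cannot be resolved from inside this function
}

msg <- tryCatch(
  stat_eval_broken(c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)
print(msg)  # an "object 'fun' not found" message
```

The lookup fails when the body is evaluated, not when the method is defined, which would explain why the error only surfaces once `release` is called.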
In `mechanism-bootstrap.R`:
- changed `evaluate = function(fun, x, sens, postFun) {` to: `evaluate = function(fun, x, sens, postFun, ...) {`
  - `...` operator to the method signature, so we can pass it to `getFuncArgs()` later
- changed `release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))` to: `replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=fun, inputObject = .self, ...))`
  - `fun` to the input function (in this case, `mean`)
  - `inputObject` to pass the bootstrap mechanism object to `bootstrap.replication` so we can call `bootStatEval` later
  - `...` operator to pass to `bootStatEval` later
- changed `bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {` to: `bootstrap.replication <- function(x, n, sensitivity, epsilon, fun, inputObject, ...) {` and added: `@param inputObject the Bootstrap mechanism object on which the input function will be evaluated`
  - `inputObject` so we can call `bootStatEval`
  - `...` operator
- changed `stat.i <- fun(x[partition == i])` to: `stat.i <- inputObject$bootStatEval(x[partition == i], fun, ...)`
  - `bootStatEval`
- changed `bootStatEval = function(xi) {` to: `bootStatEval = function(xi, fun, ...) {`
  - `bootStatEval`
- changed `input.vals = c(list(x=x), fun.args)` to: `input.vals = c(list(x=xi), fun.args)`
  - `xi` instead of `x`
- changed `stat <- do.call(boot.fun, input.vals)` to: `stat <- do.call(fun, input.vals)`
  - `fun` instead of `boot.fun`
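The idea behind these changes, threading the statistic and `...` through every layer, can be sketched with plain functions. All names below (`stat_eval`, `replication_sketch`, `evaluate_sketch`) are hypothetical simplifications: there are no Reference Classes, no partitioning, and no noise. The sketch uses `vapply` rather than `replicate` because `replicate` wraps its expression in a `function(...)`, so a `...` written inside that expression refers to the wrapper's arguments rather than the caller's (see the note in `?replicate`).

```r
# Hypothetical, simplified stand-ins for the three layers.
stat_eval <- function(xi, fun, ...) {
  # Apply the statistic passed in by the caller, with any extra arguments.
  do.call(fun, c(list(x = xi), list(...)))
}

replication_sketch <- function(x, fun, ...) {
  xi <- sample(x, length(x), replace = TRUE)  # one bootstrap resample
  stat_eval(xi, fun, ...)
}

evaluate_sketch <- function(fun, x, n.boot, ...) {
  # `...` here is resolved lexically from evaluate_sketch's own arguments.
  vapply(seq_len(n.boot),
         function(i) replication_sketch(x, fun, ...),
         numeric(1))
}

set.seed(1)
x <- c(1, 2, NA, 4, 5)
out <- evaluate_sketch(mean, x, n.boot = 5, na.rm = TRUE)
print(length(out))  # 5
```

Passing `fun` explicitly keeps the choice of statistic in the caller's hands, which matches the goal stated earlier of bootstrapping statistics other than the mean.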
Now the `dp-mean` vignette runs, but the bootstrapped mean will occasionally return `NaN` as the result.
The `NaN`s come from the `partition` vector created in `bootstrap.replication`. Sometimes when the partition vector is created, one partition is empty.
Fixed all problems. Added validation in bootstrap.replication to ensure it is only calculating a statistic for a partition that contains values.
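To illustrate where the `NaN` comes from: in base R, the mean of an empty vector is `NaN`. The guard below is a hypothetical sketch of the kind of validation described, not the code that was actually committed, and it uses made-up values.

```r
# An empty subset has no mean: base R returns NaN.
print(mean(numeric(0)))  # NaN

# Hypothetical guard: only compute the statistic for counts that
# actually occur in the partition vector.
x <- c(10, 20, 30, 40, 50)
partition <- c(0, 3, 1, 1, 0)  # no index was drawn exactly twice

stats <- list()
for (i in seq_len(max(partition))) {
  xi <- x[partition == i]
  if (length(xi) == 0) next    # skip empty partitions
  stats[[as.character(i)]] <- i * mean(xi)
}
print(stats)  # entries for counts 1 and 3 only
```

Here the loop runs over counts 1 to 3, but count 2 never occurs, so without the guard `mean(xi)` would be evaluated on an empty vector.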
Bug found 6/4
error:
When running the `dp-mean.Rmd` vignette, at line `boot_mean$release(PUMS5extract10000)`, get the error: `Error in formals(targetFunc) : object 'fun' not found`
source:
mechanism-bootstrap.R, line 41:
`getFuncArgs` uses a variable called `fun`, but there is no `fun` parameter in the method signature.