privacytoolsproject / PSI-Library

R library of differentially private algorithms for exploratory data analysis

Issue with sapply for covariance? #120

Closed. ctcovington closed this issue 4 years ago

ctcovington commented 4 years ago

I was trying to get the MF_MakingVignettes branch up to date with develop and am running into an issue with the covariance statistic.

In particular, I think the line diffs <- sapply(rng,diff) should be diffs <- apply(rng, 1, diff). In my testing, the former seems to be trying to take the diff of each element of the rng matrix, while the latter correctly takes the difference of elements in each row.

# example data
x1 <- c(3, 12, 20, 42, 33, 65, 70, 54, 33, 45)
x2 <- c(11, 42, 16, 20, 21, 86, 30, 50, 73, 94)
data <- data.frame(x1, x2)

# range of the example data
range1 <- c(0,70)
range2 <- c(0,100)
ranges <- do.call(rbind, list(range1, range2))

# get diffs for each variable 

# NOTE: this fails silently
diffs_wrong <- sapply(ranges,diff)

# NOTE: this should be correct
diffs_right <- apply(ranges, 1, diff)
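
For anyone reading along, here is a small base-R sketch of why the first call appears to fail silently. It reuses the ranges from the example above; the variable names `ranges_matrix`, `cell_diffs`, and `row_diffs` are only illustrative and do not come from the library.

```r
range1 <- c(0, 70)
range2 <- c(0, 100)
ranges_matrix <- do.call(rbind, list(range1, range2))  # 2 x 2 matrix, one row per variable

# sapply() walks the matrix cell by cell, so diff() only ever sees a
# length-1 value and returns numeric(0) for each of the 4 cells.
cell_diffs <- sapply(ranges_matrix, diff)
str(cell_diffs)

# apply() with MARGIN = 1 hands each (min, max) row to diff() instead.
row_diffs <- apply(ranges_matrix, 1, diff)
print(row_diffs)
```

No error or warning is raised by the `sapply` call, which is presumably why the problem is easy to miss when a matrix is passed where a list is expected.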

globusharris commented 4 years ago

Ah, I think the unclear part here is the documentation for the range input to the sensitivity function. If you look in the test_covariance.R file, the way the range is specified is clearer: it's designed to accept a list of (min, max) pairs, one per variable. I.e. in your example it would be

range1 <- c(0,70)
range2 <- c(0,100)
ranges <- list(range1, range2)

Then,

diffs <- sapply(ranges, diff)

correctly returns

70 100

I've added example code and extra docs to the sensitivity function, which are in that pull request. Does that clarify things? If not, I can add more.
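
As a rough illustration of the two input shapes discussed in this thread, the sketch below computes the range widths from either a list of pairs or a two-column matrix. The helper `range_widths` is hypothetical and is not part of PSI-Library; it only shows one way a mismatch could be reported rather than passing silently.

```r
# Hypothetical helper, not part of PSI-Library: returns the width of each
# variable's range for either input shape.
range_widths <- function(ranges) {
  if (is.matrix(ranges)) {
    # One row per variable, columns are (min, max).
    stopifnot(ncol(ranges) == 2)
    return(unname(apply(ranges, 1, diff)))
  }
  if (is.list(ranges)) {
    # vapply() errors unless every element yields exactly one number,
    # so a malformed pair is reported instead of being ignored.
    return(vapply(ranges, diff, numeric(1)))
  }
  stop("ranges must be a list of c(min, max) pairs or a two-column matrix")
}

widths_from_list <- range_widths(list(c(0, 70), c(0, 100)))
widths_from_matrix <- range_widths(rbind(c(0, 70), c(0, 100)))
print(widths_from_list)
print(widths_from_matrix)
```

Using `vapply` with `numeric(1)` is a design choice in this sketch: it trades a little verbosity for an immediate error when an element is not a two-value pair.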

globusharris commented 4 years ago

Or if I've misunderstood and there is another issue, let me know :)

ctcovington commented 4 years ago

Thanks @globusharris!

For our records, Ira and I talked about this outside of GitHub, and the vignettes (where I was getting my information) just were not up to date with Ira's latest changes. We're resolving that now, so I'll close the issue.