privacytoolsproject / PSI-Library

R library of differentially private algorithms for exploratory data analysis

Separating mechanism and statistic classes? #66

Open globusharris opened 5 years ago

globusharris commented 5 years ago

Currently, the statistics are subclasses of the mechanisms. The user initializes the statistic with the desired parameters (e.g. epsilon, delta, n), and the statistic's release method then invokes the mechanism by calling export on the mechanism class and running evaluate on the result, e.g.:

.self$result <- export(mechanism)$evaluate(fun=fun.covar, x=x, sens=sens, postFun=.self$postProcess, formula=formula, columns=columns, intercept=intercept)

This makes the statistic's attributes (like the user-specified epsilon) accessible to the mechanism. For a new user, it may be unclear where these attributes are initialized, since this happens implicitly in the export call inside the statistic's release method. Since the mechanism and statistic classes have fundamentally different roles in the library, I wonder if it would be clearer to make them separate classes, with an explicit statement of what is passed into the mechanism, rather than having the statistics be subclasses of the mechanisms.

On the one hand, this will make the code less concise. But it might also be clearer. Thoughts?
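
A minimal sketch of what the separated design might look like, using composition instead of inheritance. All names here (LaplaceMechanismSketch, DpMeanSketch, and their fields) are hypothetical and are not taken from PSI-Library; the point is only that the statistic holds a mechanism object and passes it exactly what it needs, so it is visible where epsilon lives:

```r
library(methods)

# Hypothetical mechanism class: owns the privacy parameter and the noise.
LaplaceMechanismSketch <- setRefClass(
  "LaplaceMechanismSketch",
  fields = list(epsilon = "numeric"),
  methods = list(
    evaluate = function(fun, x, sens) {
      trueValue <- fun(x)
      noiseScale <- sens / epsilon
      # The difference of two i.i.d. exponentials is Laplace-distributed.
      noise <- rexp(1, rate = 1 / noiseScale) - rexp(1, rate = 1 / noiseScale)
      list(release = trueValue + noise, epsilon = epsilon)
    }
  )
)

# Hypothetical statistic class: holds a mechanism as a field, does not inherit.
DpMeanSketch <- setRefClass(
  "DpMeanSketch",
  fields = list(
    mechanism = "LaplaceMechanismSketch",
    n = "numeric", lower = "numeric", upper = "numeric"
  ),
  methods = list(
    release = function(x) {
      # Sensitivity of the mean, assuming x is already bounded to [lower, upper].
      sensitivity <- (upper - lower) / n
      # Everything the mechanism receives is spelled out in this call.
      mechanism$evaluate(fun = mean, x = x, sens = sensitivity)
    }
  )
)

mech <- LaplaceMechanismSketch$new(epsilon = 0.5)
stat <- DpMeanSketch$new(mechanism = mech, n = 100, lower = 0, upper = 1)
out <- stat$release(runif(100))
```

The cost is the extra line that constructs the mechanism; the benefit is that the mechanism can only see what is handed to it in the evaluate call.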

globusharris commented 5 years ago

It's actually weirder than this: export copies the dpStatistic object and then calls the evaluate function on that copy, as opposed to just calling evaluate directly, which would be possible since dpStatistics are subclasses of the mechanisms.

As far as I can tell, this seems to be a very nonstandard way of doing things with R class inheritance; I have been unable to find any examples of other people using export in this way or documentation of export used for reference classes in this way.

So given this, it seems like we should either call evaluate directly, if we want the statistics to remain subclasses of the mechanisms, or separate the two class hierarchies.
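
For comparison, a rough sketch of the first option: keep the statistic as a subclass and call the inherited evaluate method directly, with no export call. The class names (MechanismBaseSketch, DpMeanSubclassSketch) and fields are made up for illustration and do not match the library's actual classes:

```r
library(methods)

# Hypothetical base mechanism with a Laplace-noise evaluate method.
MechanismBaseSketch <- setRefClass(
  "MechanismBaseSketch",
  fields = list(epsilon = "numeric"),
  methods = list(
    evaluate = function(fun, x, sens) {
      noiseScale <- sens / epsilon
      # The difference of two i.i.d. exponentials is Laplace-distributed.
      noise <- rexp(1, rate = 1 / noiseScale) - rexp(1, rate = 1 / noiseScale)
      list(release = fun(x) + noise, epsilon = epsilon)
    }
  )
)

# Hypothetical statistic that inherits from the mechanism.
DpMeanSubclassSketch <- setRefClass(
  "DpMeanSubclassSketch",
  contains = "MechanismBaseSketch",
  fields = list(n = "numeric", lower = "numeric", upper = "numeric"),
  methods = list(
    release = function(x) {
      # Direct call to the inherited method on this same object; epsilon is
      # read from the object's own fields, without going through export.
      .self$evaluate(fun = mean, x = x, sens = (upper - lower) / n)
    }
  )
)

subStat <- DpMeanSubclassSketch$new(epsilon = 0.5, n = 100, lower = 0, upper = 1)
subOut <- subStat$release(runif(100))
```

This keeps the current inheritance structure, but epsilon is still picked up implicitly from the object rather than passed in, so it only addresses the export oddity and not the readability concern.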