opendp / smartnoise-core

Differential privacy validator and runtime
MIT License
289 stars 33 forks

Dp linear regression #287

Closed ecowan closed 4 years ago

ecowan commented 4 years ago

This PR adds two components, both exposed to the Python layer:

  1. TheilSen - a data transformation that returns an array of pairwise slopes and intercepts computed from the given data.
  2. DpGumbelMedian - an implementation of a DP median that uses noise sampled from a Gumbel distribution.
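
For readers unfamiliar with the two pieces, here is a minimal Python sketch of the ideas, not the code in this PR. The function names, the public candidate grid, the rank-based utility, and the sensitivity bound of 1 are all assumptions made for illustration. The noise step uses the standard Gumbel-max form of the exponential mechanism, which may differ from what DpGumbelMedian does internally, and Python's `random` module is not suitable for a real privacy release.

```python
import math
import random
from itertools import combinations


def pairwise_slopes_intercepts(x, y):
    """Return (slopes, intercepts) over all pairs of points with distinct x."""
    slopes, intercepts = [], []
    for i, j in combinations(range(len(x)), 2):
        dx = x[j] - x[i]
        if dx == 0:
            continue  # vertical pair: slope undefined, skipped in this sketch
        slope = (y[j] - y[i]) / dx
        slopes.append(slope)
        intercepts.append(y[i] - slope * x[i])
    return slopes, intercepts


def gumbel_median(data, candidates, epsilon, rng, sensitivity=1.0):
    """Pick a candidate near the median by adding Gumbel noise to a rank score.

    Assumptions (hypothetical, for illustration only):
    - `candidates` is a public, data-independent grid of possible outputs;
    - the utility is -|rank(c) - n/2|, which changes by at most 1 when one
      record is replaced, hence sensitivity=1.0.
    """
    n = len(data)
    scale = 2.0 * sensitivity / epsilon  # Gumbel scale for the exponential mechanism
    best, best_score = None, -math.inf
    for c in candidates:
        below = sum(1 for v in data if v < c)
        equal = sum(1 for v in data if v == c)
        utility = -abs(below + 0.5 * equal - n / 2.0)
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        noise = -scale * math.log(-math.log(u))  # Gumbel(0, scale) sample
        if utility + noise > best_score:
            best, best_score = c, utility + noise
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.1, 2.9, 5.2, 6.8]
    slopes, _ = pairwise_slopes_intercepts(xs, ys)
    grid = [g / 10.0 for g in range(0, 51)]  # public grid of slopes 0.0..5.0
    print(gumbel_median(slopes, grid, epsilon=1.0, rng=rng))
```

Taking the noisy median of the pairwise slopes, as in the usage lines, is the usual way a Theil-Sen estimate is made private; the intercepts can be handled the same way at additional privacy cost.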

Potential to-dos:

  1. One function, dp_theil_sen_k_subset, is left in place even though it is not used directly. It selects k random points from the data and performs dp_theil_sen on them. It may not be needed, given that theil_sen_k_match selects (n/2) pairs k times, for k*n/2 < n^2.

  2. Some functions in linreg_theilsen.rs may be made private.
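
To make the "(n/2) pairs k times" idea concrete, here is a rough Python sketch of a k-match style sampler. It is an illustration under assumptions, not the theil_sen_k_match in this PR: the name `k_match_slopes`, the shuffle-and-pair strategy, and the handling of ties and odd n are all hypothetical.

```python
import random


def k_match_slopes(x, y, k, rng):
    """Repeat k times: randomly match points into disjoint pairs, then
    compute one slope and intercept per pair.

    Each round uses every point at most once, so a single record affects
    at most one pair per round (at most k estimates in total).
    """
    n = len(x)
    slopes, intercepts = [], []
    for _ in range(k):
        order = list(range(n))
        rng.shuffle(order)  # random matching for this round
        # Pair consecutive shuffled indices; with odd n the last point is unused.
        for a in range(0, n - 1, 2):
            i, j = order[a], order[a + 1]
            dx = x[j] - x[i]
            if dx == 0:
                continue  # skip pairs with equal x in this sketch
            slope = (y[j] - y[i]) / dx
            slopes.append(slope)
            intercepts.append(y[i] - slope * x[i])
    return slopes, intercepts


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [float(v) for v in range(6)]
    ys = [2.0 * v + 1.0 for v in xs]
    s, b = k_match_slopes(xs, ys, k=3, rng=rng)
    print(len(s), s[0], b[0])
```

Compared with using all pairs, matching bounds how many estimates any one record can influence, which is the property a downstream DP median relies on.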

Shoeboxam commented 4 years ago

This branch is working now; I'm able to release Theil-Sen statistics from Python. More testing is needed.