Fortunately, this isn't a DP violation, because the noise addition step can also overflow. That said, it's not great from an accuracy perspective.
We can't return an error on overflow because that would be deterministic, data-dependent behavior. We can, however, cap values at the type's maximum and minimum and prevent them from wrapping around. That could lead to data loss (once we hit the maximum value, adding more input does nothing), but it's more accurate than returning very large-magnitude negative values.
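A minimal sketch of that clamping ("saturating") addition, using a GCC/Clang builtin; this is illustrative only, not the library's actual implementation:

```cpp
#include <cstdint>
#include <limits>

// Saturating add: on overflow the result clamps to the type's max/min
// instead of wrapping around. Illustrative sketch only.
int64_t SaturatingAdd(int64_t a, int64_t b) {
  int64_t result;
  if (__builtin_add_overflow(a, b, &result)) {
    // The overflow direction follows the sign of the added operand.
    return b > 0 ? std::numeric_limits<int64_t>::max()
                 : std::numeric_limits<int64_t>::min();
  }
  return result;
}
```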
Setting aside the overflow caused by noise addition: even though it would be deterministic behavior, it might be better to raise some kind of warning, since otherwise the user would keep adding data without realizing the sum has already overflowed. This is the decision we need to make.
I don't think we're willing to raise an error or return a warning in violation of the DP guarantees. Doing that would compromise differential privacy for every client of the library, whether they want that behavior or not. Instead, I'd suggest defaulting to 64-bit integers - they're substantially harder to overflow than 32-bit ones.
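As an illustration, assuming the Builder-style API from google/differential-privacy (the header path and setter names may vary by version), switching types is just a change of template argument:

```cpp
#include <cstdint>
#include <memory>

#include "algorithms/bounded-mean.h"

// Sketch assuming the library's Builder API. int64_t gives roughly
// 4 billion times more headroom than a 32-bit int before the running
// sum can overflow.
std::unique_ptr<differential_privacy::BoundedMean<int64_t>> MakeMean() {
  auto mean_or = differential_privacy::BoundedMean<int64_t>::Builder()
                     .SetEpsilon(1.0)
                     .SetLower(0)
                     .SetUpper(1000000)
                     .Build();
  return std::move(mean_or).value();  // or ValueOrDie() in older versions
}
```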
Closing this, as the workaround here is to use a larger data type. Additionally, we added wrap-around handling to the current implementation, so this should work now.
When I use
BoundedMean<int>
and pass large data to it, the int overflows and the output comes back as the lower bound.

How to reproduce:
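The original snippet was not preserved here; a hypothetical sketch of the failure mode, assuming the library's Builder API, would look like:

```cpp
#include "algorithms/bounded-mean.h"

// Hypothetical repro sketch; Builder/setter names follow
// google/differential-privacy conventions and may vary by version.
void Repro() {
  auto mean = differential_privacy::BoundedMean<int>::Builder()
                  .SetEpsilon(1.0)
                  .SetLower(0)
                  .SetUpper(1000000)
                  .Build()
                  .value();  // or ValueOrDie() in older versions
  // 10,000 entries of 1,000,000 sum to 10^10, far beyond INT_MAX
  // (~2.1 * 10^9), so the internal int accumulator wraps around.
  for (int i = 0; i < 10000; ++i) {
    mean->AddEntry(1000000);
  }
  // Expected: a noisy mean near 1,000,000. Observed: the lower bound.
  auto result = mean->PartialResult();
}
```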
Possible fix: use SafeAdd() when adding each element to the vector, and raise an error whenever it detects overflow.
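A sketch of that check, using the compiler builtin such a SafeAdd()-style helper typically wraps (the library's actual SafeAdd() signature may differ, and, as noted above, surfacing the error would be data-dependent behavior):

```cpp
#include <stdexcept>
#include <vector>

// Overflow-checked accumulation: stop instead of wrapping around.
// Illustrative sketch of the proposed fix, not the library's code.
int CheckedSum(const std::vector<int>& values) {
  int sum = 0;
  for (int v : values) {
    int next;
    if (__builtin_add_overflow(sum, v, &next)) {
      // Raising here is data-dependent behavior, which is why the
      // maintainers rejected this approach for a DP library.
      throw std::overflow_error("sum of inputs overflowed int");
    }
    sum = next;
  }
  return sum;
}
```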