Closed giulianopietoso-apheris closed 4 years ago
Hi the second issue is due to parameter ordering. We realized the order should be flipped, and here we are with it causing a confusing error in your code.
Here is an example call from our test code
return PrivateReader(metadata, reader, budget).execute(query)
We plan on making it PrivateReader(reader, schema) very soon to avoid this confusion with warning based back compat for a couple releases to avoid breaking users
Our more common use case utilizes the SparkReader, this does not mean that these bugs should be let through so I created an issue to resolve the bugs
https://github.com/opendifferentialprivacy/smartnoise-sdk/issues/300
Hi there,
It does, indeed look like lines 56-58 in Postgres.py were indented incorrectly. After fixing that indentation error, I was able to make your example run.
Note that I needed to insert some records into the database table so that it would have some results:
CREATE TABLE PUMS.PUMS (age int, sex char(2), educ int, race char(2), income float, married boolean, pid int );
INSERT INTO pums.pums VALUES (35, 'm', 9, 'l', 90000, true, 1) ;
INSERT INTO pums.pums VALUES (30, 'f', 1, 'w', 30000, false, 2) ;
INSERT INTO pums.pums VALUES (55, 'm', 13, 'b', 60000, true, 3) ;
INSERT INTO pums.pums VALUES (18, 'f', 8, 'l', 50000, false, 4) ;
INSERT INTO pums.pums VALUES (7, 'm', 12, 'w', 30000, true, 5) ;
INSERT INTO pums.pums VALUES (75, 'f', 13, 'a', 190000, false, 6) ;
INSERT INTO pums.pums VALUES (105, 'm', 3, 'b', 900000, true, 7) ;
Then the following code works.
from opendp.smartnoise.sql import PrivateReader, PostgresReader
from opendp.smartnoise.metadata import CollectionMetadata
meta = CollectionMetadata.from_file("service/datasets/PUMS.yaml")
reader = PostgresReader("localhost", "pums")
#query = "SELECT AVG(age) AS age FROM pums.pums"
query = "SELECT COUNT(*) AS n FROM pums.pums"
res = reader.execute(query)
print(str(res))
priv = PrivateReader(meta, reader)
res2 = priv.execute(query)
print(str(res2))
I used schema name pums
, rather than public
, but your example should work fine with the public
schema.
In the example above, I am calling execute on both the PostgresReader and the PrivateReader, to make it easy to compare exact results with differentially private results.
Note that differentially private AVG() will be very noisy for small denominators; you will want to be in the several hundreds to thousands of records range before the AVG() is accurate. Counts will be more accurate for smaller n
, but you will still notice the counts jump around for such a small table.
We will do a scrub to try to understand how that spurious indent crept into the code, and update tomorrow. Thanks for reporting this!
The patch mentioned is on pypi now version 0.1.1
The issue was indeed the order of the parameters. Thank you very much for the help and now its working. Glad to report the issue on the identification.
After setting up the PUMS dataset in my Postgres database I tried running the example:
First issue is related to the PostgresSerializer:
This was a simple indentation problem in the class, which makes it break instantly before anything else happens. I searched for any open issues or PRs related to this in the repository but found nothing, which makes me wonder if anyone is actually using the PostgresReader at all.
https://github.com/opendifferentialprivacy/smartnoise-sdk/blob/master/sdk/opendp/smartnoise/sql/reader/postgres.py#L56
Any how, I fixed this locally so I could continue testing the example.
Output traceback error:
PUML.yaml
Database table
Has anyone faced anything similar or could shed some light on how to fix this?