qascade / dcr

A PoC framework to orchestrate interoperable Differentially Private Data Clean Room Services using Intel SGX hardware as root of trust.
GNU Affero General Public License v3.0

feat: add an example of a join query with a group by but using confidential go app. #37

Open qascade opened 1 year ago

qascade commented 1 year ago

Description

The existing use case only runs a simple count query without any partitions. I want to add an example with partitions that properly demonstrates the maxContributionsPerUsers option in the Google DP definition.

How to do this?

Use the same media/advertiser/research dataset for simplicity. You are free to come up with a different scenario, but then you will have to generate your own datasets.

You can write a query such as "What are the common customers, grouped by the kind of pet they have?" The output should then be: a private count of customers who have dogs and a private count of customers who have cats.

In SQL terms, the query should look like:

SELECT
    ac.pets,
    COUNT(DISTINCT ac.email) AS count_common_customers
FROM
    media_customers mc
INNER JOIN
    airline_customers ac ON mc.email = ac.email
GROUP BY
    ac.pets;

The end result should be a Confidential GoApp, with the appropriate YAML modifications, that behaves exactly as the above query would.
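To make the target behaviour concrete, here is a minimal plain-Go sketch of the query above. It is not the Confidential GoApp template and it does not use Google's differential privacy library. The type AirlineCustomer, the functions CommonCustomersByPet and AddLaplaceNoise, the parameter maxPartitionsPerUser, and the sample emails are all hypothetical names chosen for illustration. The sketch assumes the pet categories are public, bounds each customer to a fixed number of pet groups, and adds hand-rolled Laplace noise scaled to that bound. Floating-point Laplace sampling like this is for illustration only; a real implementation should rely on the library's noise and partition handling.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// AirlineCustomer is a hypothetical row of the airline_customers table.
type AirlineCustomer struct {
	Email string
	Pets  string
}

// CommonCustomersByPet mirrors the SQL query: inner join on email,
// GROUP BY pets, COUNT(DISTINCT email). Each email contributes to at most
// maxPartitionsPerUser pet groups (an assumed contribution bound).
func CommonCustomersByPet(mediaEmails []string, airline []AirlineCustomer, maxPartitionsPerUser int) map[string]int {
	inMedia := make(map[string]bool, len(mediaEmails))
	for _, e := range mediaEmails {
		inMedia[e] = true
	}
	seen := make(map[string]map[string]bool) // email -> pet groups already counted
	counts := make(map[string]int)
	for _, row := range airline {
		if !inMedia[row.Email] {
			continue // inner join: keep only emails present on both sides
		}
		groups := seen[row.Email]
		if groups == nil {
			groups = make(map[string]bool)
			seen[row.Email] = groups
		}
		if groups[row.Pets] {
			continue // DISTINCT: count each email once per pet group
		}
		if len(groups) >= maxPartitionsPerUser {
			continue // contribution bounding: drop extra groups for this email
		}
		groups[row.Pets] = true
		counts[row.Pets]++
	}
	return counts
}

// AddLaplaceNoise adds Laplace noise with scale maxPartitionsPerUser/epsilon
// to every count. The difference of two Exp(1) samples is Laplace(0, 1).
func AddLaplaceNoise(counts map[string]int, epsilon float64, maxPartitionsPerUser int, rng *rand.Rand) map[string]float64 {
	scale := float64(maxPartitionsPerUser) / epsilon
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order so a seeded rng gives repeatable output
	noisy := make(map[string]float64, len(counts))
	for _, k := range keys {
		noisy[k] = float64(counts[k]) + scale*(rng.ExpFloat64()-rng.ExpFloat64())
	}
	return noisy
}

func main() {
	media := []string{"a@x.com", "b@x.com", "c@x.com"}
	airline := []AirlineCustomer{
		{"a@x.com", "dog"},
		{"a@x.com", "dog"}, // duplicate row, ignored by DISTINCT
		{"a@x.com", "cat"}, // dropped when the bound is 1
		{"b@x.com", "cat"},
		{"d@x.com", "dog"}, // not a media customer, removed by the join
	}
	counts := CommonCustomersByPet(media, airline, 1)
	noisy := AddLaplaceNoise(counts, 1.0, 1, rand.New(rand.NewSource(42)))
	for _, pet := range []string{"cat", "dog"} {
		fmt.Printf("%s: exact=%d noisy=%.2f\n", pet, counts[pet], noisy[pet])
	}
}
```

Raising maxPartitionsPerUser lets a customer with both a dog and a cat count in both groups, at the cost of proportionally more noise per group for the same epsilon.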

Sarthak027 commented 1 year ago

Hey, I want to work on this issue. Could you assign it to me under GSSoC'23?

qascade commented 1 year ago

@Sarthak027 Awesome, assigning this to you. You will have to understand Google's differential privacy definition and how it works. Please note that the SQL is just for understanding: you will have to write a Go template that compiles and does the same thing as the SQL query. Let me know if you have any further questions. Name the branch feat.join_goapp.

Google Repo: https://github.com/google/differential-privacy
Video: https://youtu.be/1F6pRMVGWdc
Paper: https://arxiv.org/abs/1909.01917