privacysandbox / aggregation-service

This repository contains instructions and scripts to set up and test the Privacy Sandbox Aggregation Service
Apache License 2.0

Private Aggregation API - no metrics in summarised report #27

Closed anthonymassard closed 11 months ago

anthonymassard commented 11 months ago

Hello,

I'm currently experimenting with the Private Aggregation API and I'm struggling to validate that my final output is correct.

From my worklet, I perform the following histogram contribution:

privateAggregation.contributeToHistogram({ bucket: BigInt(1369), value: 128 });

Which is correctly triggering a POST request with the following body:

 {
  aggregation_service_payloads: [
    {
      debug_cleartext_payload: 'omRkYXRhgaJldmFsdWVEAAAAgGZidWNrZXRQAAAAAAAAAAAAAAAAAAAFWWlvcGVyYXRpb25paGlzdG9ncmFt',
      key_id: 'bca09245-2ef0-4fdf-a4fa-226306fc2a09',
      payload: 'RVd7QRTTUmPp0i1zBev+4W8lJK8gLIIod6LUjPkfbxCOHsQLBW/jRn642YZ2HYpYkiMK9+PprU5CUi9W7TwJToQ4UXiUbJUgYwliqBFC+aAcwsKJ3Hg46joHZXV5E0ZheeFTqqvLtiJxlVpzFcWd'
    }
  ],
  debug_key: '777',
  shared_info: '{"api":"shared-storage","debug_mode":"enabled","report_id":"aaa889f1-2adc-4796-9e46-c652a08e18ca","reporting_origin":"http://adtech.localhost:3000","scheduled_report_time":"1698074105","version":"0.1"}'
}

I've set up a small Node.js server handling requests on /.well-known/private-aggregation/debug/report-shared-storage that basically does this:

  const encoder = avro.createFileEncoder(
    `${REPORT_UPLOAD_PATH}/debug/aggregation_report_${Date.now()}.avro`,
    reportType
  );

  reportContent.aggregation_service_payloads.forEach((payload) => {
    console.log(
      "Decoded data from debug_cleartext_payload:",
      readDataFromCleartextPayload(payload.debug_cleartext_payload)
    );

    encoder.write({
      payload: convertPayloadToBytes(payload.debug_cleartext_payload),
      key_id: payload.key_id,
      shared_info: reportContent.shared_info,
    });
  });

  encoder.end();

As you can see, at this point I'm printing the decoded data to the console, and I see the expected output: Decoded data from debug_cleartext_payload: { value: 128, bucket: 1369 }
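For reference, debug_cleartext_payload is base64-encoded CBOR. A minimal hand-rolled reader covering only the CBOR types these payloads use (maps, arrays, text strings, byte strings) is enough to implement a helper like readDataFromCleartextPayload. This is a sketch, not a full CBOR implementation:

```javascript
// Sketch: decode a Private Aggregation debug_cleartext_payload without an
// external CBOR library. Handles only the major types these payloads contain.
function decodeCbor(buf) {
  let pos = 0;

  function readLength(info) {
    if (info < 24) return info;          // length encoded in the initial byte
    if (info === 24) return buf[pos++];  // one-byte length follows
    throw new Error("length form not needed for these payloads");
  }

  function readItem() {
    const initial = buf[pos++];
    const major = initial >> 5;
    const len = readLength(initial & 0x1f);
    switch (major) {
      case 2: { // byte string
        const bytes = buf.subarray(pos, pos + len);
        pos += len;
        return bytes;
      }
      case 3: { // text string
        const text = buf.toString("utf8", pos, pos + len);
        pos += len;
        return text;
      }
      case 4: { // array
        const arr = [];
        for (let i = 0; i < len; i++) arr.push(readItem());
        return arr;
      }
      case 5: { // map
        const obj = {};
        for (let i = 0; i < len; i++) {
          const key = readItem();
          obj[key] = readItem();
        }
        return obj;
      }
      default:
        throw new Error(`CBOR major type ${major} not needed for these payloads`);
    }
  }

  return readItem();
}

function readDataFromCleartextPayload(b64) {
  const { data } = decodeCbor(Buffer.from(b64, "base64"));
  const { value, bucket } = data[0];
  return {
    value: value.readUInt32BE(0),                                // 4-byte big-endian integer
    bucket: BigInt("0x" + Buffer.from(bucket).toString("hex")),  // 16-byte big-endian integer
  };
}
```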

However, now I'm trying to generate a summary report with the local test tool by running the following command:

java -jar LocalTestingTool_2.0.0.jar --input_data_avro_file aggregation_report_1698071597075.avro --domain_avro_file output_domain.avro --no_noising --json_output --output_directory ./results

No matter what value I pass to contributeToHistogram, I always get 0 in the metric field:

[ {
  "bucket" : "MTM2OQ==", // 1369 base64 encoded
  "metric" : 0
} ]
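(For reference, the bucket field in this JSON output base64-decodes back to the decimal bucket string, which can be checked with a quick Node.js snippet:)

```javascript
// Check: "MTM2OQ==" is the base64 encoding of the decimal string "1369",
// not of the raw 16-byte bucket key.
const bucket = Buffer.from("MTM2OQ==", "base64").toString("utf8");
```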

Am I doing something wrong?

Apart from this issue, I wonder how this would work in a real-life application. Currently this example handles one report at a time, sent instantly because debug_mode is enabled, but in a real situation, how are we supposed to process a large number of reports at once? Can we pass a list of files to --input_data_avro_file? Should we batch the reports before converting them to Avro, based on the shared_info data? If so, based on which field?

Thank you in advance!

maybellineboon commented 11 months ago

Hi Anthony,

This happens when your output_domain.avro does not contain the correct bucket key. The bucket key in output_domain.avro should be the bucket value's big-endian bytes, written in the JSON as an escaped-Unicode string. Since your bucket key is 1369 (0x0559), your output_domain.json should look something like the below.

{
    "bucket": "\u0005Y"
}

OR

{
    "bucket": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0005Y"
}
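As a sketch, the conversion from a numeric bucket to that escaped-byte string can be done in Node.js like this (the helper name is ours, not part of any tool):

```javascript
// Sketch: turn a numeric bucket key (e.g. 1369) into the 16-byte big-endian
// string form used in output_domain.json. For 1369 (0x0559), this yields
// fourteen NUL characters followed by "\u0005Y".
function bucketToDomainString(bucket) {
  const hex = BigInt(bucket).toString(16).padStart(32, "0"); // 16 bytes, big-endian
  const bytes = Buffer.from(hex, "hex");
  return String.fromCharCode(...bytes);
}
```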

I tried out your payload and was able to get a summary report with the correct 128 metric.

Once you fix the conversion, let me know if you're still getting the issue.

Additionally, for reports, you can convert the reports to individual Avro reports and batch them through the Aggregation Service with `input_data_blob_prefix`. Say your report names start with `report` (e.g. `report.avro`): you can set `input_data_blob_prefix` to `report` and it will take all files/folders with the prefix `report`.

Alternatively, you can collect all JSON reports and convert them into a single (or a few) Avro reports to be sent to the Aggregation Service for batching.

Thanks!

anthonymassard commented 11 months ago

Hi @maybellineboon,

Thanks a lot for your answer, it was really helpful! I managed to get a summary report with the expected metrics on my side too 👍

And thanks also for the clarification on the batching side :)

elangobharathi commented 6 months ago

Hi @anthonymassard, thank you. This thread was really helpful for understanding the bucket key. Could you share the complete JavaScript code that you posted in your question? It would help me understand the flow better, as I'm not familiar with the Golang code example in collecting.md.