Open bmcase opened 1 year ago
Afaiu large numbers of IDs are intentionally disallowed by most proposals to prevent their use for tracking. If they are necessary for some use case, their leakage should be analyzed.
Thanks for filing this issue @bmcase. For the "sparse" histogram case, the prototypical use case of publisher breakdowns (documented in https://github.com/WICG/attribution-reporting-api/issues/583) can be solved with publisher reports as you describe.
I want to emphasize two things though:
@csharrison thanks for clarifying. I agree that delays for publisher reports are a concern.
My more major concern in the F2F meeting was around "dense + large" histograms, rather than the truly sparse case (where I believe sketching techniques will also work).
I would think that "dense + large" should also be able to be supported through publisher reports as described above. Was there a reason besides delays that you were thinking we'd need to use Advertiser reports for the "dense + large" case?
Hm, that's a good question. It's kind of hard to answer given that delays are so important for the publisher reports. Even if delays were reduced and you could requery across multiple windows (like ARA event-level reports support), it might require composition / more noise. Maybe there could be a better solution here though!
My impression is that for non-optimization use-cases, advertiser reports are more natural so that's what I was focusing on.
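To make the "composition / more noise" point concrete, here is a rough worked example. It assumes basic sequential composition and Laplace noise, which is my assumption for illustration and not necessarily the mechanism PAM or ARA would use. The symbols $k$ (number of windows queried), $\varepsilon$ (total privacy budget) and $\Delta$ (sensitivity) are introduced here only for the sketch.

```math
\varepsilon_{\text{window}} = \frac{\varepsilon}{k},
\qquad
b_{\text{window}} = \frac{\Delta}{\varepsilon_{\text{window}}} = \frac{k\,\Delta}{\varepsilon},
\qquad
\operatorname{Var}\!\left[\operatorname{Lap}(b_{\text{window}})\right] = 2\,b_{\text{window}}^{2} = \frac{2k^{2}\Delta^{2}}{\varepsilon^{2}}
```

Under those assumptions, if the same contributions are queried in $k$ windows and the budget is split evenly, the noise scale per query grows linearly in $k$ compared with a single query at the full budget. Tighter composition theorems would give better constants.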
Clarifying Google’s sparse histogram use case for PAM
I’ll open this issue here on PAM, but it is more of a question directed to @csharrison. Charlie, I’d like to clarify my understanding of the use case you are trying to solve with the sparse histograms you talked about in the PAM ad hoc call, where you expressed the need to send a very large # of adIDs to the device for the mapping table used to generate Advertiser reports.
In showing ads on the open web, there are three use cases that seem like they could be related to what you’re looking for (informed by Ben Savage’s experience with Meta’s Audience Network):
My understanding is you’re trying to solve something like this 3rd use case using Advertiser reports, which is why you need to ship down a set of adIDs roughly the size of the # of sites. I’ve been thinking about how you might be able to solve this 3rd use case using PAM publisher reports while keeping this huge mapping off the device. Luke clarified that doing what we usually call “late binding” of breakdown keys to publisher reports seems reasonable in PAM. In fact, PPM has an issue to potentially support just this in PRIO by letting shares come with labels, so that the query just aggregates everything with the same label.
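As a rough illustration of the "shares come with labels" idea, here is a toy sketch using two-party additive secret sharing. It is not the PAM or PPM/PRIO protocol: the modulus, function names and site labels are all hypothetical, and it leaves out share validation, contribution capping and differential-privacy noise.

```python
import secrets
from collections import defaultdict

# Hypothetical modulus for this toy example; real systems choose their own field.
MODULUS = 2**61 - 1


def device_make_shares(value):
    """Device side: split a value into two additive shares. No label is involved."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def bind_label(label, shares):
    """Late binding: a cleartext label is attached to both shares after the fact."""
    share_a, share_b = shares
    return (label, share_a), (label, share_b)


def aggregate_by_label(labeled_shares):
    """Helper side: sum the shares that carry the same label."""
    totals = defaultdict(int)
    for label, share in labeled_shares:
        totals[label] = (totals[label] + share) % MODULUS
    return dict(totals)


def combine(totals_a, totals_b):
    """Recombine the two helpers' per-label sums into cleartext totals."""
    return {
        label: (totals_a[label] + totals_b[label]) % MODULUS
        for label in totals_a
    }


def run_example():
    # Each tuple is (label bound later, value contributed by the device).
    reports = [("site-a.example", 1), ("site-b.example", 1), ("site-a.example", 1)]
    helper_a, helper_b = [], []
    for label, value in reports:
        labeled_a, labeled_b = bind_label(label, device_make_shares(value))
        helper_a.append(labeled_a)
        helper_b.append(labeled_b)
    return combine(aggregate_by_label(helper_a), aggregate_by_label(helper_b))
```

The point of the sketch is that the device never needs the label-to-ID mapping: it only produces shares, and grouping happens on whatever label is attached afterwards. Because the labels here are in the clear, what they leak would need the kind of analysis raised elsewhere in this thread.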
I think this can let us solve the 3rd use case in the following way:
Charlie, can you clarify if these are the use cases you’re trying to support or if there is a further complex use case? Luke, if you see something about this construction PAM couldn’t support please let me know.