Private Click Measurement
https://privacycg.github.io/private-click-measurement/
Require description of the click content and if it is an ad, information on who purchased it #54

Open johnwilander opened 3 years ago

johnwilander commented 3 years ago

We have an opportunity to add transparency to online advertising now that we are formalizing click attribution. We should require a short, human-readable description of the click content, and if the content is an ad, there should also be information on who purchased it. The "who purchased it" could be a URL if we believe that could work for all purchasers of ads.

It should be allowed to supply the description in the user's localized/preferred language.

These pieces of information are solely meant to benefit the user and should not be sent to servers in any readable format. The only reason why they would ever leave the browser would be if the browser vendor wants to sync click measurement data across devices, in which case these pieces of data should be encrypted so that servers cannot read them.

The intent is to allow users to review their own pending ad clicks and which ads have received attribution from their web browsing. Browsers would obviously need to highlight the source and trustworthiness of these pieces of data if they are used in any trusted browser UI.

There's potential accessibility benefit too, in being able to inform the user of the ad they are about to click on.

johnwilander commented 3 years ago

Ping @csharrison @englehardt @benjaminsavage. Thoughts?

michael-oneill commented 3 years ago

A URL would be great. Would this be a machine-readable resource, e.g. JSON, or a viewable web page to show the user?

JSON would let the browser sanitise the information before it is rendered, check the registered domain etc.

csharrison commented 3 years ago

I like this general idea but my biggest concern is allowing sites to inject untrusted strings that will be rendered on browser UI vs content UI. Historically these sorts of patterns have been abused by actors trying to trick people.

A URL may alleviate these concerns by allowing a user to navigate to a description of the content. cc @michaelkleber for thoughts as well.

michael-oneill commented 3 years ago

A JSON resource would allow the browser to sanitise textual information presented to the user, i.e. remove script and HTML. It could also contain a URL for user-viewable content, checked by the browser to be on the same domain as the rendered ad (to mitigate against links to malware etc.), which the browser could offer as a clickable link.
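
As a rough illustration of that sanitising step, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names `description` and `moreInfoUrl`, the 200-character limit, and the helper names are hypothetical and not part of any PCM proposal. It also compares exact hostnames, whereas a real check against the registered domain would need the Public Suffix List.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _TextOnly(HTMLParser):
    """Collects text nodes; drops tags and the bodies of script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def plain_text(value, limit=200):
    """Strip markup, collapse whitespace, and cap the length (limit is arbitrary)."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())[:limit]


def sanitise_ad_info(raw_json, ad_host):
    """Return only the fields the browser would be willing to display.

    `description` and `moreInfoUrl` are hypothetical field names.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return {}
    info = {"description": plain_text(str(data.get("description", "")))}
    url = urlsplit(str(data.get("moreInfoUrl", "")))
    # Simplification: exact host match over HTTPS, not a registered-domain check.
    if url.scheme == "https" and url.hostname == ad_host.lower():
        info["moreInfoUrl"] = url.geturl()
    return info
```

This only removes markup and off-domain links. It makes no judgement about whether the remaining text is truthful.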

csharrison commented 3 years ago

I think a string and a URL may be enough to trick the user. Something along the lines of: "Your personal data has been stolen, pay $200 to recover at evil.example"

I don't think this is a dealbreaker, but it seems difficult for an automatic sanitization step to catch without manual review.

michael-oneill commented 3 years ago

The ad could also contain a lie. Presenting one in browser UI text might give it some extra credibility, but surely some quote symbols with a statement like "this advertiser claims that ..." would mitigate.

johnivdel commented 3 years ago

Another option could be to have a pre-set list of descriptions, and have the ad select one of them at click time which best fits the click content.

This also ensures that the descriptions are accessible for all users.

michael-oneill commented 3 years ago

I imagine the most useful information would be hard to categorise e.g. this ad was funded by XXX, a lobbyist for YYY whose legal rep is ZZZ. The form of text could be a legally prescribed template but with advertiser supplied variables.

darobin commented 3 years ago

I would be concerned about adding this without a mechanism to ensure that it is authentic. As a threat model, imagine a disinformation ad claiming to have been paid for by nytimes.com. Not great!

Knowing who paid for an ad would be great (for all kinds of reasons) and using attribution as a forcing function is interesting. Ideally, this would somehow be signed but I'm not sure what this would sign that wouldn't allow it to be hijacked just by reusing the sig. I suspect that turning ad creatives into bundles (single-origin, etc. not the current version) and having the hash of that be signed for "who purchased" would work better here.

If your goal is to produce a UI, I wouldn't expect marketers to put anything useful in there. Either it will be generic (works for all campaigns but means nothing), internal (means nothing to people outside that marketing team), or it'll be wrong (was entered in the first campaign and kept as is). From a user's PoV, a screenshot and a capture of whatever accessible alternative is in the creative might work best?

johnwilander commented 3 years ago

@johnivdel said:

> Another option could be to have a pre-set list of descriptions, and have the ad select one of them at click time which best fits the click content.
>
> This also ensures that the descriptions are accessible for all users.

What I think we should do is have an enum for ad type and require them to pick one.

@darobin said:

> I would be concerned about adding this without a mechanism to ensure that it is authentic. As a threat model, imagine a disinformation ad claiming to have been paid for by nytimes.com. Not great!
>
> Knowing who paid for an ad would be great (for all kinds of reasons) and using attribution as a forcing function is interesting. Ideally, this would somehow be signed but I'm not sure what this would sign that wouldn't allow it to be hijacked just by reusing the sig. I suspect that turning ad creatives into bundles (single-origin, etc. not the current version) and having the hash of that be signed for "who purchased" would work better here.
>
> If your goal is to produce a UI, I wouldn't expect marketers to put anything useful in there. Either it will be generic (works for all campaigns but means nothing), internal (means nothing to people outside that marketing team), or it'll be wrong (was entered in the first campaign and kept as is). From a user's PoV, a screenshot and a capture of whatever accessible alternative is in the creative might work best?

There will be bad actors, for sure. But the result of saying we must not support ad descriptions unless we can enforce their authenticity will likely be that users get no ad descriptions. Is zero information beyond "On Wednesday 10/21, you clicked an ad on search.example which took you to shop.example" better than the risk of inauthentic ad descriptions?

Big players in this space will be inclined to get this right and not lie, don't you think? That would likely provide users with accurate and readable ad information for a majority of the ads they see/click. The price would be the risk of bogus or fraudulent data.

csharrison commented 3 years ago

+1 to @johnwilander. I think this would be valuable to have, even without proof of authenticity. Descriptions which are lies are imo a bit benign unless they are actively tricking the user or being abusive, and I do think that some ad networks may want to use this for greater ads transparency. I think this would be a decent topic to bring up at the web advertising BG to see what people think.

michael-oneill commented 3 years ago

The most important information for the user (and for society) is who is responsible for the ad, so there should at least be a variable string for the organisation name. Beyond that, a small set of enums (could be combined into one enum), e.g.:

enum category { commercial, political, public information, . . }
enum targeting { none, contextual, behavioural, . . }
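
To make the shape of that metadata concrete, here is a minimal Python sketch. The type and function names (`AdMetadata`, `parse_ad_metadata`) are hypothetical, and the enum members are only the values named above; the lists are open-ended in the suggestion itself.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COMMERCIAL = "commercial"
    POLITICAL = "political"
    PUBLIC_INFORMATION = "public information"


class Targeting(Enum):
    NONE = "none"
    CONTEXTUAL = "contextual"
    BEHAVIOURAL = "behavioural"


@dataclass(frozen=True)
class AdMetadata:
    organisation: str      # free-form string supplied by the advertiser
    category: Category     # must be one of the fixed values
    targeting: Targeting   # must be one of the fixed values


def parse_ad_metadata(organisation, category, targeting):
    """Build AdMetadata, raising ValueError for values outside the enums."""
    return AdMetadata(organisation.strip(), Category(category), Targeting(targeting))
```

Only the organisation name is free text here, so it is the one field a browser would still need to treat as untrusted when displaying it.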

johannhof commented 3 years ago

(Hello, I'll try to engage in conversion/click measurement discussions for Firefox)

I think that giving users more information and transparency around online tracking is a great idea. For ETP we have wanted to add more context to our tracking UI for some time, alas it's hard to retroactively fit onto the web. So I like that we're talking about it here.

I do agree with the concerns that this could lead to let's say mostly low-quality content in the end, some of it maybe even designed to trick users. I'm not sure how much of a threat that is, given that this UI would probably not be part of the primary browser chrome and thus not an attractive target for common spammers. So definitely a concern but not a dealbreaker for me.

I'm not sure if we're completely aligned on what the user should get out of this. Is this information supposed to help them decide whether to cancel the pending attribution? If so, then the most important information is probably who will receive the conversion ping, and details on the ad content(?) seem rather irrelevant. If not, what else is the user supposed to do with the provided information?

johnwilander commented 3 years ago

> I'm not sure if we're completely aligned on what the user should get out of this. Is this information supposed to help them decide whether to cancel the pending attribution? If so, then the most important information is probably who will receive the conversion ping, and details on the ad content(?) seem rather irrelevant. If not, what else is the user supposed to do with the provided information?

This'll be an opportunity for browser innovation. You can imagine offering users to review pending ad clicks, canceling attribution reports as you mention, some kind of expert or investigative mode where ad metadata is highlighted in-page, or something like a searchable click history with features like "show me information on all ads I've clicked on for shop.example."

At a high level, PCM adds a platform feature to allow privacy-preserving measurement of clicks that lead to navigations. We have an opportunity to require metadata from content providers where they can try to explain to users how they are using this platform feature.

Similarly, we could also require a description of the conversion event. That might be hard in the legacy pixel mode but easy in the modern JS API mode.

darobin commented 3 years ago

@johnwilander I don't think that it's a huuuuuge threat vector, but what worries me is that the feature seems to be designed with assumptions that might not be forever. If it really is useful, then it'll be tempting to build more on top of it, which in turn will make it more valuable to attack. I wouldn't necessarily characterise today's big actors as particularly incentivised not to lie, either :) There's very little in the way of transparency, and much fraud. A desirable future would also be one in which there are far more small actors, rather than a few big ones, so I wouldn't factor that into the design.

To be clear: I'm not seeing this as necessarily a showstopper, I'm mostly reacting because assuming trust on the web can often backfire, and in adtech doubly-so! I reckon that, if this starts to get misused, we can then decide to invest in a signed alternative.

johnwilander commented 3 years ago

> @johnwilander I don't think that it's a huuuuuge threat vector, but what worries me is that the feature seems to be designed with assumptions that might not be forever. If it really is useful, then it'll be tempting to build more on top of it, which in turn will make it more valuable to attack. I wouldn't necessarily characterise today's big actors as particularly incentivised not to lie, either :) There's very little in the way of transparency, and much fraud. A desirable future would also be one in which there are far more small actors, rather than a few big ones, so I wouldn't factor that into the design.
>
> To be clear: I'm not seeing this as necessarily a showstopper, I'm mostly reacting because assuming trust on the web can often backfire, and in adtech doubly-so! I reckon that, if this starts to get misused, we can then decide to invest in a signed alternative.

I appreciate the feedback. At minimum, we should strongly point this out in the spec.

michael-oneill commented 3 years ago

Detecting untruth is an issue throughout society, not just the web. To mitigate against dishonesty, humans developed transparency via formalised channels, where legal sanctions could be available. This requirement, with appropriate technical safeguards, could do the same.