privacycg / proposals

New proposals in the Privacy Community Group
https://privacycg.github.io

Registry of Businesses and Domain Name Ownership #11

Closed jackfrankland closed 2 years ago

jackfrankland commented 4 years ago

Introduction

This proposal puts forward the need for one or more Authorities/Registrars with which a business can register, within certain jurisdictions, both as an entity that intends to control or process personal information on the web and as the owner of a domain name.

Goals

Non-goals

Background and arguments

With regard to CCPA and GDPR, the storage, access and control of personal information and personally identifiable information is tied not to domain names but to businesses. When a user visits a site, they are very commonly presented with a consent banner, which gives the user access to the privacy policy of the business that operates the site under that domain name. The privacy policy may also include the third party service providers with whom the business may share the user's personal information. Currently, the user agent is not able to assist the user in a meaningful way when it comes to the proactive acceptance of a privacy policy.

The user agent is also not able to assist much in retroactive control of data: if the user wishes to view or remove data held by the user agent, they can only see a list of the domain names that the data is partitioned by. Users are likely to have little understanding of which businesses have access to their data when browsing the web, or of the relationships between those businesses where third party service providers are concerned.

In order for the user agent to be able to assist the user, I believe it is necessary that information about the owner of the domain name, and the relationship they have with service providers, is accessible to the user agent. A business could publish the data itself, and make it accessible in a .well-known location relative to the domain name. Indeed, this could be a valid first step in achieving the listed goals. However, when it comes to data protection, I believe it should be assumed that a domain cannot be fully trusted to keep this information correct by itself.

Businesses already have a large responsibility to fulfill data protection requirements, and depending on the jurisdiction, they are obligated to register themselves to a relevant authority[1]. This responsibility will no doubt get larger and more complex as laws are introduced in more jurisdictions. By implementing standard authorities on the web, it may help to normalise the process/data.

Proposal in slightly more detail

I believe that by having businesses optionally comply with standards, and knowing that access to data can freely be rescinded, user agents have the potential to satisfactorily make decisions on the user's behalf, or prompt the user when necessary, thus potentially removing the need for consent banners.

To make things clearer, I've put together a mockup to demonstrate how this proposal could open up possibilities for the user agent (I am in no way recommending that this is how browsers should implement it 🙂):

User Privacy Dialogues

Considerations

This is a rough start at a proposal, and it's purposefully vague both in definitions and technical specification. If there's interest in it, there are many things to be considered. Some that I can think of off-hand:

How does this compare to similar existing proposals?

First-Party Sets and Domain Boundaries are perhaps similar, in that they offer a mechanism to group domains together under an umbrella, but they serve different goals to this proposal in my opinion. Control and trust over the user's data is the primary goal of this proposal, not the ability for businesses to share client-side data between domains. This proposal puts forward the necessity for an outside authority that the user agent trusts, rather than relying on .well-known locations or DNS records set up by the business. This proposal also provides a means of registering relationships to service providers that the user's information may be shared with, to allow for transparency and control over shared data.

The IAB has published a framework to allow publishers to comply with CCPA legislation, by registering and signing an agreement. This proposal doesn't aim to compete with the framework, but it would be interesting to explore whether it could perhaps complement it.

Appendix

[1] Existing business registries:


Including my comment from April 29th here to improve visibility


Thanks a lot for the responses here and in the call. I've put some thought into how this could move along into a more concrete spec for consideration, while hopefully addressing some of the thoughts/concerns made so far.

Straw man spec

Data Structure

Upon request, the User Agent can access the following, per domain, which contains data relevant to the processing or control of personal information by the entity that owns the domain:

{
  "policy": {
    "type": "",
    "version": "",
    "clientStorageRequirement": "",
    "fullPolicyTextHref": ""
  },
  "serviceProviders": [
    {
      "entityId": "",
      "domainName": "",
      "processingCapability": ""
    }
  ],
  "interface": {
    "type": "",
    "version": "",
    "signalConformity": [
      "opt-in",
      "opt-out",
      "do-not-sell"
    ]
  },
  "signed": {
    "domainName": "",
    "entity": {
      "uniqueIdentifier": "",
      "name": "",
      "state": "",
      "country": "",
      "governmentAuthorityRegistrationId": ""
    },
    "expires": ""
  },
  "authority": "",
  "signature": ""
}
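As a rough illustration of how a User Agent might consume this structure, here is a minimal Python sketch that parses a record and checks that the top-level sections are present. The function name `parse_record`, and the decision to treat every top-level key as required, are assumptions for illustration only; the straw man does not define validation rules.

```python
import json

# Top-level keys taken from the straw man structure. Treating all of them
# as required is an assumption of this sketch, not part of the proposal.
REQUIRED_KEYS = {
    "policy", "serviceProviders", "interface",
    "signed", "authority", "signature",
}


def parse_record(raw: str) -> dict:
    """Parse a record and raise ValueError if a top-level section is missing."""
    record = json.loads(raw)  # malformed JSON also surfaces as ValueError
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return record
```

A strict JSON parser like this one rejects trailing commas and unbalanced quotes, so any published record would need to be well-formed JSON.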

Policy

This details policy information that can be read programmatically. The schema is designed to be extensible, and the different types and standards are not part of this scope, other than what's considered the most basic.

Service Providers

This is an important aspect of the proposal, which has the potential to allow greater transparency and feed into decisions the User Agent makes regarding sharing of data. Consent Management Platforms currently create an environment where the user agrees (in my opinion unwittingly) to the sharing of their data with hundreds of third party services. This information is usually in the written privacy policy, but I think there's a great advantage to having this exposed to the User Agent.

Interface

This details how the User Agent is able to communicate with the entity with regard to control of personal data. Again, it is designed to be extensible and to list the standards to which the entity conforms, with perhaps some flexibility for unique customisation. Applied standards can borrow a lot from learnings elsewhere, including the TCF as mentioned by chrispaterson.
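To make the `signalConformity` idea a little more concrete, the sketch below shows one way a User Agent could work out which of the signals it wants to send are actually declared by the entity. The function name `supported_signals` and the simple intersection rule are assumptions; nothing here is defined by the straw man.

```python
def supported_signals(record: dict, wanted: list) -> list:
    """Return the wanted signals that the entity declares it conforms to.

    The order of `wanted` is preserved so the caller can express a preference.
    """
    declared = set(record.get("interface", {}).get("signalConformity", []))
    return [signal for signal in wanted if signal in declared]
```

For example, a User Agent that prefers to send "do-not-sell" but can fall back to "opt-out" would pass both, in that order, and use the first entry of the result.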

Signed

This is the portion of the data that is required to be signed by an authority. Here, it is the domain name to business/entity relationship.

Where should the data be accessed from?

I think the options are:

  1. Stored on a server that the entity controls, in a .well-known location relative to the domain
  2. Stored as a DNS TXT Record, in a _well_known host relative to the domain

My preference is for a DNS record. It implies a certain amount of elevated privileges to implement, inherently verifies domain access by the entity, and avoids a potential issue with matching wildcard domains due to upstream proxies.
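For comparison, here is a small Python sketch of how the two lookup locations could be derived from a domain name. The file name `privacy-registry.json` and the `_well_known` label are hypothetical placeholders; only the `/.well-known/` path prefix is an existing convention (RFC 8615).

```python
# Both names below are hypothetical: the proposal does not fix a file name
# under the well-known path, nor a DNS label for the TXT record.
WELL_KNOWN_PATH = "/.well-known/privacy-registry.json"
DNS_LABEL = "_well_known"


def _normalise(domain: str) -> str:
    """Lower-case the domain and drop any trailing root dot."""
    return domain.lower().rstrip(".")


def well_known_url(domain: str) -> str:
    """Option 1: HTTPS location on a server the entity controls."""
    return f"https://{_normalise(domain)}{WELL_KNOWN_PATH}"


def dns_txt_name(domain: str) -> str:
    """Option 2: host name whose TXT record would carry the data."""
    return f"{DNS_LABEL}.{_normalise(domain)}"
```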

How can the data be trusted?

An authority will be responsible for signing some of the data. In this straw man spec, only the domain name to entity relationship is signed; the rest of the data is separate. This is to allow for the easy updating of the policy, service providers and interface. The authority should act as a registry for the business entity, and should verify that the signature request is legitimate. Perhaps a similar process to EV certificates could be used for the validation process.
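The split between signed and unsigned data can be sketched as follows. This is only an illustration under loud assumptions: a real authority would use an asymmetric signature, whereas this sketch uses an HMAC from the Python standard library as a stand-in, and the canonical form (sorted keys, compact separators) is my own choice rather than anything specified.

```python
import hashlib
import hmac
import json


def canonical_bytes(signed: dict) -> bytes:
    """Serialise the signed portion deterministically (an assumed canonical form)."""
    return json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(signed: dict, key: bytes) -> str:
    """Stand-in for the authority's signing step (HMAC, NOT a real signature)."""
    return hmac.new(key, canonical_bytes(signed), hashlib.sha256).hexdigest()


def verify(record: dict, key: bytes) -> bool:
    """Check that the signature covers exactly the 'signed' portion."""
    expected = sign(record["signed"], key)
    return hmac.compare_digest(expected, record["signature"])
```

Because only `signed` is covered, a business could change `policy`, `serviceProviders` or `interface` without going back to the authority, while any change to the domain or entity details would invalidate the signature.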

As far as the trusting of the policy goes, it's a difficult one. How would an authority audit the process, and monitor the process over time? Right now, everything is behind a black box to the User Agent, and this proposal is attempting to bring the processes to light. It was mentioned in the call that attempting to standardise these processes can also have the advantage of businesses having a better sense of how they should be handling the data.

Revocation

Some mechanism should be in place for the authority to signal that a record is now invalid, without having to wait for the expiry of the record. This needs more thought.
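One possible shape for this, purely as a sketch: the User Agent checks the record's expiry and also consults a set of revoked records obtained from the authority. How that set would be delivered is left open here; the `(domainName, uniqueIdentifier)` revocation key and the ISO 8601 format for `expires` are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def is_record_current(signed: dict, revoked: set,
                      now: Optional[datetime] = None) -> bool:
    """Return True if the signed portion is neither revoked nor expired.

    `revoked` is a hypothetical set of (domainName, uniqueIdentifier) pairs
    that the authority has invalidated ahead of their expiry.
    """
    now = now or datetime.now(timezone.utc)
    key = (signed["domainName"], signed["entity"]["uniqueIdentifier"])
    if key in revoked:
        return False
    # Assumes `expires` is an ISO 8601 timestamp with an explicit UTC offset.
    return now < datetime.fromisoformat(signed["expires"])
```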

What can the User Agent do with the data?

Please see the UX mockup in the original post above. An API could be made accessible to JS perhaps for further functionality. To be clear, this proposal does not attempt to define standard behaviour of different browsers, or the API.

One further idea is the concept of the User Agent being in control of both transient and long-lived consent, given either implicitly or explicitly. Going further, there could be an identifier to represent this consent, which the User Agent could use to query the business/entity and its service providers for the existence of personal data associated with this identifier, revealing an audit trail of where and how the data is being used. This, again, is not in scope, but is perhaps made possible by the proposal.

What incentive does a business have to register and keep the policy up to date?

This was raised by a number of people in the call, and by sammacbeth above. My initial thinking was that, in the event that the User Agent were to become more restrictive for domains that do not provide a policy or business ownership, the incentive for the business will be to not be affected as heavily by these constraints. This will be especially true for non-essential third party service providers, where the User Agent may enact more stringent measures i.e. decide not to load them. This is the main contrast between this proposal and First Party Sets in my opinion, as the goal for that proposal seems to be for the User Agent to be able to treat multiple domains as the same site in terms of privacy, effectively lessening existing restrictions. Having said that, as this proposal's scope does not include the behaviour of the User Agent, a similar lessening of existing restrictions would be possible with this proposal I think, and could certainly provide an incentive if that's what the User Agent decided to allow.
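Although User Agent behaviour is out of scope, the incentive described here can be illustrated with a toy decision function. Everything in it is hypothetical: the three outcomes, the idea that the User Agent already knows whether a resource is third party or essential, and the rule itself.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    LOAD = "load"
    LOAD_RESTRICTED = "load-restricted"
    BLOCK = "block"


def decide(verified_record: Optional[dict], is_third_party: bool,
           is_essential: bool) -> Decision:
    """Toy policy: registered domains load normally, others are constrained."""
    if verified_record is not None:
        return Decision.LOAD
    if is_third_party and not is_essential:
        # Non-essential third parties without a record are not loaded at all.
        return Decision.BLOCK
    return Decision.LOAD_RESTRICTED
```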

High-level Questions

erik-anderson commented 4 years ago

@jackfrankland would you be able to give an overview of this proposal during next week's call? If so, we can add the agenda+ label. Thanks!

SebastianZimmeck commented 4 years ago

I find this is a very interesting proposal! A few thoughts off the top of my head:

jackfrankland commented 4 years ago

Thanks a lot for your thoughts.

To answer your first question, I didn't put too much thought into the wording or existence of the "View policy" button to be honest; I just wanted to show it as an example of a user action that a browser could implement if it chose to. Having said that, I saw it as more in line with the existing "View certificate" functionality than with being taken to a privacy policy on a website. There could be potential for standard policy definitions that are detailed as part of the registration. This could include a list of third party service providers, with references to their business registrations. The browser can use this list to verify requests that go out to third party domains, though it would be up to the browser to decide what to do in the event of a mismatch; perhaps it could be lenient if the third party business registration complies with a certain standard.
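A minimal sketch of that verification step, assuming the list is exposed as `serviceProviders` entries that each carry a `domainName` (as in the straw man structure in this thread), and assuming that subdomains of a declared domain count as a match:

```python
def is_declared_provider(request_host: str, record: dict) -> bool:
    """True if request_host equals, or is a subdomain of, a declared provider domain."""
    host = request_host.lower().rstrip(".")
    for provider in record.get("serviceProviders", []):
        domain = provider.get("domainName", "").lower().rstrip(".")
        # Match on a label boundary so "evilcdn.example" does not match "cdn.example".
        if domain and (host == domain or host.endswith("." + domain)):
            return True
    return False
```

What the browser does on a mismatch (block, warn, or allow) is deliberately left out, as it would be for each browser to decide.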

Agree that the second two questions are important ones 🙂. I'll add the Vermont information to the list if you don't mind. Perhaps, if browsers were to begin implementing detrimental features to non-registered businesses, this would be the incentive for much higher compliance. A more integrated platform would remove a lot of barriers.

jackfrankland commented 4 years ago

> @jackfrankland would you be able to give an overview of this proposal during next week's call? If so, we can add the agenda+ label. Thanks!

Thanks @erik-anderson, very happy to give an overview next week, if you think there'll be enough interest.

erik-anderson commented 4 years ago

@jackfrankland given it's a new proposal, it would make sense to give an overview to encourage folks who may be interested to take a closer look. I'll add agenda+. Thanks!

a2intl commented 4 years ago

This needs a mechanism (probably rooted in an existing web standard). It's a good concept, but without a concrete implementation suggested (and reasoned-for) it's just an idea/need/requirement, not a proposal.

chrispaterson commented 4 years ago

This proposal is very similar in abstract to the IAB's Transparency And Consent Framework. The TCF:

I wonder if there could be some synergy here?

sammacbeth commented 4 years ago

This proposal is interesting, as Cliqz and Ghostery have been working around this space and, since the GDPR came into force, have been trying to enable the browser to help users navigate the complex consent popups they are presented with. We also maintain a database of mappings from domains to entities and companies that is used, for example, in Ghostery to show the companies behind the third-parties on a page.

Firstly, a registry of domain name ownership already exists: the WHOIS database. However, nowadays this is of little use for ascertaining the owners of a domain, as the majority use WHOIS anonymisation services. The volume of domains that do not have correct or transparent information in WHOIS (even those with large companies behind them driving significant web traffic), suggests that just creating a parallel registry will not be effective - unless there is a strong incentive to keep this information updated and correct.

Secondly, I believe it would be within scope for this group to help standardise consent banners/popups (which you mention in the proposal goals), to help reduce the friction users experience with these.

As has been mentioned in this thread, the IAB Transparency and Consent Framework is an industry attempt to standardise the expression of consent. At Cliqz we developed a prototype to allow users to read and then overwrite consent for sites using this framework; however, there are some fundamental issues with the framework that prevent this being a practical approach:

As with business registration, the key issue here is incentivisation. The current approach to acquiring consent on the web is highly biased towards the site owners: there have already been several studies showing how dark patterns are being employed to achieve higher opt-in rates, and the current adtech market rewards higher opt-in rates with higher revenues. Thus a standardisation attempt must address the balance of power between users and site owners, but this is in turn unlikely to get sites to adopt the standard.

chrispaterson commented 4 years ago

@sammacbeth thanks for laying out your thoughts so clearly. To reveal my cards a bit here, I created, write and maintain the standardized libraries for the TCF and am in the "commit group" for the TCF; your three bullet points are definitely a hindrance to this becoming a 'practical' approach as the TCF is implemented today. But the TCF is implemented the way it is today largely to get around the restrictions that the browser creates. I believe we are all acutely aware that if a browser-based standard emerges, it will be superior to and more reliable than the TCF, which relies on cookies and site owners' implementations of a Consent Management Platform (CMP) (or colloquially "Cookie Banners"). The Ad Tech ecosystem is eagerly working toward a solution to provide transparency and give users the ability to consent to companies and personal data processing purposes, as the project's name implies.

I really think the TCString could be a great starting point for a signaling mechanism, and a UA could easily create that string (the specification and code are open-sourced). A UA could also create the JavaScript API for Ad Tech scripts to call to gather user preferences to pass over RTB channels. There may be better mechanisms, but I'm just throwing some ideas out there.

I would be happy to engage with this group to explore the idea.

dmdabbs commented 4 years ago

Following on from @a2intl's comment about a web standard, the IETF's replacement for WHOIS, RDAP (https://www.icann.org/rdap), might be extended to accommodate our use case, assuming it's compatible with such use, but there's no guarantee all registrars would support the extensions we might require. There may be goodness to explore in that spec, given that it's focused on domain information.

krgovind commented 4 years ago

DNS TXT records can be unreliable with the way (insecure) DNS resolution currently works; where un-updateable WiFi routers often serve as DNS resolvers, and middleboxes are known to tamper with / drop records. Past experimentation in Chrome has shown that 4-5% of users are unable to look up DNS TXT records.

I think any reliance on DNS as a delivery mechanism would have to be predicated on the prevalence of a technology like DNS-over-HTTPS, which (a) prevents middleboxes from dropping records, and (b) may require via policy enforcement that servers faithfully pass along TXT records.

jackfrankland commented 4 years ago

Thanks @krgovind. In that case I definitely agree that a hosted .well-known location would be better.

samuelweiler commented 3 years ago

Having this be (only) a pointer from the domain name back to the business registration/identity seems like it would be subject to easy spoofing and be impossible to meet the audit criteria you describe as:

> At any time, a business can query the domain names that have been registered under its ownership, and flag any issues, with sufficient verification that they have the power to do so.

Perhaps you (also, or perhaps even instead) need the business registration pointing back at the list of domain names the business is accepting responsibility for. Business-registering entities (e.g. Delaware's Division of Corporations) would be the most appropriate entities to handle that registration and attestation. Unfortunately, those are a diverse lot. And that speaks to a related problem: what happens when different parties (different companies) register things with the same name in different jurisdictions? We handle marketplace confusion with trademark law, but that's probably too clumsy and slow-moving of a tool to use for automated decisions.
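The idea of a pointer in both directions can be sketched in a few lines of Python. Here `registry` is a hypothetical view of what a business-registering entity would publish: a mapping from an entity identifier to the set of domain names that entity accepts responsibility for. A domain's claim is only accepted if the entity's own registration lists that domain.

```python
def ownership_confirmed(domain: str, claimed_entity_id: str, registry: dict) -> bool:
    """Accept a domain's ownership claim only if the entity lists the domain too.

    `registry` maps entity id -> set of lower-case domain names (hypothetical).
    """
    listed_domains = registry.get(claimed_entity_id, set())
    return domain.lower().rstrip(".") in listed_domains
```

With only the forward pointer, anyone could name a well-known business as the owner of their domain; the reverse lookup is what makes that claim checkable.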

jackfrankland commented 3 years ago

> Having this be (only) a pointer from the domain name back to the business registration/identity seems like it would be subject to easy spoofing and be impossible to meet the audit criteria you describe.

> Perhaps you (also, or perhaps even instead) need the business registration pointing back at the list of domain names the business is accepting responsibility for.

The business should be able to query the authority for the domain names currently registered as being owned by it, for it to verify the list at any stage, and not the user agent. I was seeing this as private, but there could indeed be a separate public facing API that conformed to a standard, hosted by the authority, if there was a benefit to that.

> Business-registering entities (e.g. Delaware's Division of Corporations) would be the most appropriate entities to handle that registration and attestation. Unfortunately, those are a diverse lot. And that speaks to a related problem: what happens when different parties (different companies) register things with the same name in different jurisdictions? We handle marketplace confusion with trademark law, but that's probably too clumsy and slow-moving of a tool to use for automated decisions.

I agree that it makes sense for existing business-registering entities to act as a registry and signing authority for businesses' ownership of domain names. There could also be other authorities which may or may not have an affiliation with government registers. Ultimately it will be the user agent that decides whether an authority can be trusted, in a similar way to certificate authorities, and so I think there's room for the diversity. I appreciate that some (or a lot of) extra thought would need to go into this though.

It's my opinion that standardised publication of the details (including the jurisdiction it's registered in) of the actual business behind a domain name can only be an improvement over what currently exists, which is a non-programmatically readable privacy notice that can much more easily be false. Two businesses with the same name but different jurisdictions will be treated as different by the user agent, which will reference them using a unique identifier.
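As a small illustration of that last point, a User Agent could compare entities by identifier only and treat the name as display data. The `Entity` class and its field names below are hypothetical and only loosely mirror the `entity` object in the straw man structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    unique_identifier: str
    name: str
    country: str


def same_entity(a: Entity, b: Entity) -> bool:
    """Compare by identifier only; names are display data and may collide."""
    return a.unique_identifier == b.unique_identifier
```

Two registrations called "Acme Ltd" in different jurisdictions would carry different identifiers, so `same_entity` would return False for them.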

hober commented 2 years ago

hi @jackfrankland! is this still something you're interested in pursuing?

jackfrankland commented 2 years ago

Thanks, and sorry for the delay. Closing the issue now, great to be able to have had the discussion. 🙂