Registry of Businesses and Domain Name Ownership

jackfrankland commented 4 years ago

Introduction

This proposal puts forward the need for one or more Authorities/Registrars with which a business can register both itself, as an entity that intends to control / process personal information on the web within certain jurisdictions, and its ownership of a domain name.

Goals

Provide a more transparent means for the user to govern when and how their personal information is used/stored/shared by businesses.
Provide a means for user agents to decide default behaviour with regard to allowing data to be accessed/stored/shared per domain.
Provide a means for businesses to register their domain names for the business as an entity that intends to control / process personal information within certain jurisdictions.
Provide a means for businesses to register their relationship with service providers, to the extent that a service provider is a separate business, and intends to process shared personal information.
Help reduce the need for consent banners.

Non-goals

This proposal does not attempt to define the protocol of signals between the user agent and business (e.g. Do Not Sell). Rather, it could help to define the relationship that may be necessary in order for the communication to exist.
The ability for client-side data to be shared across domains that have the same business registration, although this could perhaps follow.

Background and arguments

Under CCPA and GDPR, the storage/access/control of personal information and personally identifiable information is tied to businesses rather than to domain names. When a user visits a site, they will very commonly be presented with a consent banner, which gives the user access to the privacy policy of the business that operates the site under the domain name. The privacy policy may also list the third party service providers with whom the business may share the user's personal information. Currently, the user agent is not able to assist the user in a meaningful way when it comes to the proactive acceptance of a privacy policy.

The user agent is also of limited help with retroactive control of data; if the user wishes to view/remove data held by the user agent, they are only able to see a list of the domain names that the data is partitioned to. The user is therefore unlikely to understand which businesses have access to the data when browsing the web, or the relationships between those businesses where third party service providers are concerned.

In order for the user agent to be able to assist the user, I believe it is necessary that information about the owner of the domain name, and the relationship they have with service providers, is accessible to the user agent. A business could publish the data itself, and make it accessible in a .well-known location relative to the domain name. Indeed this could be a valid first step in achieving the listed goals. However, when it comes to data protection, I believe it should be assumed that a domain is not fully trusted to keep this information correct by itself.

Businesses already have a large responsibility to fulfill data protection requirements, and depending on the jurisdiction, they are obligated to register themselves to a relevant authority^[1]. This responsibility will no doubt get larger and more complex as laws are introduced in more jurisdictions. By implementing standard authorities on the web, it may help to normalise the process/data.

Proposal in slightly more detail

At the very least, a business registration should include the name of the business, and the information required to communicate with the business for data protection purposes. If a registration for a domain name is being made for a business, there should be a mechanism that acts as sufficient verification that it is the business or an agent of the business performing the registration. At any time, a business can query the domain names that have been registered under its ownership, and flag any issues, with sufficient verification that they have the power to do so.
Upon request, most likely at the time of domain name resolution, the user agent is given, or can query, the business registration for the domain name. In order to be deemed valid, the registration must be signed by an authority that the user agent has trusted. The registration may be cached at a number of locations between the user agent and the authority.
The user agent can use the existence and data of a business registration, and the preferences set by the user to determine if and how it allows the access/storage of client-side data for the domain by default, and the domains of the business's service providers.
Client-side user data can be partitioned to a business registration (or the absence of a business registration) under a domain. If the business registration meaningfully changes, the client-side data can undergo a process of transferral, or removal, controlled by the user agent.
For user data stored by the business (i.e. non-client side), the user agent can send signals to the business and its service providers, requesting that they perform certain actions, e.g. Opt out, Do Not Sell. This is provided that the business has agreed to comply with a certain standard/protocol, and the registration contains details of the protocol of communication. If the business does not show that it complies with a certain standard, the user agent will have had the opportunity to deny access to data or prompt the user upfront.
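
The default decision described in the points above could be sketched roughly as follows. This is only an illustration: the `Decision` values, the function name and the `block_unregistered` preference are hypothetical, and the proposal deliberately leaves the actual behaviour up to each user agent.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"


def default_storage_decision(
    supported_signals: Optional[FrozenSet[str]],
    required_signals: FrozenSet[str],
    block_unregistered: bool,
) -> Decision:
    """Pick a default for client-side storage on a domain.

    supported_signals is None when the domain has no valid business
    registration; otherwise it is the set of signals the business has
    declared it will honour (hypothetical names such as "do-not-sell").
    """
    if supported_signals is None:
        # No registration: follow the user's preference for unknown businesses.
        return Decision.DENY if block_unregistered else Decision.PROMPT
    if required_signals <= supported_signals:
        # The business honours every signal the user insists on.
        return Decision.ALLOW
    # Registered, but missing something the user cares about: ask.
    return Decision.PROMPT
```

A stricter user agent might return `DENY` instead of `PROMPT` in the last case; the sketch only shows where the registration and the user's preferences would meet.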

I believe that by having businesses optionally comply with standards, and knowing that access to data can freely be rescinded, user agents have the potential to satisfactorily make decisions on the user's behalf, or prompt the user when necessary, thus potentially removing the need for consent banners.

To make things more clear, I've put together a mockup to demonstrate how this proposal could open up possibilities for the user agent (I am in no way recommending this is how browsers should decide to implement it 🙂):

[Mockup image: User Privacy Dialogues]

Considerations

This is a rough start at a proposal, and it's purposefully vague both in definitions and technical specification. If there's interest in it, there are many things to be considered. Some that I can think of off-hand:

What would the exact definition of a business and domain name be, for the purpose of this proposal?
What would a business registration look like? What data would it hold?
What constitutes a sufficient verification of a business?
Would there be multiple business registrations allowed for a single domain, for the same business/organisation, to cater for different jurisdictions/laws?
How about domain names that many different businesses may use?
How can a business registration link/relate to registrations in existing authorities^[1]?
What existing mechanisms, if any, could this registry piggyback on (domain name registrars, certificate authorities, etc.)?

How does this compare to similar existing proposals?

First-Party Sets and Domain Boundaries are perhaps similar, in that they offer a mechanism to group domains together under an umbrella, but in my opinion they serve different goals to this proposal. Control and trust over the user's data is the primary goal of this proposal, not the ability for businesses to share client-side data between domains. This proposal puts forward the necessity for an outside authority that the user agent trusts, rather than relying on .well-known locations or DNS records set up by the business. This proposal also provides a means of registering relationships with the service providers that the user's information may be shared with, to allow for transparency/control over shared data.

The IAB has published a framework to allow publishers to comply with CCPA legislation, by registering and signing an agreement. This proposal doesn't aim to compete with the framework, but it would be interesting to explore whether this could perhaps complement it.

Appendix

^[1] Existing business registries:

UK ICO - mandatory registration
GDPR member state Data Protection Authorities - non-mandatory registration depending on member state
California Data Broker Registry - mandatory for those that fall under the definition of Data Broker

Including my comment from April 29th here to improve visibility

Thanks a lot for the responses here and in the call. I've put some thought into how this could move along into a more concrete spec for consideration, while hopefully addressing some of the thoughts/concerns made so far.

Straw man spec

Data Structure

Upon request, the User Agent can access the following, per domain, which contains data relevant to the processing or control of personal information by the entity that owns the domain:

{
  "policy": {
    "type": "",
    "version": "",
    "clientStorageRequirement": "",
    "fullPolicyTextHref": ""
  },
  "serviceProviders": [
    {
      "entityId": "",
      "domainName": "",
      "processingCapability": ""
    }
  ],
  "interface": {
    "type": "",
    "version": "",
    "signalConformity": [
      "opt-in",
      "opt-out",
      "do-not-sell"
    ]
  },
  "signed": {
    "domainName": "",
    "entity": {
      "uniqueIdentifier": "",
      "name": "",
      "state": "",
      "country": "",
      "governmentAuthorityRegistrationId": ""
    },
    "expires": ""
  },
  "authority": "",
  "signature": ""
}
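
As a rough illustration of how a User Agent might read this structure, here is a minimal Python sketch. It assumes the record is delivered as valid JSON; the class names, the function name and the choice of which fields count as required are all assumptions made for the example, not part of the straw man.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ServiceProvider:
    entity_id: str
    domain_name: str
    processing_capability: str


@dataclass(frozen=True)
class BusinessRecord:
    signed_domain: str
    entity_name: str
    authority: str
    signature: str
    signal_conformity: Tuple[str, ...]
    service_providers: Tuple[ServiceProvider, ...]


def parse_record(text: str) -> BusinessRecord:
    """Parse a straw-man record.

    Assumption: "signed", "authority" and "signature" are required
    (a KeyError is raised if they are missing), while "interface" and
    "serviceProviders" are optional and default to empty.
    """
    raw = json.loads(text)
    providers = tuple(
        ServiceProvider(p["entityId"], p["domainName"], p["processingCapability"])
        for p in raw.get("serviceProviders", [])
    )
    signals = tuple(raw.get("interface", {}).get("signalConformity", []))
    return BusinessRecord(
        signed_domain=raw["signed"]["domainName"],
        entity_name=raw["signed"]["entity"]["name"],
        authority=raw["authority"],
        signature=raw["signature"],
        signal_conformity=signals,
        service_providers=providers,
    )
```

Keeping the parsed record immutable (frozen dataclasses and tuples) is a design choice for the sketch, so that a record cannot be altered after it has been checked.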

Policy

This details policy information that can be read programmatically. The schema is designed to be extensible, and the different types and standards are not part of this scope, other than what's considered the most basic.

Service Providers

This is an important aspect of the proposal, which has the potential to allow greater transparency and feed into decisions the User Agent makes regarding sharing of data. Consent Management Platforms currently create an environment where the user agrees (in my opinion unwittingly) to the sharing of their data to hundreds of third party services. This information is usually in the written privacy policy, but I think there's a great advantage to having this exposed to the User Agent.

Interface

This details how the User Agent is able to communicate with the entity with regard to control of personal data. Again, it is designed to be extensible and to contain the standards to which the entity conforms, with perhaps some flexibility for unique customisation. Applied standards can borrow a lot from learnings elsewhere, including the TCF as mentioned by chrispaterson.

Signed

This is the portion of the data that is required to be signed by an authority. Here, it is the domain name to business/entity relationship.

Where should the data be accessed from?

I think the options are:

Stored on a server that the entity controls, in a .well-known location relative to the domain
Stored as a DNS TXT Record, in a _well_known host relative to the domain

My preference is for a DNS record. It implies that a certain level of elevated privileges is needed to implement it, inherently verifies domain access by the entity, and avoids a potential issue with matching wildcard domains due to upstream proxies.
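
One practical detail of the DNS option is that each character-string inside a TXT record holds at most 255 octets, so a record of this size would need to be split into several strings and joined again by the User Agent. A minimal sketch of that, assuming the `_well_known` host label from the option above and hypothetical function names:

```python
from typing import List

MAX_TXT_STRING = 255  # upper bound on one character-string in a TXT record


def record_host(domain: str) -> str:
    """Build the host to query, e.g. _well_known.example.com (label is an assumption)."""
    return "_well_known." + domain.rstrip(".").lower()


def split_txt(payload: bytes) -> List[bytes]:
    """Split a serialised record into strings small enough for a TXT record."""
    return [payload[i:i + MAX_TXT_STRING] for i in range(0, len(payload), MAX_TXT_STRING)]


def join_txt(chunks: List[bytes]) -> bytes:
    """Reassemble the strings of a TXT record in the order they were returned."""
    return b"".join(chunks)
```

The sketch leaves out the lookup itself, which would go through whatever resolver the User Agent already uses.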

How can the data be trusted?

An authority will be responsible for signing some of the data. In this straw man spec, only the domain name to entity relationship is signed; the rest of the data is separate. This allows the policy, service providers and interface to be updated easily. The authority should act as a registry for the business entity, and should verify that the signature request is legitimate. Perhaps a process similar to that used for EV certificates could be used for validation.
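
To make the idea of signing only part of the record more concrete, here is a minimal sketch of the checks a User Agent might run. Everything specific in it is an assumption: the canonical form (JSON with sorted keys), the ISO 8601 format of `expires`, and the `verify` callback, which stands in for whatever signature scheme an authority would actually use.

```python
import json
from datetime import datetime
from typing import Callable, Set


def canonical_bytes(signed: dict) -> bytes:
    """Serialise the signed portion deterministically (sorted keys, no spaces)."""
    return json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def is_record_trusted(
    record: dict,
    domain: str,
    trusted_authorities: Set[str],
    verify: Callable[[str, bytes, str], bool],
    now: datetime,
) -> bool:
    """Check authority, domain, expiry and signature; now must be timezone-aware."""
    signed = record.get("signed")
    if not isinstance(signed, dict):
        return False
    authority = record.get("authority")
    if authority not in trusted_authorities:
        return False
    if signed.get("domainName") != domain:
        return False
    try:
        expires = datetime.fromisoformat(signed.get("expires", ""))
    except (TypeError, ValueError):
        return False
    if expires.tzinfo is None or expires <= now:
        return False
    # Only the "signed" portion is covered, so the policy, service providers
    # and interface can change without a new signature.
    return verify(authority, canonical_bytes(signed), record.get("signature", ""))
```

Because the signature covers only `signed`, editing the policy or the service provider list leaves the result unchanged, which is the trade-off described above.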

As far as the trusting of the policy goes, it's a difficult one. How would an authority audit the process, and monitor the process over time? Right now, everything is behind a black box to the User Agent, and this proposal is attempting to bring the processes to light. It was mentioned in the call that attempting to standardise these processes can also have the advantage of businesses having a better sense of how they should be handling the data.

Revocation

Some mechanism should be in place for the authority to signal that a record is now invalid, without having to wait for the expiry of the record. This needs more thought.

What can the User Agent do with the data?

Please see the UX mockup in the original post above. An API could be made accessible to JS perhaps for further functionality. To be clear, this proposal does not attempt to define standard behaviour of different browsers, or the API.

One further idea is the concept of the User Agent being in control of both transient and long-lived consent, given either implicitly or explicitly. Going further, there could be an identifier to represent this consent, which the User Agent could use to query the business/entity and its service providers for the existence of personal data associated with this identifier, revealing an audit trail of where and how the data is being used. This, again, is not in scope, but is perhaps made possible by the proposal.

What incentive does a business have to register and keep the policy up to date?

This was raised by a number of people in the call, and by sammacbeth above. My initial thinking was that, if the User Agent were to become more restrictive towards domains that do not provide a policy or business ownership, the business's incentive would be to avoid being affected as heavily by these constraints. This will be especially true for non-essential third party service providers, where the User Agent may enact more stringent measures, e.g. deciding not to load them. This is the main contrast between this proposal and First Party Sets in my opinion, as the goal for that proposal seems to be for the User Agent to be able to treat multiple domains as the same site in terms of privacy, effectively lessening existing restrictions. Having said that, as this proposal's scope does not include the behaviour of the User Agent, I think a similar lessening of existing restrictions would be possible with this proposal, and could certainly provide an incentive if that's what the User Agent decided to allow.

High-level Questions

Is the possible User Agent behaviour valuable? Is this something that would garner interest?
If yes, does the presence of this data fulfil the desired behaviour? Are there other ways to achieve the same goal?
Is there value in the data even if it isn't signed? Can the hard dependency on an authority be removed?

erik-anderson commented 4 years ago

@jackfrankland would you be able to give an overview of this proposal during next week's call? If so, we can add the agenda+ label. Thanks!

SebastianZimmeck commented 4 years ago

I find this is a very interesting proposal! A few thoughts off the top of my head:

What would be the relationship of a business' registration to its privacy policy? Would the link in the first picture above go to the privacy policy? Generally, it is difficult to keep the policy in sync with what a website or program is doing in actuality. Maybe, the browser could check the network requests to see which third parties are integrated in a site and keep those in sync with both the registration and policy.
How does the proposed registration requirement relate to the existing registration requirements (adding to the list above the Vermont data broker law, which, apparently, sees low compliance)? Maybe they can all be integrated into one platform. Otherwise, it would be burdensome for businesses to keep up with multiple registration authorities and keep them in sync.
Related to the previous question, who could be the registrar? A governmental agency, a non-profit organization, a for-profit company, ... ?

jackfrankland commented 4 years ago

Thanks a lot for your thoughts.

To answer your first question, I didn't put too much thought into the wording or existence of the "View policy" button to be honest, I just wanted to show it as an example of a user action that a browser could implement if they chose to. Having said that, I saw it as more in line with the existing "View certificate" functionality, rather than taking the user to a privacy policy on a website. There is potential for standard policy definitions to be detailed as part of the registration. This could include a list of third party service providers, with references to their business registrations. The browser can use this list to verify requests that go out to third party domains - though it would be up to the browser to decide what to do in the event that there is a mismatch - perhaps it could be lenient if the third party business registration complies with a certain standard.
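
As a rough sketch of the mismatch check described above, a browser could compare the hosts a page requests against the domains declared in the registration. The function name and the simple suffix matching are assumptions for illustration; a real implementation would need something more careful, such as the Public Suffix List.

```python
from typing import Iterable, List


def undeclared_third_parties(
    first_party: str,
    declared_providers: Iterable[str],
    requested_hosts: Iterable[str],
) -> List[str]:
    """Return requested hosts not covered by the site or its declared providers."""
    allowed = {first_party.lower().rstrip(".")}
    allowed.update(d.lower().rstrip(".") for d in declared_providers)

    def covered(host: str) -> bool:
        # A host is covered if it equals, or is a subdomain of, an allowed domain.
        return any(host == d or host.endswith("." + d) for d in allowed)

    mismatches: List[str] = []
    for raw in requested_hosts:
        host = raw.lower().rstrip(".")
        if not covered(host) and host not in mismatches:
            mismatches.append(host)
    return mismatches
```

What the browser does with the returned list (block, warn, or be lenient) is left open, as in the comment above.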

Agree that the second two questions are important ones 🙂. I'll add the Vermont information to the list if you don't mind. Perhaps, if browsers were to begin implementing detrimental features to non-registered businesses, this would be the incentive for much higher compliance. A more integrated platform would remove a lot of barriers.

jackfrankland commented 4 years ago

@jackfrankland would you be able to give an overview of this proposal during next week's call? If so, we can add the agenda+ label. Thanks!

Thanks @erik-anderson, very happy to give an overview next week, if you think there'll be enough interest.

erik-anderson commented 4 years ago

@jackfrankland given it's a new proposal, it would make sense to give an overview to encourage folks who may be interested to take a closer look. I'll add agenda+. Thanks!

a2intl commented 4 years ago

This needs a mechanism (probably rooted in an existing web standard). It's a good concept, but without a concrete implementation suggested (and reasoned-for) it's just an idea/need/requirement, not a proposal.

chrispaterson commented 4 years ago

This proposal is very similar in the abstract to the IAB's Transparency and Consent Framework. The TCF:

Provides users with a transparent, granular way to select which businesses (vendors) may process their personal data and for what purpose.
Defines a set of standard processing purposes.
Creates a global registry of businesses and their processing purposes called the Global Vendor List.
Defines a Base64Url-encoded bitfield of signals that is passed along to businesses to clearly signal the user's preferences (already implemented and in use by Vendors now).
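
For readers unfamiliar with the general technique in the last point, here is a generic sketch of packing yes/no signals into a bitfield and encoding it as Base64Url. It is not the actual TC String layout, which has its own published specification; the purpose names and function names below are made up for the example.

```python
import base64
from typing import Set, Tuple


def encode_signals(granted: Set[str], order: Tuple[str, ...]) -> str:
    """Pack one bit per signal (first signal = most significant bit), then Base64Url-encode."""
    bits = 0
    for name in order:
        bits = (bits << 1) | (1 if name in granted else 0)
    pad = (-len(order)) % 8  # zero bits appended to fill the last byte
    bits <<= pad
    raw = bits.to_bytes((len(order) + pad) // 8, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_signals(text: str, order: Tuple[str, ...]) -> Set[str]:
    """Reverse encode_signals, restoring the stripped Base64 padding first."""
    raw = base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))
    bits = int.from_bytes(raw, "big")
    total = len(raw) * 8
    return {name for i, name in enumerate(order) if (bits >> (total - 1 - i)) & 1}
```

With the hypothetical order `("storage", "personalisation", "measurement")`, granting the first and third gives the bits `101`, which pad to the byte `0xA0` and encode as `oA`.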

I wonder if there could be some synergy here?

sammacbeth commented 4 years ago

This proposal is interesting, as Cliqz and Ghostery have been working around this space and, since the GDPR came into force, have been trying to enable the browser to help users navigate the complex consent popups they are presented with. We also maintain a database of mappings from domains to entities and companies that is used, for example, in Ghostery to show the companies behind the third-parties on a page.

Firstly, a registry of domain name ownership already exists: the WHOIS database. However, nowadays this is of little use for ascertaining the owners of a domain, as the majority use WHOIS anonymisation services. The volume of domains that do not have correct or transparent information in WHOIS (even those with large companies behind them driving significant web traffic), suggests that just creating a parallel registry will not be effective - unless there is a strong incentive to keep this information updated and correct.

Secondly, I believe it would be within scope for this group to help standardise consent banners/popups (which you mention in the proposal goals), to help reduce the friction users experience with these.

As has been mentioned in this thread, the IAB Transparency and Consent Framework is an industry attempt to standardise the expression of consent. At Cliqz we developed a prototype to allow the user to read and then overwrite consent for sites using this framework. However, there are some fundamental issues with the framework that prevent this being a practical approach:

The API is read-only. The logic to update and store consent is hidden in proprietary implementations. This means that programmatically changing the consent string is error prone.
The API can only be used after the user has expressed their initial consent in a popup/banner. This means that it is not possible to use this API to improve on the primary user pain point.
The consent categories are not understood by users, and are very specific to adtech.

As with business registration, the key issue here is incentivisation. The current approach to acquiring consent on the web is highly biased towards the site owners - there have already been several studies showing how dark patterns are being employed to achieve higher opt-in rates, and the current adtech market rewards higher opt-in rates with higher revenues. Thus a standardisation attempt must address the balance of power between users and site owners, but this in turn makes it unlikely that sites will adopt the standard.

chrispaterson commented 4 years ago

@sammacbeth thanks for laying out your thoughts so clearly. To reveal my cards a bit here, I created, write and maintain the standardized libraries for the TCF and am in the "commit group" for the TCF; your three bullet points are definitely a hindrance to this becoming a 'practical' approach as the TCF is implemented today. However, the TCF is implemented the way it is today largely to get around the restrictions that the browser creates. I believe we are all acutely aware that if a browser-based standard emerges, it will be superior to and more reliable than the TCF, which relies on cookies and site owners' implementations of a Consent Management Platform (CMP) (or colloquially "Cookie Banners"). The Ad Tech ecosystem is eagerly working toward a solution that provides transparency and gives users the ability to consent to companies and personal data processing purposes, as the name of the project implies.

I really think the TCString could be a great starting point for a signaling mechanism, and a UA could easily create that string (the specification and code are open-sourced). A UA could also create the JavaScript API for Ad Tech scripts to call to gather user preferences to pass over RTB channels. There may be better mechanisms, but I'm just throwing some ideas out there.

I would be happy to engage with this group to explore the idea.

jackfrankland commented 4 years ago

Thanks a lot for the responses here and in the call. I've put some thought into how this could move along into a more concrete spec for consideration, while hopefully addressing some of the thoughts/concerns made so far.