Academic Identity Federations

hlflanagan commented 4 years ago

Research collaborations and academic institutions around the world rely on multilateral identity federation to enable single-sign-on and authorization on the web as well as via dedicated applications and domain-specific tools. Here are three typical examples:

Go to https://nature.com and follow a discovery UX flow to find your institution’s identity provider (IdP) and proceed to login. Attributes will be shared (via a federation protocol - typically SAML or OIDC) which will be used to determine if you and/or your institution have authorization to access specific resources.
Go to a shared wiki space (e.g., https://wiki.refeds.org) and follow an IdP discovery interface to find your institution and login.
Go to a shared certificate service (e.g., https://wiki.geant.org/display/TCSNT/TCS+Participants+Sectigo) and follow an IdP discovery interface to find your institution and login. Attributes will be shared to determine if you are authorized to generate and sign certificates on behalf of your organization.

The problem?

Browsers are restricting several low-level primitives, namely third-party cookies, link decoration, and postMessage, to protect personal privacy after their widespread misuse for cross-site tracking. These same primitives are commonly used to simplify the user experience in identity provider discovery flows. Identity provider discovery flows as deployed in R&E federations are inherently privacy-preserving.
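To make the link-decoration dependency concrete, here is a minimal sketch of the round trip between an RP and a discovery service. The query parameter names (entityID, return, returnIDParam) follow the OASIS SAML Identity Provider Discovery Service Protocol; the function names and all hostnames are hypothetical, and a real discovery service would also validate the return URL against federation metadata.

```typescript
// Hypothetical sketch of an IdP discovery round trip carried in URL parameters.
// Parameter names follow the SAML IdP Discovery Service Protocol; function names
// and hostnames are made up for illustration.

// URL the RP redirects the browser to when it needs the user to pick an IdP.
function buildDiscoveryRequest(dsUrl: string, spEntityId: string, returnUrl: string): string {
  const url = new URL(dsUrl);
  url.searchParams.set("entityID", spEntityId);      // identifies the requesting RP
  url.searchParams.set("return", returnUrl);         // where to send the user back
  url.searchParams.set("returnIDParam", "entityID"); // name of the parameter carrying the choice
  return url.toString();
}

// URL the discovery service redirects back to once the user has picked an IdP.
// NOTE: a real service must check `return` against the RP's registered endpoints.
function buildDiscoveryResponse(request: string, chosenIdpEntityId: string): string {
  const req = new URL(request);
  const returnUrl = req.searchParams.get("return");
  if (returnUrl === null) {
    throw new Error("discovery request is missing the return parameter");
  }
  const paramName = req.searchParams.get("returnIDParam") ?? "entityID";
  const back = new URL(returnUrl);
  back.searchParams.set(paramName, chosenIdpEntityId); // the decorated link
  return back.toString();
}
```

The only value added to the return link is the public entityID of the chosen IdP; no user attribute travels in this step.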

Can First Party Sets help?

FPS is a formulation to allow user identity to span related origins, where consistent with privacy requirements. Identity federation, however, is not just about a single IdP supporting a set of RPs; it is about a trust fabric that allows collaboration between multiple IdPs and RPs.

In the academic federation model, the user is going to traverse multiple origins in order to access resources. In the US, the InCommon federation has nearly 4000 IdPs and 6000 RPs. A user who visits an RP will have to go through a discovery flow to find their IdP. Ideally, they will use that discovery flow once and have their choice reflected during visits to any of the other 5999 RPs.

For the browser to correctly intermediate the identity flows, it must understand the underlying federation of trust between all these parties to ensure the correct user experience. Today, FPS does not describe these kinds of relationships.

Academic Federation Policies

Academic federations create a multilateral trust fabric between thousands of IdPs and RPs, heavily dependent on both SAML and OAuth protocols. Policies are multilayered, applying at the federation, IdP, and RP levels. No one group has control over user browser configuration or use.

Academic federations are different from consumer SSO in several key ways:

In academic federations, as in many other regulated industries where federation is deployed as a mechanism for controlled information sharing, global identifiers are often both mandated by law and expected by users (e.g., for receiving citation credits in publishing scenarios).
Users act as employees while outside the context of their immediate organization and its official services. An employee of university X is using services from research collaboration Y which is part of an international consortium where use of resources (and hence identification) is not subject to individual needs and wishes but rather the purview of legislation and international treaties.

WebID thoughts

WebID is a formulation to preserve federation under tighter privacy controls. However:

Observation #1: Academic identity federations have a different set of privacy expectations compared to both consumer and enterprise identity federations.
Observation #2: WebID’s design largely depends on a deployment structure that isn’t exactly applicable to academic identity federations: there are thousands of IdPs interacting with thousands of RPs across multiple protocols.

Consent Arguments

In some situations, the institution hosting the IdP controls the consent for information release; RPs may not directly ask the user for information. For instance, GDPR Article 6 lists the set of conditions under which personal information may be processed. One of those is “free and informed consent”. In most cases where the data subject is acting outside the purely personal sphere (the sphere of consumer identity), consent cannot be freely given. For instance, if a grad student is tasked with writing a piece of code and the official policy is to use GitHub, then the GitHub consent screen is arguably illegal in the EU since the user is not able to deny consent. Similar situations arise in all federation use cases, and for this reason it is very uncommon for R&E services in the EU to rely on consent as a legal foundation for processing PII.

leifj commented 4 years ago

It's worth noting that the information persisted across multiple origins in most current implementations of the IdP discovery flow is not PII (e.g., an email address or a personal identifier) but is typically the public identifier of the IdP.
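As a rough illustration of that point, the state a discovery service remembers can be as small as the record below. The type and class names are hypothetical, and the in-memory array merely stands in for whatever storage a given deployment uses.

```typescript
// Hypothetical shape of what a discovery service remembers between visits.
// Only the IdP's public entityID is stored; nothing here identifies the user.
interface RememberedChoice {
  idpEntityId: string; // public identifier of the IdP, as published in federation metadata
  lastUsed: number;    // epoch milliseconds, used only for ordering
}

class ChoiceStore {
  private choices: RememberedChoice[] = [];

  // Record a choice, moving it to the front if it was already remembered.
  remember(idpEntityId: string, now: number = Date.now()): void {
    this.choices = this.choices.filter((c) => c.idpEntityId !== idpEntityId);
    this.choices.unshift({ idpEntityId, lastUsed: now });
  }

  // Most recently chosen IdP, or undefined if the user has not chosen one yet.
  latest(): RememberedChoice | undefined {
    return this.choices[0];
  }

  // Number of distinct IdPs remembered.
  size(): number {
    return this.choices.length;
  }
}
```

Any RP consulting such a store learns which institution's IdP was last chosen, not who the user is.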

cbiesinger commented 7 months ago

As far as I can tell, the proposed IdP registration API in w3c-fedid/idp-registration#2 will address a lot of this. But I'll keep this open so we can verify that it will fix the use cases here.
