solid / specification

Solid Technical Reports
https://solidproject.org/TR/
MIT License

Specifying a registration endpoint for the Pod Provider #152

Open NSeydoux opened 4 years ago

NSeydoux commented 4 years ago

Motivation and overview

We need a mechanism for users to create an account for an identity and an account for a Pod, potentially with an identity provider that is different from the pod provider. Hence, we propose the introduction of a uniform registration endpoint.

There is currently nothing in the spec to enforce a uniform registration endpoint exposed by Pod Servers (PS). In order to ensure a consistent user experience in the ecosystem, it would be beneficial that any Identity Provider (IdP) can provision an account for a user on any PS in a uniform manner.

To be specific, the registration process is the process a user goes through when creating a new identity, which typically means roughly following this interaction diagram (in the current ecosystem):

[Figure: Authorization endpoint interaction (sequence diagram; markup in the Appendix)]

The end goal of this process is for the user to be granted with (details about all this in the following sections):

This issue aims at fostering discussion around this process. It specifically aims at onboarding new, non-technical users, which is why the assumption is made that the user does not have a prior WebID, and cannot be required to take any steps to manually edit and/or publish an RDF document on the web.

Depiction of the current process

This description details what happens in steps A, B and C of the interaction diagram in a current implementation of a Pod Provider not embedding its own IdP.

Current assumptions

The trust relationship from the IdP to the PS is more of an implementation detail coming from the original lack of separation between IdP and PS in NSS. The trust from the PS to the IdP is introduced as a security mechanism, to prevent flooding the PS with registration requests. Any request on the register endpoint which is not authenticated with a token identifying a Registrar user recognized by the PS is rejected as '401 Unauthorized'.
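To make the 401 rule concrete, here is a minimal sketch of the gate a PS might apply before doing any registration work. The function name, the verify_token callable and the registrars set are all hypothetical; token verification itself (signature, expiry, issuer) is deliberately left to the caller, and 200 simply stands in for "go on with the registration".

```python
from typing import AbstractSet, Callable, Optional


def registration_status(
    authorization: Optional[str],
    verify_token: Callable[[str], Optional[str]],
    registrars: AbstractSet[str],
) -> int:
    """Return 401 unless the bearer token identifies a known Registrar.

    verify_token is a hypothetical callable that validates the JWT and
    returns the WebID in its subject claim, or None if the token is invalid.
    """
    # No header, or a scheme other than Bearer: reject outright.
    if not authorization or not authorization.startswith("Bearer "):
        return 401
    webid = verify_token(authorization[len("Bearer "):])
    # Invalid token, or a valid token for an agent the PS does not recognize.
    if webid is None or webid not in registrars:
        return 401
    return 200
```

A PS would call this with the raw Authorization header value and its own list of Registrar WebIDs.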

A - Registration request

POST request with Content-type: application/x-www-form-urlencoded, the form containing username: <myUsername>, and authenticating thanks to a bearer token Authorization header. The token is a JWT with a WebId as a subject claim. This WebId identifies the IdP, and is authorized by the PS to register a new user. Currently, there is an assumption that a trust relationship exists between the IdP and the PS, materialized by a Registrar Agent (i.e. an admin user) recognized by both. The IdP can create an Authorization token for this Registrar Agent, and it is the only credential that is allowed to call the registration endpoint.
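As an illustration, the request described above could be assembled as follows. This is only a sketch: the function name and the example URL are hypothetical, the token is assumed to have been minted by the IdP for the Registrar Agent beforehand, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_registration_request(
    register_url: str, username: str, registrar_token: str
) -> Request:
    """Build (without sending) the form POST made by the IdP to the PS."""
    # urlencode percent-encodes the value, so the body is always ASCII.
    body = urlencode({"username": username}).encode("ascii")
    return Request(
        register_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # JWT whose subject claim is the Registrar Agent's WebID.
            "Authorization": f"Bearer {registrar_token}",
        },
    )
```

For example, build_registration_request("https://ps.example/register", "alice", token) yields a request that urllib.request.urlopen could send.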

This request triggers the next step.

B - Registration process from the PS

At the end of the registration process, a status code 200 OK is returned.

C - Giving access to the “registering” app

This step aims at ensuring that the app which initially redirected the user towards the IdP has access to the created resources. Currently, it is achieved through a hard-coded PATCH request adding an SDK-generated app to the trusted apps. This request is performed with an Authorization token identifying the admin recognized by both the IdP and the PS (i.e. the Registrar Agent). This is useful for demonstration purposes, but should not be part of the spec.
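For readers unfamiliar with the trusted apps mechanism, a rough sketch of what such a PATCH could look like follows. The body is an assumption, not taken from the implementation discussed here: it supposes the trusted app is recorded in the profile document with acl:trustedApp, acl:origin and acl:mode, and that the server accepts application/sparql-update. The function name and parameters are hypothetical, and the request is built but not sent.

```python
from urllib.request import Request

ACL = "http://www.w3.org/ns/auth/acl#"


def build_trusted_app_patch(
    profile_url: str,
    webid: str,
    app_origin: str,
    registrar_token: str,
    modes=("Read", "Write"),
) -> Request:
    """Build (without sending) a PATCH adding app_origin as a trusted app."""
    mode_list = ", ".join(f"acl:{mode}" for mode in modes)
    # Assumed shape of a trusted app entry in the profile document.
    update = (
        f"PREFIX acl: <{ACL}>\n"
        "INSERT DATA {\n"
        f"  <{webid}> acl:trustedApp [\n"
        f"    acl:origin <{app_origin}> ;\n"
        f"    acl:mode {mode_list}\n"
        "  ] .\n"
        "}\n"
    )
    return Request(
        profile_url,
        data=update.encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/sparql-update",
            # Token identifying the Registrar Agent, as in step A.
            "Authorization": f"Bearer {registrar_token}",
        },
    )
```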

There is ongoing work in a dedicated panel to find a replacement for the trusted app mechanism, and clarity on this track will be required before pursuing this part of the registration process further.

Elements up for discussion

These are the elements identified in the previous paragraph that should be made explicit by the spec.

Mitigating the identified assumptions

A - Registration request

Instead of the IdP POSTing a form to the PS, would it be possible/desirable to redirect the user to a registration page managed by the PS? This could add flexibility to the registration process: different PS may expect different information when creating a user account, so a spec decision about the form that should be POSTed would constrain implementation choices. However, I'm not sure of the consequences regarding security/trust.

B - Registration process

Most of the details for this will be left to implementation, at least initially.

C - Giving access to the “registering” app

Appendix

This is the markup for the sequence diagram:

participant User
participant App
participant IdP
participant PS

note over User, App: The user is not authenticated and is not registered to the IdP.
User -> App: Tries to log in
App -> User: Redirects/Pop-up (solid-auth-client)
note over User, App: The user interacts with the IdP to create an account.
User -> IdP: Starts creating an account
IdP -> User: Redirects to the account creation page
User -> IdP: Fills in the account creation form
note over IdP, PS: The IdP interacts with the PS authenticated as the Registrar Agent.
IdP -> PS: Registration request (A)
activate PS
note left of PS: Performs registration\n (B)
PS -> IdP: 200 OK
deactivate PS
IdP -> PS: Gives access to the "registering" app (C)
PS -> IdP: 200 OK
IdP -> App: Redirects (HTTP 302) to the created profile document
note over User, PS: The user now interacts with the App, which makes requests to the PS with the user Auth token supplied by the IdP.
User -> App: Clicks 'n stuff
App -> PS: Authenticated access
PS -> App: Authorized resources
App -> User: App-related enjoyment

kjetilk commented 4 years ago

Interesting proposal! At the most basic level, most intelligence in a Solid system sits on the client side, and for the most part, server-side APIs are therefore something we want to be restrictive about. In this case, I think server-side APIs are clearly warranted.

I think we should nevertheless look into how it can fit better with the overall idea of Solid. In particular, rather than posting a form, it is really some data that is appended to a container, which through side effects ends up creating a user. That data should be LD, and the container should share many properties with other containers in Solid.
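A minimal sketch of that idea, purely for illustration: the registration data is serialized as Turtle and POSTed to a container. The vocabulary (https://vocab.example/registration#), the container URL and the function name are invented placeholders; nothing here is defined by any Solid document, and authentication is left out.

```python
import json
from urllib.request import Request

# Hypothetical vocabulary, used only to make the example concrete.
REG = "https://vocab.example/registration#"


def build_registration_append(container_url: str, username: str) -> Request:
    """Build (without sending) a POST appending registration data as Turtle."""
    # json.dumps escapes quotes, backslashes and control characters in a way
    # that is also valid inside a Turtle string literal.
    literal = json.dumps(username, ensure_ascii=False)
    turtle = (
        f"@prefix reg: <{REG}> .\n"
        "\n"
        "<> a reg:RegistrationRequest ;\n"
        f"    reg:username {literal} .\n"
    )
    return Request(
        container_url,
        data=turtle.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )
```

The server-side effect (creating the user) would then be triggered by the new resource appearing in the container, rather than by a dedicated endpoint.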

There have been some user stories floating around on registrations, I found one that has been taken down, and I created a User Lifecycle label to group them.

Actually, I think it makes sense to add more registration user stories there first, as the space is quite open. I have some ideas that I'll add.

elf-pavlik commented 4 years ago

I think a related brief conversation exists in https://github.com/solid/solid-spec/issues/138

Also one note: we currently have no requirement that a user's WebID Profile be hosted in Solid Storage (I tend to avoid the term Pod). Even if a user does host their WebID Profile in one instance of Solid Storage, they can have any number of storage instances, so we still need to be able to create Solid Storage instances which don't host a WebID Profile.

NSeydoux commented 4 years ago

Absolutely, and @jaxoncreed made remarks in a common direction I think, that eventually what would be required is a generic Pod (or Solid Storage instance) provisioning interface, regardless of whether this provisioning is related to the creation of a profile document.

I also completely agree that it should not be required that your WebID is hosted on a Solid Storage instance, but from an onboarding perspective, it is likely that the vast majority of Solid users will use the default solution that is offered to them, and in my personal view this default solution is hosting the WebID on a Solid Storage instance (but I'm absolutely open to discussion about this :smiley:).

My thought in opening this specific issue, focused on the creation of a new user identity, is that it may be seen as a stepping stone towards a more generic goal that is Pod provisioning. The eventual spec contribution coming out of this discussion should be flexible enough that it may be extended towards more genericity, but my feeling is that it would be easier to come up with an initial spec proposal if we have a narrower use case in mind. Also, I anticipate that provisioning Pods would require a WebID to exist in order to set the access rights appropriately, so there is still a need for a default onboarding procedure when creating an identity from scratch.

The conversation that happened in solid/solid-spec#138 is absolutely related. Making the registration endpoint discoverable through an LDP-conforming interaction by specifying associated shapes in the spec seems like a good approach, but there are still some open questions, such as the trust that exists between the Storage Provider and the Identity Provider, allowing the Identity Provider to trigger the user registration workflow.

RubenVerborgh commented 4 years ago

So let's zoom out and look at this:

POST request with Content-type: application/x-www-form-urlencoded, the form containing username: <myUsername>, and authenticating thanks to a bearer token Authorization header. The token is a JWT with a WebId as a subject claim. This WebId identifies the IdP, and is authorized by the PS to register a new user. Currently, there is an assumption that a trust relationship exists between the IdP and the PS, materialized by a Registrar Agent (i.e. an admin user) recognized by both. The IdP can create an Authorization token for this Registrar Agent, and it is the only credential that is allowed to call the registration endpoint.

What this essentially means (and it's not been made explicit, so that should happen), is that we define a new API (as also remarked by @kjetilk). So in addition to the interaction diagram, we would need to see a definition of that API.

However, as already pointed to by @elf-pavlik above, the point of Solid is exactly that we reduce the need for HTTP-level APIs, and instead have data-level APIs. E.g., we don't have /api/getUserFirstName, but we have LDP as a generic document-based API.

So I strongly oppose the creation of another API, especially if it has a separate security mechanism, which it seems to have. This is extra implementation work, a broader attack surface, and complexity we don't need.

Making the registration endpoint discoverable through an LDP-conforming interaction

So let's go further here and simply make it an LDP interface.

such as the trust that exists between the Storage Provider and the Identity Provider

To be honest, I don't understand the problem here. I'll explain my understanding below; could you please help me see what's wrong?

The IdP knows which PS it should register to

Why would that be the case?

The user can specify one. Then no trust relationship is needed, because it's the user trusting the PS.

And if the user doesn't give a PS, then the IdP has the PS. Then it's indeed the IdP trusting the PS, but what is the problem with that?

and the IRI of the registration endpoint relative to the PS IRI (e.g. /register).

No, the PS is simply that URL.

The IdP is known by the PS

Why would that be needed? It would just be yet another Solid interaction, if we run it over LDP.

NSeydoux commented 4 years ago

I absolutely agree that we should avoid as much as possible specific APIs, and that a controlled LDP interaction would be a good match here.

Regarding the specific points you raise:

The IdP knows which PS it should register to

Why would that be the case?

This is not a desired feature, but simply an acknowledgement of an assumption that is currently made: since the reference Solid implementation is NSS, and NSS is both an Identity Provider and a Storage Provider, the question of the target Storage Provider is not part of the registration process: when creating an account on an NSS instance, the IdP assumes that you will be using the attached SP. I 100% agree that the user should be able to select the desired SP when registering an identity, even if a default SP should be suggested (as I assume many users would prefer being offered a sensible default). In this case, the IdP trusts a PS, which is absolutely fine, as long as the user is allowed to specify the PS they want.

The IdP is known by the PS

Why would that be needed? It would just be yet another Solid interaction, if we run it over LDP.

My concern in this case is the following: let's say you run your own IdP, and want to register an account to a given SP. Then your IdP not being known by the target SP, it might not have the required credentials to create the desired containers (i.e. its WebID has no write access). This might lead to only some well-known IdPs being supported by popular SPs, which would defeat the point of the Solid ecosystem. That is why I suggested that the SP implement a discoverable registration page, which does not involve giving any privileged access to the IdP. I absolutely agree that this adds implementation work and complexity, so I would also prefer a document-based LDP API where the Solid spec would just add requirements on the document shapes. In this case,

  1. What interactions would be required for the IdP to be granted appropriate rights?
  2. Could you describe a hypothetical interaction that would end up with a user having access to some storage hosted by a SP (i.e. an account creation)?

kjetilk commented 4 years ago

So, first of all, there are now 10 User Lifecycle user stories, and I would encourage everyone to submit more so that we can see whom we are designing for.

I don't want to hijack this thread, and therefore I think we should start a panel, but just to start drafting my blank sheet ideas, here goes:

First, we have gradually freed ourselves from LDP thinking, as we found LDP severely limiting in many cases: we ended up discussing LDP orthodoxy instead of what we wanted to do for our users. Now, the goal is that Solid can be easily done on an LDP server, but it should be easier to write a Solid server from scratch than an LDP server from scratch, due to the aforementioned orthodoxy.

So, indeed, we shouldn't do custom HTTP APIs, but we should take care to not frame it within LDP either. For example, adding a user is clearly an append operation, but LDP doesn't mention append, and it did a really bad job at enabling append operations.

This is the graph of thought I follow when I think about this problem:

Some random notes:

We are talking quite loosely around the different actors involved here, so I am a bit confused about the terms. When we talk about Storage Provider, we are not talking about each individual Pod Server, right? When I think about the Pod Server, then it is characterized by the user's authority being a component of the URI, whereas the Storage Provider has a different authority. In this case, the Storage Provider can initialize a Pod Server and then pass the control over the authority to the user. Are we aligned on this?

Then, I'd like to step back to the Pod Server:

So far, we have initialized a Pod from the server side, with Databrowser, /private/, /public/, a WebID, etc. This is what I refer to as a big side-effect: by manipulating a certain resource with one authority (i.e. the Storage Provider), suddenly stuff emerges on a different authority. Since this ought to be a very rare (and weird) special case, it is arguably not such a big atrocity to have a custom HTTP API for doing that.

However, we might also take a completely different view on it: What if we try to minimize the side-effects as much as possible, how could that look? Clearly, a client side "registration app", could append a user, thus creating the authority, and then write all those things that are needed to initialize the Pod, like Databrowser and WebID, and a root ACL.

So, the root ACL is really what is needed, everything else can follow from that.

We have recognized for a long time that there are various situations that can lead to Loss of Control (#67). One of them is corruption or disappearance of the root ACL. In NSS, the absence of a root ACL is the only situation that results in a 500 by design. We have also recognized that this is not a situation we can live with.

There is rough consensus that we want "fail to locked" as the default behavior when the root ACL fails for some reason, which necessitates the creation of a recovery mechanism. This has been up earlier too.

To have a mechanism to recover from a situation where no root ACL is present doesn't seem to be widely different from a mechanism to initialize the Pod with a root ACL from the beginning.

So, that's the way I think I would prefer it to go: We start with the mechanism to recover and initialize a Pod with a root ACL. Once the root ACL is there, a registration app can put the rest there. This would minimize the side effects to only setting up the authority and using the recovery mechanism to set a root ACL. Once the root ACL is set up, authority is passed to the user.
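To give an idea of how little the initial state needs to contain, here is a sketch of a root ACL that hands full control to one WebID, using the Web Access Control vocabulary. The function name is hypothetical, and how the document would actually get written (the recovery/initialization mechanism itself) is exactly the open question, so it is not shown.

```python
def root_acl_turtle(owner_webid: str) -> str:
    """Return a root ACL document granting owner_webid full control.

    The relative IRI <./> is meant to resolve against the ACL document's
    own URL, i.e. it designates the root container of the storage.
    """
    return (
        "@prefix acl: <http://www.w3.org/ns/auth/acl#> .\n"
        "\n"
        "<#owner>\n"
        "    a acl:Authorization ;\n"
        f"    acl:agent <{owner_webid}> ;\n"
        # Applies to the root container itself...
        "    acl:accessTo <./> ;\n"
        # ...and is inherited by everything below it.
        "    acl:default <./> ;\n"
        "    acl:mode acl:Read, acl:Write, acl:Control .\n"
    )
```

With this in place, a registration app authenticated as the owner could create everything else (profile, /private/, /public/, and so on) through ordinary writes.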

Then, the registration app needs to start the cascade that ends up in setting up the authority and setting root ACL. I can see some complexity there, as in some cases, the user may already have an identity (as I wrote in a user story), in other cases, the identity also has to be created, but the IdP may only be able to suggest the identifier, as the profile document is under the authority of the Pod Server. More user stories around this would be welcome.

I'm fairly confident that this can be done with passing data that the Storage Provider uses.

RubenVerborgh commented 4 years ago

My concern in this case is the following: let's say you run your own IdP, and want to register an account to a given SP. Then your IdP not being known by the target SP, it might not have the required credentials to create the desired containers (i.e. its WebID has no write access).

On the one hand, I'd say that it's the user making that request, so the SP has to trust the user.

On the other hand, however, trusting the user means trusting the IdP. So in that sense, it comes down to the generic problem of apps trusting IdPs. I.e., in all ways, storage creation is just an app.

elf-pavlik commented 4 years ago

My concern in this case is the following: let's say you run your own IdP, and want to register an account to a given SP. Then your IdP not being known by the target SP, it might not have the required credentials to create the desired containers (i.e. its WebID has no write access).

I find all that SP / PS confusing, so I'll stick to the term storage myself. Creating a new instance of storage (linked with space:storage) will require WebID-based authentication, most likely WebID-OIDC; in that case the server providing storage will know the user's WebID.
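As a small illustration of the space:storage link, the sketch below produces the Turtle statement that ties a WebID to one or more storage instances. The function name and example URLs are hypothetical; where this statement gets written, and by whom, is not settled in this thread.

```python
from typing import Sequence

SPACE = "http://www.w3.org/ns/pim/space#"


def storage_links_turtle(webid: str, storage_urls: Sequence[str]) -> str:
    """Return Turtle linking a WebID to each of its storage instances."""
    if not storage_urls:
        raise ValueError("at least one storage URL is required")
    # One subject, one predicate, a comma-separated list of objects.
    objects = ",\n        ".join(f"<{url}>" for url in storage_urls)
    return (
        f"@prefix space: <{SPACE}> .\n"
        "\n"
        f"<{webid}>\n"
        f"    space:storage\n"
        f"        {objects} .\n"
    )
```

A user with two storage instances would simply get two objects for the same space:storage predicate.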

To be honest I don't know if we need to define an LDP-based flow for creating new storage instances. In many cases a storage provider may require accepting Terms of Service and for that use an interactive flow where they control the UI. For most people creation of new storage instances will happen on rare occasions, so I don't think they would expect to be able to choose a storage registration app. Eventually we may want to have a standard way of creating new instances of Solid storage, but I would see it as low priority compared to all the other things we should address.

kjetilk commented 4 years ago

I created a Panel Proposal (https://github.com/solid/process/pull/196). I added myself and @NSeydoux; please have a look and consider adding yourself.

I can see us spinning out in many directions already, I think we would be better served by a panel, and that we open different tickets for the different directions.

kjetilk commented 4 years ago

The User Lifecycle Panel is now up and running: https://github.com/solid/user-lifecycle-panel

I don't think I can be the driving force here, but please do add yourself so we can get this running!