w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Service Endpoints in the DID Doc might be an anti-pattern #382

Closed msporny closed 3 years ago

msporny commented 3 years ago

TL;DR: We don't need service endpoints in the DID Document... it's an overly-complicated anti-pattern that has a lot of downsides when we already have patterns that are implemented today that would work for all use cases.

It has been asserted that Service Endpoints in the DID Document might be an anti-pattern because, at worst, they can be used to express PII in DID Documents, and in every use case that we know of to date, they can be discovered through other means that are already employed today.

Ultimately, the problem is that developers need to be educated about the dangers of placing PII in service endpoints... many won't read the spec in detail... we have over 70 DID Methods now and the number is only increasing.

What are the chances that a non-trivial subset of them implement unwisely? My guess is the chances are pretty high, and that weakens the ecosystem.

We do have the option of not giving developers foot guns... and we should try very hard not to hand them any. I'm afraid that non-normative documentation is better than nothing, but not good enough.

Here's what the group resolved yesterday (pending 7 days for objections to the resolutions):

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

I wish we would do more than that... there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

Both of those solutions allow us to 1) Use what we already have today, and 2) address all of the use cases that we know of.
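As a point of reference for the resolution above about an abstract data model, here is a minimal sketch (my illustration, not normative spec text) of what a generic service entry shaped like verification methods could look like:

```typescript
// Illustrative sketch only -- the property names mirror the direction DID Core
// has been heading (id / type / serviceEndpoint), but nothing here is normative.
interface ServiceEntry {
  id: string;                        // a DID URL identifying this particular service
  type: string;                      // a type registered via the DID Spec Registries
  serviceEndpoint: string | object;  // a URI, or a richer structure defined by the type
}

const exampleService: ServiceEntry = {
  id: "did:example:123#agent",
  type: "SomeRegisteredServiceType",               // placeholder type name
  serviceEndpoint: "https://example.com/endpoint",
};
```

The abstraction would leave the meaning of `type` and the shape of `serviceEndpoint` to the DID Spec Registry, per the third resolution above.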

OR13 commented 3 years ago

Add a serviceEndpoint just in time, without updating the verifiable data registry, using signed-ietf-json-patch.

However, ^ this solution still requires us to define a data model... and I would argue that so does "getting service endpoints" in credentials... unless you want every vendor to construct them differently, which will harm interoperability.

In other words... there is no solution to this problem that does not include a data model... but there are proposals for how that data model should be communicated, which have privacy, security and usability tradeoffs :)
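To make the just-in-time idea concrete, here is a rough sketch of what such a patch could look like -- the patch array follows RFC 6902 JSON Patch, but the signing envelope and every field name are assumptions for illustration, not a defined format:

```typescript
// Illustration only: an RFC 6902 JSON Patch that appends a service entry at
// resolution time, wrapped with a proof from a key already in the DID doc.
// The envelope shape ("patch", "proof", "jws") is assumed, not standardized.
const signedJustInTimePatch = {
  patch: [
    {
      op: "add",
      path: "/service/-",            // append to the DID doc's service array
      value: {
        id: "did:example:123#inbox",
        type: "SomeServiceType",
        serviceEndpoint: "https://example.com/inbox",
      },
    },
  ],
  proof: {
    verificationMethod: "did:example:123#key-1",  // key published in the DID doc
    jws: "eyJhbGciOi...",                         // detached signature over `patch`
  },
};
```

A resolver that trusts this pattern would verify the proof against the already-anchored DID document and then apply the patch to the resolution result, so the VDR itself never stores the endpoint.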

mwklein commented 3 years ago

Using DID Documents on-ledger only for well-known public identities, and using private off-ledger peer-wise DIDs for all personal identifiers, mitigates the described issue as well. Personal service endpoints would be shared only via the peer-wise connection, and public service endpoints are by definition meant to be public.

csuwildcat commented 3 years ago

...there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document. Anyone who disagrees implicitly (whether they are aware of it or not) takes one of the two positions below; there simply is no third:

  1. All services should require centralized parties for location/distribution.
  2. Entities should not be able to share their intended-public data with others without participating in an explicit, out-of-band, DID Doc-external activity.

If you fall under Position 2 above, please do the following to ensure you are abiding by your own beliefs, if you have not already:

If you do the above things in response to the implicit Position 2 that many seem to be taking, that is a first step in building credibility for the case that we should deprive people, companies, IoT devices and other entities of a more direct, decentralized mechanism of expressing themselves in fulfillment of application and service use cases.

dhh1128 commented 3 years ago

we already have patterns that are implemented today that would work for all use cases...that we know of

I think a comparison will help explain why this argument falls flat for me.

The reason we need DIDs isn't because use cases aren't addressable, exactly -- it's because the nature of a use case's guarantees and semantics changes if we don't root them in DIDs. We could do VCs with SSH keys instead of DIDs, but we don't, because SSH keys don't have the same properties (decentralization, discovery, rotation, potential for multisig...) that DIDs do.

Similarly, the nature of a service endpoint's guarantees and semantics changes if we don't put them in DID docs. This is the essence of @csuwildcat 's comment above, which I agree with -- sure, you can do discovery with existing mechanisms, but you can't do it in a decentralized way unless you either use DID docs or invent an entirely new mechanism with the same characteristics as DID docs. Yes, there are alternative ways to communicate an endpoint. The DID controller may or may not control those alternative mechanisms. Therefore, by removing the service endpoint from the DID doc, we are allowing someone other than the DID controller to frame any conversations associated with that DID. You could say, "No big deal; the non-DID-controller can't lie about controlling the DID when a digital signature or encryption is required." I answer: "True, but that's not the full requirement, because just controlling the endpoint value itself allows a malicious party to simulate the silence, uncooperativeness, or flakiness of a DID owner they want to harass."

The recent Twitter hack of accounts belonging to Obama, Biden, Elon Musk, and others is exactly the sort of thing we enable if we communicate service endpoints outside the DID doc. That was an existing communication mechanism that could communicate endpoints, and its security properties are different from a DID doc itself. The claim that leaving service endpoints in the spec is an invitation for disaster is only half the story. Yes, doing service endpoints right is hard, and doing it wrong could be obnoxious. But taking them out is just as problematic, and I don't think developers write code that guards against ordinary cybersecurity risks any better than they write code that guards against service endpoint abuse. The difference is that service endpoints are a new field of knowledge where developers will be open to guidance, rather than familiar territory where developers will casually assume they already know best practice.

wyc commented 3 years ago

I agree that service endpoints can certainly reduce privacy when (mis)used, and this is an important consideration.

However, I think that if we excluded them from the DID spec, then the new risk we incur is one related to standards adoption -- the standards will become far less useful without a prescribed way to do service discovery. +1 to @dhh1128's points about "what makes DIDs different and more useful than SSH keys?", with this being a core reason. Consider the impact of this on DIDComm, which in my mind is a major use case for DIDs. I believe we will need service endpoints to enable the discovery portion of DIDComm, though I'm not certain. cc @telegramsam @awoie

I also agree with the point that if we punt service endpoints to another standard, the problem still doesn't go away. In fact, it might be solved in a far less decentralized way than with DID documents, such as a state-owned BigCo claiming to be the #1 DID Broker that's easiest for everyone to use because it can direct a slush fund toward winning the market -- and everyone would likely use the most convenient and free thing around, as we've seen for the past 10 years on the Internet.

So in summary, I recommend we keep service endpoints while acknowledging they will bring privacy problems, with the understanding that having their functionality provided somewhere else could cause (1) significant adoption risks and (2) even larger systemic privacy risks. Perhaps if we agree on these logic inputs but disagree on the specific risk measures, we can make them part of the calculus from which the decision is made.

Finally, wanted to mention that resolution of this would unblock our work with the W3C privacy self-assessment here: https://github.com/w3c/did-core/issues/291#issuecomment-681172870

agropper commented 3 years ago

I propose a compromise solution based on my privacy-inspired perspective in https://github.com/w3c/did-core/issues/370#issuecomment-683075977

Relative to yesterday's pending resolutions:

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

Treat the PDP serviceEndpoint as normative, if present.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

Define an abstract data model for the PDP serviceEndpoint based on standard UMA2 and pending GNAP practices.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

Yes.

peacekeeper commented 3 years ago

there are alternatives that the group should consider in order to discover service endpoints

I have some sympathies for this view; it seems to align with what Sam Smith has been trying to tell us since the Amsterdam F2F, which is that DID documents should only be about establishing control authority over the identifier, and that everything else (including service endpoints) should happen on a different layer.

But as others have pointed out in this thread, I also believe that alternatives (such as sending service endpoints together with the DID via the original channel, or using a search engine, or using a special refresh/notification/etc. service) will usually not provide the same guarantees that DID resolution and DID methods are supposed to provide, i.e. decentralization, control, cryptographic verifiability.

DIDs should enable service and data portability in the same way as they enable key rotation. Services are not comparable to VCs, they are much more foundational. DIDs are an indirection layer on top of both verification methods and services, since those are the fundamental constructs that enable trustable interaction associated with the subject.

agropper commented 3 years ago

Building on @Markus' "fundamental constructs that enable trustable interaction associated with the subject", consider private keys and authorization policies as the two things I should never be asked to put on the wire. Relative to my private keys, all anyone ever sees is a useful derivation. Relative to my policies, all anyone ever sees is a capability derived from my policies. These are the two key functions of the "indirection layer" enabled by DIDs. This is why I suggest that a PDP be the first, maybe the only, normative data model associated with a DID.


jonnycrunch commented 3 years ago

I have a procedural objection to this approach. The proposals that we agreed to were an attempt to communicate consensus among the participants in a special topic call and as such are non-binding. As Ivan @iherman pointed out in the minutes, these "resolutions" would be brought back to the rest of the group for broader discussion. Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

msporny commented 3 years ago

Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

The 7 day window is for the RESOLUTIONs we made, not for the topic at hand. This 7 day window is the process the group agreed to for the special topic calls. It provides an opportunity for people to object on the main topic call while ensuring that there is closure to resolutions so the group can build upon them.

/cc @brentzundel @burnburn -- we may want to remind the group of this process during the next call.

@jonnycrunch -- are you objecting to any of the RESOLUTIONS made during the last call? I note that you didn't object at the time: https://www.w3.org/2019/did-wg/Meetings/Minutes/2020-08-27-did-topic#res

dlongley commented 3 years ago

@csuwildcat,

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging more places to express service endpoints, not fewer. Note that that's a decentralized mechanism for solving this problem, not a centralized one.

@dhh1128 -- Can you provide a link to how the DIDComm community is considering how "GDPR-compliant service endpoints" might be implemented and how a VDR might differentiate them from non-compliant ones?

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

csuwildcat commented 3 years ago

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

A ledger is not the place where adjudication of purported types is resolved; that is always going to happen in a less resource-constrained system that has more latitude to evaluate assertions based on evidence that can be computed ad hoc. The ledger is the place for key awareness, routing, and type declaration -- on the latter point, it's about efficient global sorting in the aggregate sense, not assertion validity evaluation, which is not a singular, universal, globally shared test anyway.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging more places to express service endpoints, not fewer. Note that that's a decentralized mechanism for solving this problem, not a centralized one.

I am not reacting in this way to oppose any entity/implementer deciding not to use Service Endpoints; my opposition is strictly contained to spec changes and normative language that negatively impact these features such that they hinder other entities/implementations from utilizing them.

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

dhh1128 commented 3 years ago

@dlongley : providing a link is a bit challenging, because knowledge about the question exactly as you framed it is scattered through numerous documents. The best single doc I can offer is here. This covers about 70% of your question. I will attempt a summary here that is partly redundant with that doc, and that fills in some gaps.

First, it's important to understand that, because DIDComm is not API-centric, it doesn't need a different endpoint for every service or protocol it exposes. The DIDComm community is assuming that a party usually needs only one DIDComm endpoint (per transport) no matter how many services they intend to offer. (The "per transport" note is just to acknowledge that if you want to speak DIDComm over http, smtp, AMQP, BlueTooth, and sneakernet, those may be different endpoints -- but you don't need different ones for credential issuance, verification, and so forth. Those are all just protocols running over a single endpoint.)

Now, a DIDComm endpoint has baked into it the potential (but not the requirement) for routing. Routing is done by a mostly untrusted mediator that has its own encryption keys. If Alice is talking to Bob, and Bob is using a mediator, then Bob's service endpoint will be hosted by the mediator. Thousands or millions of other parties can (should) have exactly the same service endpoint. The URI for the endpoint has no query string and nothing in its domain name that identifies Bob in any way. Alice places her plaintext message (let's call this M[0]) inside an encryption envelope that only Bob can open. Let's call the encrypted result M[1]. Then Alice places M[1] inside an encryption envelope that only the mediator can open. Let's call that encrypted result M[2]. The encrypted header of M[2] tells the mediator what Bob's DID is. Bob and the mediator have previously arranged for the mediator to forward messages for Bob's DID to Bob. (There's a DIDComm protocol they can use, if they want -- or they can do it any proprietary way they like, since it doesn't have to be interoperable.)

When the mediator receives the message M[2], it opens the outer encryption envelope and peers inside. It sees that the encrypted inner message is intended for Bob's DID. It then forwards M[1] to Bob. How it does this is never publicly known; it is a private arrangement between Bob and the mediator.

In order for Alice to know that she must do the double wrapping required by Bob's mediator, the service endpoint for Bob needs to contain an ordered list of the keys (or DIDs that let her look up keys) that she has to use when encrypting for Bob's route. Thus we have a serviceEndpoint declaration with a routingKeys field that might contain: [<DID or key of Bob's mediator>]. A route that uses one mediator will have one entry in this array; a route that uses two mediators will have two, etc. (Why you'd want two mediators is beyond scope here; suffice it to say that either one or two might be common, but anything more than two will not be.)
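To make the double wrapping and the routingKeys field concrete, here is a rough sketch -- the service shape loosely follows DIDComm conventions, but the `wrapForRecipient` helper and all values are placeholders, not real library APIs:

```typescript
// Sketch of the routed endpoint described above. wrapForRecipient is a stand-in
// for whatever envelope encryption the agents actually use; it is NOT a real API.
type Envelope = { recipientKey: string; ciphertext: string };

function wrapForRecipient(recipientKey: string, payload: string): Envelope {
  // A real implementation would encrypt `payload` so only `recipientKey` can open it.
  return { recipientKey, ciphertext: `<encrypted:${payload}>` };
}

// Bob's service entry: the endpoint names only the mediator (shared by many
// users), and routingKeys tells Alice which extra layer(s) of wrapping to apply.
const bobService = {
  id: "did:example:bob#didcomm",
  type: "DIDCommMessaging",                            // assumed type name
  serviceEndpoint: "https://mediator.example/inbox",   // nothing Bob-specific in it
  routingKeys: ["<DID or key of Bob's mediator>"],
};

// Alice builds M[0] -> M[1] -> M[2] exactly as described above.
const m0 = JSON.stringify({ body: "hi Bob" });                 // plaintext
const m1 = wrapForRecipient("<Bob's key>", m0);                // only Bob can open
const m2 = wrapForRecipient(
  bobService.routingKeys[0],                                   // only the mediator can open
  JSON.stringify({ forwardToDid: "did:example:bob", msg: m1 }),
);
// Alice POSTs m2 to bobService.serviceEndpoint; the mediator opens the outer
// layer, learns only "forward this to Bob's DID", and passes m1 along by
// whatever private arrangement it has with Bob.
```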

Now, note the properties I've just described:

  1. There is no identifier for a recipient embedded in the service endpoint, and it is not transmitted as plaintext anywhere (in HTTP headers, in a POST body...) either. No eavesdroppers can learn anything.

  2. The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

  3. The mediator knows that they have a message to give to Bob's DID, but they don't necessarily know who it's from, and they don't know anything about the message except the size of the encrypted BLOB. The mediator does not know the content of Bob's DID doc. Bob's DID doc can be pairwise; it doesn't have to be on a ledger.

  4. There are two abuses that a mediator could perpetrate: they could record all the times and the sizes or encrypted content of all inbound messages for Bob, and they could fail to forward messages (selective or total delete).

Given this, we believe the requirements for GDPR compliance of the endpoint are:

Now, you asked how the outside world can know that Bob's endpoint is GDPR-compliant. I would like to point out that this is far less interesting than how BOB knows that his service is GDPR-compliant; in fact, I'm not even sure the outside world's question is legitimate. We send one another emails all day long without knowing whether the email service used by the recipient is GDPR compliant. It's none of our business; all we need to know is that the person we're attempting to contact has asked us to hand off the data to a particular mail transfer agent, and is apparently satisfied that that MTA will do the right thing.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.
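For what it's worth, the self-attested flag being proposed might look something like this (the property name and placement are assumptions; nothing here is defined anywhere):

```typescript
// Illustration only: a self-attested GDPR flag inside the service entry, as
// suggested above. The property name is invented for this sketch.
const bobServiceWithSelfAttestation = {
  id: "did:example:bob#didcomm",
  type: "DIDCommMessaging",                           // assumed type name
  serviceEndpoint: "https://mediator.example/inbox",
  routingKeys: ["<DID or key of Bob's mediator>"],
  gdprCompliant: true,   // asserted by the DID controller, not verified by anyone
};
```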

dhh1128 commented 3 years ago

I would like to point out a fundamental misalignment that permeates this thread. @msporny is approaching service endpoints from the standpoint that the goal of putting them into a DID document is to communicate a place to talk. I don't agree that this is an accurate summary of the goal. I would say that the goal of putting them into a DID document is to communicate a place to talk such that the communication is known to emanate from the DID controller, and such that the key material in the DID doc is known to apply to the associated endpoint in crisp, indivisible version evolution. That is, I want to be able to say that DID doc version X bundles a key state + an endpoint state, and version Y bundles a different key+endpoint state; I don't want them to be able to evolve independently. The "such thats" are very important to me, and I haven't yet seen any proposal that accomplishes these goals other than one of putting the endpoint in the DID doc. Manu has suggested that we need to explore alternatives. I'm totally fine with that -- but I'm only interested in alternatives that include my "such thats." Everything else is abandoning a vital security and control requirement of the system, IMO.
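To illustrate the "bundled evolution" property being argued for here: the point is that a resolver can only ever observe a key state and an endpoint state together, pinned to the same DID document version (the field names below are illustrative, not from the spec):

```typescript
// Version X: key-1 and endpoint A are published as one indivisible state.
const didDocVersionX = {
  id: "did:example:123",
  versionId: "1",
  verificationMethod: [{ id: "did:example:123#key-1", type: "SomeKeyType2020" }],
  service: [{ id: "did:example:123#agent", serviceEndpoint: "https://a.example/inbox" }],
};

// Version Y: rotating the key and moving the endpoint happen in one atomic update,
// so there is never a version where the new key points at the old endpoint (or
// vice versa).
const didDocVersionY = {
  id: "did:example:123",
  versionId: "2",
  verificationMethod: [{ id: "did:example:123#key-2", type: "SomeKeyType2020" }],
  service: [{ id: "did:example:123#agent", serviceEndpoint: "https://b.example/inbox" }],
};
```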

agropper commented 3 years ago

@dhh1128, is your "such that" framing for an optional but normative notification serviceEndpoint type the same idea as what I proposed above https://github.com/w3c/did-core/issues/382#issuecomment-683085862 except we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

As for @csuwildcat Use Case:

decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

I'm confused by the inclusion of "intended-public info" in the same use case as "ad hoc encrypted direct sends...". Can we deal with these separately?

The intendedPublic serviceEndpoint type does not benefit from access control but may benefit from checks on authenticity. We should be able to craft a normative data model for this optional serviceEndpoint.

The "ad hoc encrypted" serviceEndpoint type will require something like the "PDP" serviceEndpoint type where the other "entity" can provide some claims, endpoints, and encryption keys.

dlongley commented 3 years ago

@csuwildcat,

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

Which one? Which one(s) will the VDR permit? Will there be a centralized allow list for the ones that are permitted? How will the URI handle herd privacy? After all of these questions are answered, could it be that you should have just asked that other decentralized network directly for a VC signed by one of the DID's keys?

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

Then you don't understand the core problem I'm trying to highlight. The VDR/DID method gets to decide what will be accepted in a DID Document. This is related to the GDPR/privacy problem of what kind of information is allowed onto an immutable ledger.

I am not reacting in this way to oppose any entity/implementer deciding to not use Service Endpoints, my opposition is strictly contained to spec changes and normative language that negatively impacts these features such that it hinders other entities/implementations from utilizing them.

I also want to make sure we have a healthy ecosystem that can leverage service endpoints. All of these issues are interrelated.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

Please describe a single user story that is specific and concrete for people in this thread to talk about. I think the above is too abstract to help move the needle.

dlongley commented 3 years ago

@dhh1128,

Thank you for your response, there's a lot of good information in it. I'm going to try and focus down to the specific problem with expressing information on an immutable VDR.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.

I think my question was unclear because it was interpreted to be talking about whether or not the service behind the endpoint itself was GDPR-compliant. Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it. And I'm not talking about incidental PII or information that is intentionally encoded in some abusive way to circumvent the feature of expressing non-PII information in a DID Document.

As an example, how does a VDR distinguish this:

https://danielhardman.com/my-personal-handle

From something like this:

https://public-company.com/foo

From something like this:

ipfs://fl3hf4kjh4fk3f/fhjl2fjlk23f32f/23423

The first URL has implicitly human-meaningful identifiers baked into it for a private party, the second has implicitly human-meaningful identifiers baked into it for a public party, and the last has no human-meaningful identifiers baked directly into the URL.

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

If the endpoint is directly owned/maintained by the DID controller, no requirements (there is no separate processor of data; all control resides with the DID controller, so GDPR is irrelevant).

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want into a DID Document -- and there would be no "right to be forgotten" issues?

Another side question:

The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

How many of these mediators do you expect to exist in the ecosystem?

dhh1128 commented 3 years ago

Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it.

Ah. Yes, you're right; I misinterpreted the question.

I know of no way to inspect a raw URL and conclude with certainty that it does or doesn't contain PII.

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

Yes, that's a reasonable example. It could also be https://myisp.com/didcomm or https://myuniversity.edu/students or whatever. (HTTPS is not strictly required for security properties, but there are some benefits to it, such as the fact that mobile apps will pass review by app stores if they only make HTTPS calls.)

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want into a DID Document -- and there would be no "right to be forgotten" issues?

No. Each VDR has to solve this problem. The first Indy/Aries solution to this problem is to use peer DIDs, which are never written to a ledger in the first place, and to use ZKPs for VCs, which don't require a binding to a public DID. Building on that, Sovrin's next solution to this problem is to decompose full DID docs into individual sections that have more specialized data models and their own transaction types. This makes them amenable to careful validation. That may filter some obvious stuff (query strings with DID values in them), but it will not fix the deeper problem in your 3-part example. Its next solution to this problem is to require each writer to the ledger to include with their write a signature over a Transaction Author Agreement (essentially terms and conditions that clarify that no PII is allowed, and that by writing the data, any claim of a right to be forgotten is explicitly forfeited). That probably will limit problems significantly, but it may not be enough in the end. Sovrin's final solution is to support a tombstoning mechanism that can be applied on a per-node, per-jurisdiction basis, such that read requests for a tombstoned record cause the semantic equivalent of an HTTP 451 error, yet the ledger's integrity, and the ability to forge consensus by nodes in different jurisdictions, is maintained.

Note that I deliberately said "Sovrin" in the preceding paragraph. Other Indy ledgers may choose to layer their own solutions on top of the peer DID strategy (or a different DID strategy), according to the governance they choose. Non-Indy ledgers each have to solve it, also. I'm not aware of a good solution yet for Bitcoin and Ethereum.

How many of these mediators do you expect to exist in the ecosystem?

The answer here will vary by time. Aries includes an Apache-2-licensed impl of one, and there are currently several SaaS vendors in production, who've got an interoperable wallet scheme to prevent vendor lockin... In the youth of the ecosystem, dozens or hundreds? Eventually, I'd say they will be offered by a meaningful % of ISPs or email providers and will have a long-tail distribution of customer counts like mail transfer agents -- so maybe tens of thousands, with a small handful supporting herd sizes in the billions or millions?
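Purely as an illustration of the tombstoning idea a few paragraphs up (this is not how any Indy/Sovrin node actually implements it; every name below is invented for the sketch):

```typescript
// Sketch: a per-jurisdiction tombstone list consulted on reads. The ledger's
// history is untouched; a node in a given jurisdiction simply refuses to serve
// tombstoned records -- the semantic equivalent of HTTP 451.
const tombstones: Record<string, Set<string>> = {
  EU: new Set(["did:example:forgotten"]),
};

function readDidDoc(
  jurisdiction: string,
  did: string,
  ledgerLookup: (did: string) => object,
): object {
  if (tombstones[jurisdiction]?.has(did)) {
    return { error: 451, reason: "unavailable for legal reasons in this jurisdiction" };
  }
  return ledgerLookup(did);
}
```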

dhh1128 commented 3 years ago

@agropper :

is your "such that" framing for a optional but normative notification serviceEndpoint type the same idea as what I proposed above #382 (comment) except we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

I'm not sure. I don't think DIDComm is a "notification" service endpoint type; I think it's a service endpoint type all its own. It can be used for anything that DIDComm can be used for, which is any message-based interaction (protocol) that wants to inherit DIDComm's security and privacy guarantees and processing model. I also don't know enough about UMA2 and GNAP to feel confident about the analog.

dlongley commented 3 years ago

@dhh1128,

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

dlongley commented 3 years ago

@csuwildcat -- Please take a look at @dhh1128's comment. He covers Sovrin's view of putting PII onto a VDR and all of the problems there. This is the sort of thing I've been trying to highlight as a problem for the case you want supported.

dhh1128 commented 3 years ago

@dlongley :

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. (Or perhaps more precisely, experts I've talked to say that they believe legal rulings will eventually formalize this legal conclusion.) The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

dlongley commented 3 years ago

@csuwildcat,

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

dhh1128 commented 3 years ago

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

True. Well, sort of. I want service endpoints in the spec because A) I want institutions to publish their endpoints in their DID docs; and B) I want private individuals to put their endpoints in peer DID docs.

Daniel B's case of individuals publishing an endpoint for a public DID on a public ledger for discovery purposes is one I've thought less about. I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, [[edit: and in lots of other places]]). They still have all the characteristics of security and control you need, but they don't incur any right-to-be forgotten issues if the individuals publish them in places they control.

dlongley commented 3 years ago

@dhh1128,

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

I understand that this is your position. It's not settled yet -- and until it is, there are possible interpretations that split the information into two separate classes. There are also a number of exceptions to the "right to be forgotten" for which this difference might be important either on its own or in conjunction with the function of or governance/authority structures for a particular VDR. So, there remain open questions. It's harder to make the case for any difference, however, when your full legal name is explicitly called out in a DID Doc as merely additional information. This is in contrast to other "authoritative data" in the DID Doc including the DID itself and public key material that can be more readily linked to legal purposes and public interest, etc.

dlongley commented 3 years ago

@dhh1128,

I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc). They still have all the characteristics of security and control you need, but they don't incur any right-to-be forgotten issues if the individuals publish them in places they control.

Yes, but this approach is what @csuwildcat is railing against as being insufficient for his use case (which we still need to get more concrete about).

csuwildcat commented 3 years ago

Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc).

Guys, please reread this and consider how it is explicitly failing to solve for the needs I have. To help, I will restate the comment above in the scope of the use case: "Dan, you can create decentralized social networks, decentralized secondhand sales networks, decentralized gig economy exchanges, etc. that don't require centralized intermediary services, like Twitter, Craigslist, and Uber, by creating unregistered, uncrawlable DIDs, and simply attaching them to your Twitter, LinkedIn, and Uber accounts"

agropper commented 3 years ago

GDPR aside, putting a DID that refers to a person on a public registry is problematic for the same reason putting their Social Security Number or their facial biometric on a public registry is problematic. In all three cases, "rotating" the identifier is difficult, or in the case of biometrics, impossible. That means that we need to consider how DIDs, SSN, and biometrics are used rather than just worrying about the right to be forgotten.

DID documents are meant to be updated with rotations but the whole point is that the DID itself is forever whether it's on a public registry or held among peers. Otherwise, did:key works for cases where updates are not needed.


dhh1128 commented 3 years ago

@csuwildcat : I don't think your sarcastic memes are appropriate for this community.

My comment that a DID could be published on Twitter or Facebook is quite different from saying that DIDs must be published on those platforms. It goes without saying that what can be published on social media can also be published in any other convenient way: by putting it on slides at a conference, by attaching it to your github profile, by putting it on your physical or digital business card, by listing it next to your name in your professional publications, etc. My point was that there are plenty of ways of publishing a DID besides ledgers; I'm sorry that my examples threw you for a loop.

Now, you clearly believe that the specific other ways I cited (Twitter, FB) are undesirable because they're centralized. But supposedly decentralized ledgers do not have magical pixie dust that makes everything they touch decentralized, and supposedly centralized, proprietary systems do not have magical demon dust that makes everything they touch centralized. A single global ledger where all discovery is conducted is centralized. Its permissioning and method for accepting commits may be decentralized, but it is not decentralized in the patterns of its reads. Likewise, a system that includes optional recording of DIDs in centralized platforms is not centralized if it also records the data in many other places. I was not proposing that we give an exclusive franchise to Twitter and Facebook to record public DIDs; that's the only distortion of what I said that deserves the scorn you projected.

csuwildcat commented 3 years ago

Without a decentralized substrate for globally iterable ID/routing info that can be assembled without third-party reliance, you cannot deterministically locate all entities who wish to be included in exchanges without requiring specialized setups, third-party interdiction points, or ad hoc out-of-band coordination. I have no issue with entities who want to rely on centralized parties or resign themselves to sharing their information through indirect mechanisms, I'm simply opposed to anything that would force more barriers, centralized intermediaries, or friction on entities who desire to participate in an open broadcast substrate that is as decentralized as possible.

agropper commented 3 years ago

@csuwildcat I may have missed it, but could you respond to this question about your use-case? https://github.com/w3c/did-core/issues/382#issuecomment-683330079

dhh1128 commented 3 years ago

@csuwildcat :

Without a decentralized substrate...

I largely agree with this statement, but:

  1. I disagree that perfect enumeration is a requirement from the people who want to use public DIDs. They are fine with ad hoc coordination, because it's easy to get their data visible to the parties they want to connect with (just as it's easy to publicize your email address if you want to). The problem arises when external data consumers want to index/crawl/access all of this chaotic data universe of personal data as if it were a coherent corpus that they can mine. That is NOT a requirement of the person who wants a public DID; it's a requirement of those who want to consume those DIDs. And I don't buy it.

  2. Even if we disagree about #1, and accept your proposed requirement that there must be a single place where global discovery can occur, I disagree with your assumption that this means we need a ledger. I shared a way to do privacy-preserving, decentralized discovery/enumeration without a ledger in a previous CCG discussion. It doesn't have the GDPR problem, it is more private than a public ledger, and whatever data you choose to publish is discoverable as long as you want it to be, and not one instant longer. See this doc.

csuwildcat commented 3 years ago

@csuwildcat I may have missed it, but could you respond to this question about your use-case? #382 (comment)

I honestly feel like we need a special topic call just to go over the decentralized app concepts in general, because I feel like this recurring question is a symptom of folks who may not yet see past the 1% of identity that is credentials. But if we can't do that, here is the 'use case' (aka: the entire world of use cases it represents): We need a system capable of decentralizing the vast majority of apps you have on your phone today. Part of doing that is having an open, decentralized, direct, uninterdictable crawl substrate, whereby any developer can write app code that iterates the substrate and asks the DID entities on it, whatever they are, for certain types of data they may want to share. You can do this by iterating the global DID registry, finding personal datastore endpoints, and sending a request for whatever data you would like from the entity that owns the DID, for example:

  • Want to have a vibrant, open substrate for ingesting a firehose of all the social posts everyone on the planet intentionally wants to make public for anyone to see? Easy peasy lemon squeezy! --> Crawl all DIDs on the decentralized DID substrate and ask for any SocialMediaPosting objects they would like to share. Boom, no longer need to go through social media company silos to access the world's social media feeds!

  • Want to find all the code packages you could possibly use for your next Node.js project without having to go through a centralized registry? No problem, we gotchu fam! --> Just iterate the DID substrate for all the IDs typed as software projects, and contact their personal datastore to find all their signed SoftwareSourceCode packages. Can you smell the sweet, sweet code package liberation? I can.

  • What's that? Find all the public resumes and other career-related posts in the world? Get a load of this --> Crawl that DID substrate and ask all the personal datastores of all the DIDs if they have any career data they'd like to share. No more silo for career data, booyah!

  • Want to get the product catalogs for all the companies on the planet? No sweat, you can do it with your eyes closed --> just contact all the personal datastores of the IDs typed as companies of some kind and ask them for any public GS1 Product objects they'd like to share. It's time to make Google's centralized product index look like a cute little toy.

Basically, for any app type that features a need for an open substrate of intended-public data that participants want anyone to be able to find, the exact same recipe holds. With this we can radically, fundamentally change the entire app and open data ecosystem, putting control back in the hands of individuals, while empowering developers by eliminating many of the barriers that are present in today's world of walled content gardens and information network silos.
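A rough sketch of the crawl-and-ask pattern being described, with every registry, endpoint, and type name invented for illustration (no real API is implied):

```typescript
// Illustration only: iterate a hypothetical global DID registry, find each
// entity's personal-datastore endpoint, and request any intended-public objects
// of a given type (e.g. "SocialMediaPosting").
type ServiceRef = { type: string; serviceEndpoint: string };

// Stubs standing in for whatever enumeration/resolution a real substrate provides.
async function enumerateAllDids(): Promise<string[]> { return []; }
async function resolveDid(did: string): Promise<{ service?: ServiceRef[] }> { return {}; }

async function crawlForPublicObjects(objectType: string): Promise<object[]> {
  const results: object[] = [];
  for (const did of await enumerateAllDids()) {
    const doc = await resolveDid(did);
    const store = doc.service?.find(s => s.type === "PersonalDatastore");
    if (!store) continue;   // this DID chose not to expose a public datastore
    const res = await fetch(`${store.serviceEndpoint}/query?type=${objectType}`);
    if (res.ok) results.push(...(await res.json()));
  }
  return results;
}
```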

agropper commented 3 years ago

Thanks. So, based on the example you give, what is the nature, if any, of access control to this public information?

In the same vein, what prevents centralized actors from collecting all of the public information they can get and then adding even more information, leaked under the common exemptions for supposedly de-identified personal data that we find in HIPAA, CCPA and pretty much every other regulation. In healthcare, this kind of involuntary but legal surveillance even has a name: “referential matching”.

A further problem is that many so-called privacy laws explicitly avoid regulating “public information” as a restriction of 1st amendment rights.


csuwildcat commented 3 years ago

In the same vein, what prevents centralized actors from collecting all of the public information they can get and then adding even more information

Anyone can access a public dataset that the entities are intentionally putting out to the world. This question is like asking "What prevents Google, DuckDuckGo, RSS feed viewers, web browsers, etc. from collecting/rendering your openly published blog posts to anyone who happens to hit a URL?" -- because that's the entire point of some data: to be broadcast openly as widely as humanly possible, and to encourage anyone interested to come read it, shape it, display it, etc.

csuwildcat commented 3 years ago

The DID substrate is a decentralized, interdiction-resistant, tamper-evasive secure routing system that allows peers to connect to whatever semantic information broadcasts an entity wants them to see - it's a world engine for decentralized information networks that can power a new class of decentralized apps and services.

Few Understand This™

agropper commented 3 years ago

@csuwildcat and everyone in this discussion would do well to read https://blog.apnic.net/2020/08/31/rfc-8890-the-internet-is-for-end-users/ and consider the consequences of giving even more access to surveillance capitalists. What we need is technology that forces more transparency and control over personal data flows, not less.

csuwildcat commented 3 years ago

@agropper I continue to be baffled that people are arguing I should not publicly, openly post blog entries, tweets, resumes, etc. for anyone who may want to read them. These use cases are fundamentally ones where you want anyone to be able to openly access the content, and to do so as freely as possible, without interrogation or gatekeeping, but you're suggesting this is wrong, which defeats the entire point of these use cases. I can't seem to understand what you are presenting as the alternative. Do we have everyone go through an authorization server before they can read my blog posts, which I don't want or need anyone to be authorized to read? Or is it that you are somehow confusing all data with the type of data I am talking about? Surely you can see that blog posts are different from my private medical data, and that I would gate one but not the other?

agropper commented 3 years ago

@csuwildcat What I'm proposing is illustrated by the quandary public website operators face with Google Analytics. As a publisher, I want to know as much as possible about who is interested and, if I could, why. A few privacy-respecting website operators pay good money to avoid Google Analytics. Others just publish blind rather than expose the "requesting parties" to surveillance.

So, yes, as an individual, I want my self-sovereign authorization server to play the role of Google Analytics, without the centralization. That's the essence of SSI as far as I'm concerned.

csuwildcat commented 3 years ago

@agropper I was just able to access your public posts and tweets (http://healthurl.com/www/Blogs_+.html, https://twitter.com/agropper) without any authorization required. These appear to be publicly available resources you want anyone to be able to openly view, is that correct?

dhh1128 commented 3 years ago

@csuwildcat : I feel like I can see both your perspective and that of @agropper . I get that people want to publish stuff and operate publicly. But the other fundamental requirement is that people also want to be able to change their minds. You were operating publicly when you posted a cranky meme in a comment above; you later deleted that meme when I complained about it. This was possible because github provides an "edit" feature -- something that an immutable blockchain does not support, and which you are demanding we de-prioritize even as you use the feature yourself in a public forum.

As long as "operating publicly" is centralized, there's a way to enforce the possibility of "change their minds" (mainly through legal threats). But as soon as you decentralize, the path for protecting this second requirement becomes cloudy. I proposed one possible answer above, but it doesn't have the same crawlability as yours.

Now, I don't advocate a centralization strait-jacket. My suggestion that people could publish something on Twitter if they want wasn't a suggestion that we ignore other alternatives that are better. Where you and I disagree is on the relative value of a centralized crawl. I say it's not very important to individuals; I'd be content if I could create peer DIDs off ledger (the ultimate decentralization) and publish/unpublish in whatever systems I like (centralized or not). That way, I am responsible for the GDPR implications, and I can choose whatever tradeoffs I like.

It seems that what you are hoping for is a single solution that works for everybody. I don't think there is such a thing, and I think you're prioritizing the needs of corporate data consumers over the needs of individual data producers. All of the coolness you touted in your bulleted list above ("ingesting a firehose of all the social posts everyone on the planet intentionally wants to make public"... "code packages"...) is coolness for an indexing/crawling service. Individuals may benefit from such services, but unevenly and imperfectly; it's great to be able to post publicly on FB, until you don't want your future employer looking up your immature behavior from college. The one-size-fits-all, once-public-we-get-to-index-it-forever approach necessitated by global crawlers perceives individual customizations as friction to be ground down and eliminated.

Individuals also don't typically run such crawling services directly. I have better things to do with my time and resources than discover the world's social media posts. So who will do it? Answer = an institution that becomes a new point of centralization. A list of all nodejs packages available worldwide, built by crawling a decentralized landscape, is still a centralized list; effectively it's not much different from npmjs, and as a developer, I'd rather consume the curated version.

csuwildcat commented 3 years ago

@csuwildcat : I feel like I can see both your perspective and that of @agropper . I get that people want to publish stuff and operate publicly. But the other fundamental requirement is that people also want to be able to change their minds. You were operating publicly when you posted a cranky meme in a comment above; you later deleted that meme when I complained about it. This was possible because github provides an "edit" feature -- something that an immutable blockchain does not support, and which you are demanding we de-prioritize even as you use the feature yourself in a public forum.

Why are folks trying to tell me that blockchains are immutable and you can't delete things, while websites have edit features to delete things from databases? Are folks getting confused and thinking I am saying this data should/will be present on a blockchain or within some immutable infra layer in a DID Method? If so, I am not saying that at all, and I am struggling to see how I can make this much more clear, given I have consistently said the DID Document would only route to personal datastore endpoints. Your personal datastore allows for the exact same flow you described with GitHub: if I delete any portion of data that was once exposed publicly, it is gone.
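
For illustration only, here is a minimal sketch of that routing model in TypeScript; the PDS URL, the `PersonalDataStore` service type, and the collection path are assumptions, not anything this spec defines:

```typescript
// Hypothetical DID Document: the only thing anchored in the verifiable data
// registry is a pointer to a personal datastore (PDS). The posts themselves
// live in the PDS, where the owner can edit or delete them at any time.
const didDocument = {
  "@context": "https://www.w3.org/ns/did/v1",
  id: "did:example:123456789abcdefghi",
  service: [
    {
      id: "did:example:123456789abcdefghi#pds",
      type: "PersonalDataStore",                  // assumed type, not a registered one
      serviceEndpoint: "https://pds.example.com"  // routes to the store, never to a specific record
    }
  ]
};

// Deleting a once-public post is purely a PDS operation; the DID Document
// (and whatever registry anchors it) never changes:
//   DELETE https://pds.example.com/collections/posts/42   (illustrative path)
```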

agropper commented 3 years ago

@dhh1128 nails it when proposing that individuals should control where our public information is indexed. In the secure data stores context, I advocate for indexes to be treated like any other data resource and kept separate from the documents and streaming interfaces whose metadata is being aggregated and indexed. I would treat both indexes and storage as policy enforcement points (PEPs) that are told, in a self-sovereign and un-censorable way, where the data subject keeps their PDP.

@csuwildcat, my website, which you posted, displays exactly the problem. The Privacy Badger and Disconnect plugins in my Firefox each display only one tracker. Guess what, it's Google Analytics. Given that I am still lamely using a web publishing editor that Apple discontinued 11 years ago, it's "too hard" for me to get rid of that last tracker. It's up to us in W3C to fix this.

csuwildcat commented 3 years ago

We absolutely must have a topic call on this, because what I thought was a relatively straightforward thing has not been understood as I expected. It's clear that the vast majority of folks here are focused on the 1% of identity that is core ID/credential-type data and claims, which is fine, while others are looking to tackle things outside of that. I don't particularly care what folks work on, so long as they don't hinder the work and needs of others. We need to take the time to better understand each other on a topic call, because this is such an important thing to get right; otherwise we will be left with a Web that looks a lot like it does today, which would be a tragic lost opportunity.

csuwildcat commented 3 years ago

@dhh1128 nails it when proposing that individuals should control where our public information is indexed. In the secure data stores context, I advocate for indexes to be treated like any other data resource and kept separate from the documents and streaming interfaces whose metadata is being aggregated and indexed. I would treat both indexes and storage as policy enforcement points (PEPs) that are told, in a self-sovereign and un-censorable way, where the data subject keeps their PDP.

@csuwildcat, my website, which you posted, displays exactly the problem. The Privacy Badger and Disconnect plugins in my Firefox each display only one tracker. Guess what, it's Google Analytics. Given that I am still lamely using a web publishing editor that Apple discontinued 11 years ago, it's "too hard" for me to get rid of that last tracker. It's up to us in W3C to fix this.

None of this makes any sense. You can run your personal datastore wherever you want, and I am not sure why you would install a tracker on your own PDS. You seem to be talking about something that does not apply to the service endpoint routing layer.

agropper commented 3 years ago

@csuwildcat It's not about my PDS. It's about my ability to decide on access to any data store, even public ones, like a LinkedIn 2.0 where, along with my user authentication, I could register my PDP, so that LinkedIn MUST refer you, or anyone else who wants to see my posts, to my UMA or GNAP Authorization Server to get an authorization token first. Right now, I have to depend on LinkedIn to interpret and implement my policies. This is exactly what I'm trying to decentralize, make self-sovereign, and fix.

Why wouldn't LinkedIn offer me this feature?
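
For illustration only, a rough sketch of the referral flow being described, assuming an UMA 2.0-style grant; the hostnames, route, and ticket value are hypothetical:

```typescript
// Hypothetical "LinkedIn 2.0" resource server (Express). Instead of serving
// my posts directly, it refers the requesting party to the authorization
// server I registered alongside my account, per the UMA 2.0 grant pattern.
import express from "express";

const app = express();

app.get("/profiles/agropper/posts", (req, res) => {
  const rpt = req.get("Authorization");
  if (!rpt) {
    // No token yet: point the requesting party at the data subject's own AS.
    res
      .status(401)
      .set(
        "WWW-Authenticate",
        'UMA realm="linkedin2.example", ' +
          'as_uri="https://as.agropper.example", ' +       // self-chosen AS/PDP (assumed)
          'ticket="016f84e8-f9b9-11e0-bd6f-0021cc6004de"'  // permission ticket (illustrative)
      )
      .end();
    return;
  }
  // ...validate the token issued by that AS, then serve the posts...
  res.json({ posts: [] });
});
```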

csuwildcat commented 3 years ago

@csuwildcat It's not about my PDS. It's about my ability to decide on access to any data store, even public ones

You encrypt data and set PDS permissions to accomplish exactly this, and you don't need to contact another server; you just get permissioned/encrypted access to things from the owner of the PDS themselves. I feel like I am in the Twilight Zone right now.
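
As a rough sketch of that idea (the record shape and field names are assumptions for illustration, not from any spec):

```typescript
// Permissions live with the PDS itself: no external authorization server is
// consulted. Public records are served as-is; restricted records are stored
// encrypted to the keys of the DIDs the owner has granted access to.
type PdsRecord = {
  id: string;
  visibility: "public" | "restricted";
  allowedDids?: string[];   // DIDs the owner has granted access to
  ciphertext?: string;      // restricted payloads, encrypted to the grantees' keys
  plaintext?: unknown;      // public payloads (blog posts, tweets, resumes...)
};

function canRead(record: PdsRecord, requesterDid: string | null): boolean {
  if (record.visibility === "public") return true;
  if (!requesterDid) return false;
  return record.allowedDids?.includes(requesterDid) ?? false;
}
```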

agropper commented 3 years ago

@csuwildcat How would this work with LinkedIn without damaging any of its current significant value propositions? Can you explain the steps that LinkedIn and I would implement?

dhh1128 commented 3 years ago

Why are folks trying to tell me that blockchains are immutable and you can't delete things, while websites have edit features to delete things from databases? Are folks getting confused and thinking I am saying this data should/will be present on a blockchain or within some immutable infra layer in a DID Method? If so, I am not saying that at all, and I am struggling to see how I can make this much more clear, given I have consistently said the DID Document would only route to personal datastore endpoints. Your personal datastore allows for the exact same flow you described with GitHub: if I delete any portion of data that was once exposed publicly, it is gone.

The URI of your personal datastore, if it is PII (e.g., https://mydatastore.com/csuwildcat), cannot be erased from the immutable ledger. What it exposes can be changed, but if the ledger stores that URI and supports a historical view, there is no way to delete that identifier for you. It is like carving your phone number in granite; sure, what you say on that phone line can change, but the number itself never can. And all the data that you once published under it can be linked to the current version, even if that data is no longer visible. You have no "please delete this" recourse.

csuwildcat commented 3 years ago

@agropper apps like LinkedIn can be designed to crawl for resumes and other public info that people and companies intentionally expose to the world, and present that information in whatever UI they believe people will like to use. They could also build in features to 'follow' the DIDs of people or companies you are specifically interested in, which, inside the app, would just mean that the app checks that DID's PDS more often and presents the data more prominently to you as the user. The beauty of this is that any decent app developer can now write their own LinkedIn-style application and create a different type of career experience, without being blocked by some centralized network silo.
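
For illustration only, a rough sketch of that crawl/follow pattern; the resolver URL, the `PersonalDataStore` service type, and the collection path are assumptions:

```typescript
// Resolve a DID, find the personal datastore endpoint its subject chose to
// publish, and fetch whatever resume objects they intentionally made public.
type Service = { id: string; type: string; serviceEndpoint: string };
type DidDocument = { id: string; service?: Service[] };

async function fetchPublicResumes(did: string): Promise<unknown | null> {
  // Universal-resolver-style HTTP resolution (endpoint URL is assumed).
  const res = await fetch(`https://resolver.example/1.0/identifiers/${did}`);
  const { didDocument } = (await res.json()) as { didDocument: DidDocument };

  const pds = didDocument.service?.find((s) => s.type === "PersonalDataStore");
  if (!pds) return null;

  // Collection name and query parameter are illustrative.
  const resumes = await fetch(`${pds.serviceEndpoint}/collections/resumes?visibility=public`);
  return resumes.ok ? resumes.json() : null;
}
```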