w3c / activitypub

http://w3c.github.io/activitypub/

Usernames in "id" URIs used in example JSON #342

Open trwnh opened 5 years ago

trwnh commented 5 years ago

Usernames

See also: What to leave out of URIs: Authors name

Objects on a user's namespace

Proposed changes

  1. Include examples that assign less fragile ids. For example, instead of https://example.com/peeps/john, use https://example.com/users/342748903721044 or something similar. Instead of https://example.org/~alice/note/23, use https://example.org/note/23.

  2. Include url in more JSON examples as a human-friendly URL. It is probably a fine idea to use https://social.example/alyssa as a url as long as the accompanying example also uses something like https://social.example/users/798fy43huore8g54-f84wefbvkvjsd-f894w as the id.

  3. Include more SHOULD recommendations in the appropriate sections regarding the above two points.
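A minimal Python sketch of proposal 2, assuming a hypothetical `new_actor` helper and a `/users/<opaque>` path layout (neither comes from the spec). The id is random and never derived from the username, while `url` carries the human-friendly name.

```python
import json
import secrets

def new_actor(domain: str, username: str) -> dict:
    """Build a minimal actor document whose id does not contain the username."""
    opaque = secrets.token_hex(8)  # 16 random hex characters, unrelated to the name
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Person",
        "id": f"https://{domain}/users/{opaque}",  # machine-facing; untouched on rename
        "url": f"https://{domain}/{username}",     # human-facing; free to change
        "preferredUsername": username,
    }

actor = new_actor("social.example", "alyssa")
print(json.dumps(actor, indent=2))
```

Renaming `alyssa` in this layout only means rewriting `url` and `preferredUsername`; anything that stored the `id` keeps working.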

Impact

It may not seem like the exact id allocation scheme matters much, all things considered, but it is still good practice to make as few assumptions as possible. One real-world consequence of implementations that do not allow username changes is that many trans people are forced to create entirely new accounts after they stop using their deadname. What these people are really trying to change in such cases is their url, but there is no clear distinction between id and url because of improper, fragile assumptions that are reflected in the examples throughout the spec.

cwebber commented 5 years ago

This is a good point, though I am not sure if it is worth changing in the sense that it may make following the examples significantly harder, and understandability is a big goal. A footnote that explains the issue may be more useful.

I agree with you more than the above may indicate; in fact, since I am also advocating for actors that can be hosted over Tor .onion services or on mutable datashards, we should not expect that any human meaning at all can be extrapolated from URIs.

That said, in the interest of clarity it can be tough to step too far away from user expectations, or from what readers can conceptually follow in an overview section like this...

trwnh commented 5 years ago

I think a footnote to 3.1 Object Identifiers would be a good idea (and would probably satisfy proposal 3 above).

I also think for the overview section, it's OK to keep id simple and assume that this id/url distinction isn't as important, but in the later examples (particularly the ones that try to show multiple different URI allocations, such as /username, ~username, and /peeps/username), it wouldn't be a bad idea to just add 1-2 more examples in the to array, e.g. in Example 7.

Non-actor ids should probably be changed throughout, though. Example 8 is the only example that uses a non-fragile id, in the form http://postparty.example/p/2415.

nightpool commented 5 years ago

I'm not sure I'm entirely sold on the logic above. I know best practices for URIs say that they shouldn't change, but in practice people value readable URLs more than they value long-lived identifiers.

I know current implementations assume that usernames and also URIs can't change, but why is that an assumption we should build into the spec? Nothing in activitypub implies that the actor is long-lived or immutable. If I change my username i might not want to be easily associated with my old account—for example, this happens a lot on Tumblr, where people change their username or re-create their blogs regularly.

Why build immutability into a spec that doesn't need it?

trwnh commented 5 years ago

I know best practices for URIs say that they shouldn't change, but in practice people value readable URLs more than they value long-lived identifiers

Then use url for this. The value in having a machine-friendly id be separate from a human-friendly url allows users to never have to know their identifier, while still being able to refer to things stably.

I know current implementations assume that usernames and also URIs can't change, but why is that an assumption we should build into the spec?

The spec does not assume that usernames can't change. However, ~it does mandate that id remain constant, because~ id is meant to be referenced in linked-data applications. It is because of this that id should not rely on any other information (aside from DNS authority, which currently serves as namespacing).

If I change my username i might not want to be easily associated with my old account—for example, this happens a lot on Tumblr, where people change their username or re-create their blogs regularly.

If it's the same blog but renamed, it should have the same id. If you create a new blog, it should have a new id assigned. The url can and should be changeable. Old URLs can either redirect or become unresolvable if desired, with the new URL being shared amongst humans. In essence, the url should only indicate the current location, as a pointer to the id.
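The rename behaviour described above can be sketched as follows. The `Account` and `Server` classes and the `https://<domain>/<username>` url layout are hypothetical; the point is only that `rename` changes the username (and so the url) while `id` stays fixed, and that keeping the old url as a redirect is optional.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Account:
    id: str        # assigned once, never derived from the username
    username: str  # free to change

@dataclass
class Server:
    domain: str
    accounts: Dict[str, Account] = field(default_factory=dict)  # id -> account
    old_urls: Dict[str, str] = field(default_factory=dict)      # retired url -> id

    def url_for(self, account: Account) -> str:
        return f"https://{self.domain}/{account.username}"

    def rename(self, account: Account, new_username: str, keep_redirect: bool = True) -> None:
        old_url = self.url_for(account)
        account.username = new_username  # the id is deliberately left alone
        if keep_redirect:
            self.old_urls[old_url] = account.id

    def resolve_url(self, url: str) -> Optional[str]:
        """Map a current or retired url to the account's current url, else None."""
        for account in self.accounts.values():
            if self.url_for(account) == url:
                return url
        account_id = self.old_urls.get(url)
        if account_id is not None:
            return self.url_for(self.accounts[account_id])
        return None

server = Server("social.example")
acct = Account("https://social.example/users/8f3a91", "alice")
server.accounts[acct.id] = acct
server.rename(acct, "alex")
print(acct.id)                                             # unchanged
print(server.resolve_url("https://social.example/alice"))  # https://social.example/alex
```

Passing `keep_redirect=False` models the "become unresolvable" option: the old url stops resolving, but every reference to the id still works.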

Nothing in activitypub implies that the actor is long-lived or immutable [...] Why build immutability into a spec that doesn't need it?

Objects are currently only as long-lived as the domain that hosts them. But often they are shorter-lived, because of fragile assumptions in the software that powers that domain. I don't think that's building immutability into the spec; it's just good practice. You could use the Move activity to express when an actor has moved an object from origin to target, but assigning non-fragile ids makes this unnecessary.

nightpool commented 5 years ago

The spec does not assume that usernames can't change. However, it does mandate that id remain constant, because id is meant to be referenced in linked-data applications.

where? I don't see any authoritative language in the spec claiming this.

trwnh commented 5 years ago

my mistake, edited my previous comment to be worded better.

what i meant to say was that if you change the id, then doing a GET might suddenly result in a 404 even though the object still exists; it was just moved in effect. this is normally not a huge problem with URLs, but it does mean you will end up with broken references and have to update a lot of old AS2 documents that reference the old id. it's basically like a null pointer.

nightpool commented 5 years ago

what i meant to say was that if you change the id, then doing a GET might suddenly result in a 404 even though the object still exists; it was just moved in effect

You're making incorrect assumptions both about the space of possible implementations and about the desired user behavior. It's trivial to record a list of old usernames and provide 302 redirects to the new content. Or, conversely, if the user wishes to sever old links to their content but still keep it around (which, as I said, is a user story I see a lot on Tumblr), the user could choose to have the software not put that redirect in place and not update old federated documents.
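The redirect approach described above can be sketched like this, assuming ids of the hypothetical form `https://<host>/users/<username>` and a hypothetical `UsernameRouter` class. A rename records the old name and answers a GET for it with a 302 to the new id, unless the user opts out, in which case the old id returns 404. In this sketch, opting out only drops the most recent name; earlier aliases keep redirecting.

```python
from typing import Dict, Optional, Set, Tuple

class UsernameRouter:
    """Hypothetical router for ids shaped like https://<host>/users/<username>."""

    def __init__(self, host: str):
        self.host = host
        self.current: Set[str] = set()     # usernames in use
        self.renamed: Dict[str, str] = {}  # old username -> newest username

    def id_for(self, username: str) -> str:
        return f"https://{self.host}/users/{username}"

    def rename(self, old: str, new: str, redirect: bool = True) -> None:
        self.current.discard(old)
        self.current.add(new)
        self.renamed.pop(new, None)  # a name back in use is no longer an alias
        # Keep earlier aliases pointing at the newest name (one-hop redirects).
        for alias, target in self.renamed.items():
            if target == old:
                self.renamed[alias] = new
        if redirect:
            self.renamed[old] = new

    def get(self, username: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header) for a GET on that username's id."""
        if username in self.current:
            return 200, None
        if username in self.renamed:
            return 302, self.id_for(self.renamed[username])
        return 404, None

router = UsernameRouter("mydomain.com")
router.current.add("amy")
router.rename("amy", "bob")
print(router.get("amy"))  # (302, 'https://mydomain.com/users/bob')
```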

nightpool commented 5 years ago

And, again, as long as we're using DNS (and datashards aren't ready yet, I don't expect the bulk of activitypub to stop using DNS in the near future) we're going to need to be able to change ids to escape host fragility. This is a discussion well-covered by https://github.com/swicg/general/issues/1. It makes no sense, in a DNS-based world, to pretend like URIs could ever be immutable or long-lived. Given that we have to write code that works with mutable URIs anyway, I see no reason to make our documentation less accessible based on a "best practice" that doesn't even apply.

trwnh commented 5 years ago

To be clear, I'm not saying that id cannot change, I'm saying that it should change as little as possible. Yes, it's trivial to 302 (or not) from an old URI to a new URI. But why do this when you can avoid it? If I have 100,000 posts each with an id containing my username, and I wish to change my username, that's 100,000 redirects. If those post ids don't contain any usernames, it's 0 redirects.
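The redirect arithmetic above can be checked with a toy count. The two URL layouts (`/<username>/posts/<n>` versus `/posts/<n>`) and the helper names are hypothetical, and the substring test is only used for counting here, not as a recommended way to interpret ids.

```python
def make_post_ids(n: int, username: str, embed_username: bool) -> list:
    """Generate n post ids under one of two hypothetical layouts."""
    if embed_username:
        return [f"https://social.example/{username}/posts/{i}" for i in range(n)]
    return [f"https://social.example/posts/{i}" for i in range(n)]

def redirects_needed(post_ids: list, old_username: str) -> int:
    """Count ids that embed the old username and therefore change on rename."""
    segment = f"/{old_username}/"
    return sum(1 for pid in post_ids if segment in pid)

print(redirects_needed(make_post_ids(100_000, "alice", True), "alice"))   # 100000
print(redirects_needed(make_post_ids(100_000, "alice", False), "alice"))  # 0
```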

And fwiw, if you use non-fragile IDs, you can still break old URLs while not breaking old IDs. And you can still create an entirely new ID. What you can't do is easily and freely change usernames without cost. At the end of the day, all I'm really suggesting here is to add a brief paragraph under Section 3.1 and maybe two more sample actors in Example 7.

markcellus commented 2 years ago

Couldn't the server just map the username in the id to a constant identifier in the database?

So an id of http://mydomain.com/user/amy would map to user 1234 (the user id) in the database? That way, if Amy changes her name to bob on the client, http://mydomain.com/user/bob would then map to 1234? And for completeness, http://mydomain.com/user/amy could be updated to redirect to http://mydomain.com/user/bob?

Seems like an implementation detail left up to the implementer, imo.

wilkie commented 2 years ago

A little weird to me to have two identifiers point to the same actor (off-server they would be seen as two different actors, even if they resolve to the same feed... and the entries in the feed may point to yet another actor... confusing), but still absolutely acceptable... changing your username would, logically, create a new actor when implementing it this way. Then you map historic usernames to the new names in your routes for nodes/relays that don't understand to switch.

There are drawbacks... you then can't allow anybody to reuse a handle unless they were the original owner of that handle. Which means it could allow an attack vector where bad actors keep changing their name to exhaust the pool which clients then have to potentially worry about / rate-limit. Which is indeed another reason why you don't usually want named identifiers over any kind of proper unique identifier. Only really an issue on open, public instances, to be fair.
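The handle-reuse restriction and rename rate limit described above might look something like this. `HandleRegistry`, the account keys and the 30-day default interval are hypothetical policy choices, not anything ActivityPub specifies. Retired handles are never released, so the pool still shrinks; the throttle only slows that down.

```python
import time
from typing import Dict, Optional

class HandleRegistry:
    """Hypothetical policy: a handle stays reserved for the account that first held it."""

    def __init__(self, min_interval_s: float = 30 * 24 * 3600):
        self.owner: Dict[str, str] = {}          # handle -> account key (kept after rename)
        self.last_change: Dict[str, float] = {}  # account key -> time of last claim
        self.min_interval_s = min_interval_s

    def claim(self, account: str, handle: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        holder = self.owner.get(handle)
        if holder is not None and holder != account:
            return False  # in use, or retired by someone else
        last = self.last_change.get(account)
        if last is not None and now - last < self.min_interval_s:
            return False  # throttle renames so the pool cannot be drained quickly
        self.owner[handle] = account
        self.last_change[account] = now
        return True
```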

Clients might allow a person to follow both actors and either receive double posts or posts in a feed of one actor but each post inside is by the new actor depending on how the client will interpret the fields... so clients may need extra logic to reconcile such actors (e.g. it sees a 302 happen to an existing id... treating the "real" id as the end result of the redirect... which presumes that id is always a resolvable URL... of a particular scheme... which feels bad... because it shouldn't have to be)... which is what a unique identifier is supposed to do trivially in the first place. The client pressure for robustly allowing mutating actor identifiers seems heavy... feels best that servers avoid mutating them and clients assume they won't change and just accept not handling this gracefully.
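A sketch of the client-side reconciliation heuristic described above, assuming the client has already recorded which ids redirected where (the `redirects` map and both function names are hypothetical). It also exposes the weakness raised above: it only helps when an id is a resolvable URL that actually redirects.

```python
from typing import Dict, Iterable, Set

def canonical_id(actor_id: str, redirects: Dict[str, str], max_hops: int = 5) -> str:
    """Follow recorded redirects to the last id in the chain (loop-safe)."""
    seen = {actor_id}
    current = actor_id
    for _ in range(max_hops):
        nxt = redirects.get(current)
        if nxt is None or nxt in seen:
            break
        seen.add(nxt)
        current = nxt
    return current

def merge_follows(followed: Iterable[str], redirects: Dict[str, str]) -> Set[str]:
    """Collapse followed ids that turn out to be the same actor."""
    return {canonical_id(a, redirects) for a in followed}

redirects = {"http://mydomain.com/user/amy": "http://mydomain.com/user/bob"}
print(merge_follows(["http://mydomain.com/user/amy", "http://mydomain.com/user/bob"], redirects))
# {'http://mydomain.com/user/bob'}
```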

And if you want the most heretical turn of this discussion, the id of an actor should not be identifiable to the originating domain. It doesn't need to resolve via HTTPS (ActivityPub only requires they be dereferenceable! Lovely and vague.). It doesn't need to match the domain of the id of any Activity. You can gossip any resolution of the actor via normal discovery... so, if you follow the actor from a domain... you can ask that domain about the actor and get a response. If you receive a post from the actor from a domain... same deal, etc. Then your actor can migrate or have a presence on multiple federating systems. Querying an actor by its whole id (i.e. when the scheme is not https://) is just not described by ActivityPub, which presumes you already have this mechanism somewhere, and at worst implies by example/omission that id is always a URI serving JSON via HTTPS GET. This will likely make any extension providing global auth (probably with id starting with apactor:// or the like) a little more difficult.
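A sketch of how a consumer might plan resolution when an id is not necessarily an HTTP(S) URL. `resolution_plan`, the `learned_from` hint and the `"ask"` strategy are hypothetical; ActivityPub does not define how to dereference other schemes such as the `apactor://` example above.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

def resolution_plan(actor_id: str, learned_from: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Decide how to look up an actor id (hypothetical client policy)."""
    scheme = urlsplit(actor_id).scheme.lower()
    if scheme in ("https", "http"):
        return "GET", actor_id        # the usual case: fetch the id itself
    if learned_from is not None:
        return "ask", learned_from    # hypothetical: query the server we saw it on
    return "unresolvable", None       # ActivityPub defines no generic mechanism

print(resolution_plan("https://social.example/users/1"))
print(resolution_plan("apactor://abc123", learned_from="https://relay.example"))
```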

wilkie commented 2 years ago

All that to say... the spec is good. No changes absolutely necessary. I disagree with adding any "SHOULD" to the language around identifiers and 100% accept the soft 'may' and 'should' where they are in the normative spec around Object and Actor resolution via URI via id. It is that way for this very reason! The user stories need legible URIs, so they are ok as they are as well. Whoever writes the migrating actor extension user stories can deal with this problem. 😂

markcellus commented 2 years ago

Yeah there are definitely caveats with the approach. Having feeds with both users would be confusing. So it would require updating the old user to the new user everywhere 😬.

But the approach isn't uncommon. GitHub uses a similar mechanism when you change your username: they keep the old URL as a redirect temporarily, but they don't go back and update references to the old username. So either way, it can get messy.

But you're probably right. May be better to leave those details up to implementers. 👍

evanp commented 5 months ago

This seems like a worthwhile note to have in the document itself. I'd like to suggest the following changes:

  1. A page in the Primer with best practices for creating ID URLs. https://www.w3.org/wiki/ActivityPub/Best_practices_for_id_property
  2. In the next version of the AP spec, we should have a mix of username/non-username IDs, and call out the issue with changing IDs being impractical.

I'm also distressed at the mention of "the actor's namespace", which suggests that there's some kind of ID namespace, directory structure, etc. It would be a big mistake for an AP processor to try to inspect two IDs to determine the relationship between them. I'm opening a separate issue for this.
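The pitfall described above can be illustrated with the ids from the original proposal. Both functions are hypothetical: the first guesses ownership from URI layout, and the second reads the object's `attributedTo` property instead, which does not depend on how the server allocated ids.

```python
def same_owner_by_prefix(actor_id: str, object_id: str) -> bool:
    # Fragile: assumes objects live "under" their actor's URI.
    return object_id.startswith(actor_id.rstrip("/") + "/")

def _as_ids(value) -> list:
    """Normalize a property that may be a string, an object, or a list of either."""
    if value is None:
        return []
    if not isinstance(value, list):
        value = [value]
    return [v.get("id") if isinstance(v, dict) else v for v in value]

def same_owner_by_attribution(actor_id: str, obj: dict) -> bool:
    # Relies on what the object states, not on how its id is spelled.
    return actor_id in _as_ids(obj.get("attributedTo"))

actor = "https://example.com/users/342748903721044"
note = {"id": "https://example.com/note/23", "type": "Note", "attributedTo": actor}
print(same_owner_by_prefix(actor, note["id"]))  # False
print(same_owner_by_attribution(actor, note))   # True
```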

evanp commented 5 months ago

See #442 for more discussion of "the actor's namespace"

trwnh commented 1 day ago

This has been addressed in the Primer at https://www.w3.org/wiki/ActivityPub/Primer/Object_identifiers#ID_stability so the "needs primer" tag can probably be removed

bobwyman commented 1 day ago

Usernames in IDs are not inherently evil -- as long as the ID contains more than the Username.

Consider the use of tag URI's as defined in RFC 4151. By adding to the ID a specification of the date or time that it was created, one can create ids that are essentially unique in all time, even though they may be very simple in form and easily understood. For instance:

Below are some tag URIs that contain user names and domain name, yet, will remain unique across all time, even if the assignment of the user or domain names changes in the future. Rules either identical to, or similar to, those defined for tag URIs would be useful in the Activity* space.

Note: I wrote about this issue back in 2006. See: https://web.archive.org/web/20160331154849/https://wyman.us/main/2006/12/the_persistence.html
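An illustrative sketch (not the examples from the comment above) of minting ids in the RFC 4151 tag URI shape `tag:<authority>,<date>:<specific>`. The `tag_uri` helper and the sample values are hypothetical, and the sketch does not validate which characters the specific part may contain.

```python
from datetime import date

def tag_uri(authority: str, minted_on: date, specific: str) -> str:
    """Build an id of the form tag:<authority>,<YYYY-MM-DD>:<specific>.

    `authority` should be a DNS name or email address the minter controlled
    on `minted_on`; that date is what keeps the id unique even if the name
    is later reassigned. Character validation is omitted in this sketch.
    """
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"

print(tag_uri("social.example", date(2024, 5, 1), "alice/note/23"))
# tag:social.example,2024-05-01:alice/note/23
```

The date keeps a later owner of the same domain or username from colliding with these ids, but the username still appears in the id text, which is the concern raised in the reply below.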

trwnh commented 1 day ago

Usernames in IDs are not inherently evil -- as long as the ID contains more than the Username.

the point of contention is that anything in an id should ideally never change. if you include a username in an ID, then you are betting that the username will never change. but there are several cases where a username might in fact change.