Open trwnh opened 5 years ago
This is a good point, though I am not sure if it is worth changing in the sense that it may make following the examples significantly harder, and understandability is a big goal. A footnote that explains the issue may be more useful.
I agree with you more than the above may indicate; in fact, since I am also advocating for actors which can be hosted over Tor `.onion` services or on mutable datashards, we should not expect that any human meaning can be extrapolated from URIs.
That said, it can be tough in the interest of clarity to take too many steps away from user expectations or what is possible to conceptually follow in an overview section like this....
I think a footnote to 3.1 Object Identifiers would be a good idea (and would probably satisfy proposal 3 above).
I also think for the overview section, it's OK to keep `id` simple and assume that this `id`/`url` distinction isn't as important, but in the later examples (particularly the ones that try to show multiple different URI allocations, such as `/username`, `~username`, and `/peeps/username`), it wouldn't be a bad idea to just add 1-2 more examples in the `to` array, e.g. in Example 7.
Non-actor `id` should probably be changed throughout, though. Example 8 is the only example that uses a non-fragile `id` in the form `http://postparty.example/p/2415`.
I'm not sure I'm entirely sold on the logic above. I know best practices for URIs say that they shouldn't change, but in practice people value readable URLs more than they value long-lived identifiers.
I know current implementations assume that usernames and also URIs can't change, but why is that an assumption we should build into the spec? Nothing in activitypub implies that the actor is long-lived or immutable. If I change my username i might not want to be easily associated with my old account—for example, this happens a lot on Tumblr, where people change their username or re-create their blogs regularly.
Why build immutability into a spec that doesn't need it?
> I know best practices for URIs say that they shouldn't change, but in practice people value readable URLs more than they value long-lived identifiers
Then use `url` for this. Keeping a machine-friendly `id` separate from a human-friendly `url` allows users to never have to know their identifier, while still being able to refer to things stably.
> I know current implementations assume that usernames and also URIs can't change, but why is that an assumption we should build into the spec?
The spec does not assume that usernames can't change. However, ~it does mandate that `id` remain constant, because~ `id` is meant to be referenced in linked-data applications. It is because of this that `id` should not rely on any other information (aside from DNS authority, which currently serves as namespacing).
> If I change my username i might not want to be easily associated with my old account; for example, this happens a lot on Tumblr, where people change their username or re-create their blogs regularly.
If it's the same blog but renamed, it should have the same `id`. If you create a new blog, it should have a new `id` assigned. The `url` can and should be changeable. Old URLs can either redirect or become unresolvable if desired, with the new URL being shared amongst humans. In essence, the `url` should only indicate the current location, as a pointer to the `id`.
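As a minimal sketch of the `id`/`url` split described above: the actor below keeps an opaque `id` while a rename only touches `preferredUsername` and `url`. The numeric path segment, the `rename` helper, and the `social.example` URL layout are assumptions made for illustration; nothing here is mandated by the spec.

```python
import copy

# Hypothetical actor document: an opaque, stable `id` plus a human-friendly `url`.
actor = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Person",
    "id": "https://social.example/users/342748903721044",
    "preferredUsername": "alyssa",
    "url": "https://social.example/alyssa",
}


def rename(actor: dict, new_username: str) -> dict:
    """Return a copy of the actor with a new username and url; `id` is untouched."""
    renamed = copy.deepcopy(actor)
    renamed["preferredUsername"] = new_username
    # Assumed URL layout: the human-friendly url is derived from the username.
    renamed["url"] = f"https://social.example/{new_username}"
    return renamed


renamed = rename(actor, "lyssa")
```

Anything that referenced the actor by `id` before the rename still points at the same object afterwards; only the human-facing `url` has moved.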
> Nothing in activitypub implies that the actor is long-lived or immutable [...] Why build immutability into a spec that doesn't need it?
Objects are currently only as long-lived as the domain that hosts them. But often they are shorter-lived, due to fragile assumptions in the software that powers that domain. I don't think that's building immutability into the spec; it's just good practice. You could use the `Move` activity to express when an `actor` has moved an `object` from `origin` to `target`, but assigning a non-fragile `id` prevents this from being necessary.
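For reference, a `Move` activity of the kind mentioned above could be assembled roughly like this. This is only a sketch: the `make_move` helper and every URL in it are hypothetical, and which values belong in `origin` and `target` for a renamed object is an assumption, not something the thread settles.

```python
import json


def make_move(actor_id: str, object_id: str, origin: str, target: str) -> dict:
    """Build an Activity Streams 2.0 Move activity as a plain dict."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Move",
        "actor": actor_id,
        "object": object_id,
        "origin": origin,
        "target": target,
    }


# Hypothetical values: a note moved between two collections after a rename.
move = make_move(
    actor_id="https://example.org/users/1234",
    object_id="https://example.org/~alice/note/23",
    origin="https://example.org/~alice/notes",
    target="https://example.org/~alyssa/notes",
)
serialized = json.dumps(move, indent=2)
```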
> The spec does not assume that usernames can't change. However, it does mandate that id remain constant, because id is meant to be referenced in linked-data applications.
where? I don't see any authoritative language in the spec claiming this.
my mistake, edited my previous comment to be worded better.
what i meant to say was that if you change the `id`, then doing a GET might suddenly result in a 404 even though the object still exists; it was just moved in effect. this is normally not a huge problem with URLs, but it does mean you will end up with broken references and have to update a lot of old AS2 documents that reference the old `id`. it's basically like a null pointer.
> what i meant to say was that if you change the id, then doing a GET might suddenly result in a 404 even though the object still exists; it was just moved in effect
You're making incorrect assumptions both about the space of possible implementations and the desired user behavior. It's trivial to record a list of old usernames and provide 302 redirects to the new content. Or, conversely, if the user wishes to sever old links to their content but still keep it around (which, as I said, is a user story i see a lot on tumblr), the user could choose to have the software not put that redirect in place and not update old federated documents.
And, again, as long as we're using DNS (datashards aren't ready yet, and I don't expect the bulk of activitypub to stop using DNS in the near future), we're going to need to be able to change `id`s to escape host fragility. This is a discussion well-covered by https://github.com/swicg/general/issues/1. It makes no sense, in a DNS-based world, to pretend like URIs could ever be immutable or long-lived. Given that we have to write code that works with mutable URIs anyway, I see no reason to make our documentation less accessible based on a "best practice" that doesn't even apply.
To be clear, I'm not saying that `id` cannot change, I'm saying that it should change as little as possible. Yes, it's trivial to 302 (or not) from an old URI to a new URI. But why do this when you can avoid it? If I have 100,000 posts each with an `id` containing my username, and I wish to change my username, that's 100,000 redirects. If those post `id`s don't contain any usernames, it's 0 redirects.
And fwiw, if you use non-fragile IDs, you can still break old URLs while not breaking old IDs. And you can still create an entirely new ID. What you can't do is easily and freely change usernames without cost. At the end of the day, all I'm really suggesting here is to add a brief paragraph under Section 3.1 and maybe two more sample actors in Example 7.
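The redirect arithmetic above can be sketched as follows. The `redirects_needed` helper and the two URL layouts are assumptions for illustration (the layouts mirror the `~alice/note/23` and `note/23` forms discussed in this issue); a real server may track renames differently.

```python
def redirects_needed(post_ids: list[str], old_username: str) -> int:
    """Count post ids that embed the old username as a `~username` path segment.

    Each such id would need its own redirect after a username change.
    """
    marker = f"/~{old_username}/"
    return sum(1 for post_id in post_ids if marker in post_id)


# Fragile scheme: the username is part of every post id.
fragile_ids = [f"https://example.org/~alice/note/{n}" for n in range(100_000)]
# Non-fragile scheme: post ids carry no username at all.
stable_ids = [f"https://example.org/note/{n}" for n in range(100_000)]

fragile_count = redirects_needed(fragile_ids, "alice")
stable_count = redirects_needed(stable_ids, "alice")
```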
Couldn't the server just map the username in the `id` to a const identifier in database?
So an `id` of `http://mydomain.com/user/amy` would map to user `1234` (user id) in database? Then that way, if Amy changes her name to `bob` on the client, `http://mydomain.com/user/bob` would then map to `1234`? And for completeness, `http://mydomain.com/user/amy` can be updated to redirect to `http://mydomain.com/user/bob`?
Seems like an implementation detail left up to the implementer, imo.
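A rough sketch of that mapping, assuming an in-memory table in place of a real database. The `HandleRegistry` class, its method names, and the choice to refuse reuse of retired handles are all hypothetical; they are one possible reading of the idea above, not an existing implementation.

```python
class HandleRegistry:
    """Maps mutable usernames to constant internal user ids."""

    def __init__(self) -> None:
        self._current: dict[str, int] = {}  # live username -> user id
        self._retired: dict[str, int] = {}  # old username -> user id

    def _is_taken(self, username: str) -> bool:
        return username in self._current or username in self._retired

    def register(self, username: str, user_id: int) -> None:
        if self._is_taken(username):
            raise ValueError(f"handle {username!r} is unavailable")
        self._current[username] = user_id

    def rename(self, old: str, new: str) -> None:
        if old not in self._current:
            raise KeyError(old)
        if self._is_taken(new):
            raise ValueError(f"handle {new!r} is unavailable")
        user_id = self._current.pop(old)
        self._retired[old] = user_id  # keep the old handle so it can redirect
        self._current[new] = user_id

    def resolve(self, username: str):
        """Return (HTTP status, user id or redirect location)."""
        if username in self._current:
            return 200, self._current[username]
        if username in self._retired:
            user_id = self._retired[username]
            # Find the live handle that now belongs to the same user id.
            live = next(n for n, u in self._current.items() if u == user_id)
            return 302, f"http://mydomain.com/user/{live}"
        return 404, None


registry = HandleRegistry()
registry.register("amy", 1234)
registry.rename("amy", "bob")
```

In this sketch a retired handle is never released, which is one way to keep old links from being claimed by someone else; it also means handles are a finite pool that renames can use up.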
A little weird to me to have two identifiers point to the same actor (which would be seen as two different actors, off-server, even if they resolve to the same feed... and the entries in the feed point to perhaps a different actor... confusing,) but still absolutely acceptable... changing your username would create, logically, a new actor when implementing it this way. Then you map historic usernames to the new names in your routes for nodes/relays that don't understand to switch.
There are drawbacks... you then can't allow anybody to reuse a handle unless they were the original owner of that handle. Which means it could allow an attack vector where bad actors keep changing their name to exhaust the pool which clients then have to potentially worry about / rate-limit. Which is indeed another reason why you don't usually want named identifiers over any kind of proper unique identifier. Only really an issue on open, public instances, to be fair.
Clients might allow a person to follow both actors and either receive double posts or posts in a feed of one actor but each post inside is by the new actor depending on how the client will interpret the fields... so clients may need extra logic to reconcile such actors (e.g. it sees a 302 happen to an existing `id`... treating the "real" `id` as the end result of the redirect... which presumes that `id` is always a resolvable URL... of a particular scheme... which feels bad... because it shouldn't have to be)... which is what a unique identifier is supposed to do trivially in the first place. The client pressure for robustly allowing mutating actor identifiers seems heavy... feels best that servers avoid mutating them and clients assume they won't change and just accept not handling this gracefully.
And if you want the most heretical turn of this discussion, the `id` of an actor should not be identifiable to the originating domain. It doesn't need to resolve via HTTPS (ActivityPub only requires they be dereferenceable! Lovely and vague.). It doesn't need to match the domain of the `id` of any Activity. You can gossip any resolution of the actor via normal discovery... so, if you follow the actor from a domain... you can ask that domain about the actor and get a response. If you receive a post from the actor from a domain... same deal, etc. Then your actor can migrate or have a presence on multiple federating systems. Querying an actor by its whole `id` (i.e. when the scheme is not `https://`) is just not described by ActivityPub, which presumes you already have this mechanism somewhere, and at worst implies by example/omission that `id` is somehow always a URI serving JSON via HTTPS GET. This will likely make any extension providing global auth (probably with `id` starting with `apactor://` or the like) a little more difficult.
All that to say... the spec is good. No changes absolutely necessary. I disagree with adding any "SHOULD" to the language around identifiers and 100% accept the soft 'may' and 'should' where they are in the normative spec around Object and Actor resolution via URI via `id`. It is that way for this very reason! The user stories need legible URIs, so they are ok as they are as well. Whoever writes the migrating actor extension user stories can deal with this problem. 😂
Yeah there are definitely caveats with the approach. Having feeds with both users would be confusing. So it would require updating the old user to the new user everywhere 😬.
But the approach isn't uncommon. GitHub uses a similar mechanism when you change your username. They keep the old URL as a redirect temporarily, but they don't go back and update references to the old username. So, either way, it can get messy.
But you're probably right. May be better to leave those details up to implementers. 👍
This seems like a worthwhile note to have in the document itself. I'd like to suggest the following changes:
I'm also distressed at the mention of "the actor's namespace", which suggests that there's some kind of ID namespace, directory structure, etc. It would be a big mistake for an AP processor to try to inspect two IDs to determine the relationship between them. I'm opening a separate issue for this.
See #442 for more discussion of "the actor's namespace"
This has been addressed in the Primer at https://www.w3.org/wiki/ActivityPub/Primer/Object_identifiers#ID_stability so the "needs primer" tag can probably be removed
Usernames in IDs are not inherently evil -- as long as the ID contains more than the Username.
Consider the use of tag URIs as defined in RFC 4151. By adding to the ID a specification of the date or time that it was created, one can create ids that are essentially unique in all time, even though they may be very simple in form and easily understood. For instance:
Below are some tag URIs that contain user names and domain name, yet, will remain unique across all time, even if the assignment of the user or domain names changes in the future. Rules either identical to, or similar to, those defined for tag URIs would be useful in the Activity* space.
Note: I wrote about this issue back in 2006. See: https://web.archive.org/web/20160331154849/https://wyman.us/main/2006/12/the_persistence.html
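The tag URI idea above can be sketched in a few lines. RFC 4151 gives the general shape `tag:` authority `,` date `:` specific, where the date is a day on which the authority name was assigned to whoever mints the tag. The `mint_tag_uri` helper and the sample values are illustrative assumptions, not examples recovered from this comment.

```python
from datetime import date


def mint_tag_uri(authority: str, assigned_on: date, specific: str) -> str:
    """Build a tag URI of the form tag:<authority>,<YYYY-MM-DD>:<specific>.

    `authority` is a domain name (or email address) controlled by the minter
    on `assigned_on`; `specific` is any identifier the minter chooses.
    """
    return f"tag:{authority},{assigned_on.isoformat()}:{specific}"


# Hypothetical values: the same username minted under two different dates
# yields two distinct identifiers, even if the name is later reassigned.
first = mint_tag_uri("example.com", date(2006, 12, 1), "amy")
second = mint_tag_uri("example.com", date(2019, 6, 15), "amy")
```

Because the date is part of the identifier, a later holder of the same username or domain mints under a different date and so cannot collide with earlier ids.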
> Usernames in IDs are not inherently evil -- as long as the ID contains more than the Username.
the point of contention is that anything in an id should ideally never change. if you include a username in an ID, then you are betting that the username will never change. but there are several cases where a username might in fact change.
Usernames

- `id` as a URI should not change
- `id` [...] `id` will break

See also: What to leave out of URIs: Authors name

Objects on a user's namespace

- `id` as a URI should not change
- [...] `id` as subdirectory of actor id, and 3.1 Object Identifiers uses `SHOULD` language to describe allocating an object ID in the actor's namespace for C2S

Proposed changes

1. Include examples that assign less fragile `id`. For example, instead of `https://example.com/peeps/john`, use `https://example.com/users/342748903721044` or something similar. Instead of `https://example.org/~alice/note/23`, use `https://example.org/note/23`.
2. Include `url` in more JSON examples as a human-friendly URL. It is probably a fine idea to use `https://social.example/alyssa` as a `url` as long as the accompanying example also uses something like `https://social.example/users/798fy43huore8g54-f84wefbvkvjsd-f894w` as the `id`.
3. Include more `SHOULD` recommendations in the appropriate sections regarding the above two points.

Impact

It may not seem like the exact `id` allocation scheme matters too much, all things considered, but it should still be good practice to make as few assumptions as possible. One particular real-world consequence of implementations that do not allow username changes is that many trans people are forced to make entirely new accounts after they stop using their deadname. In such cases, the expectation is that these people are really trying to change their `url`, but there is not a clear distinction between `id` and `url` due to improper, fragile assumptions that are reflected in the examples given throughout the spec.