Closed (snarfed closed this 5 months ago)
...although, now that I think about it, we should just bridge the original post too, right? At least if its author is opted in.
I think there are a few different cases to consider here: (I'll just fill in the table according to what I would do.)
orig. circumstance | self-boost | local boost | boost of bridged |
---|---|---|---|
no opt-in | n.a. | bridge only boost, with dangling reference² | reject and blindly bounce back deletion³ |
was and is opted-in | bridge boost (post alr. avail.) | bridge boost (post alr. avail.) | bridge boost (post alr. avail. at retranslated URL) |
wasn't opted-in, but is now | bridge post, then boost¹ | bridge only boost, with dangling reference² | bridge only boost, with dangling retranslated reference² |
¹ I would interpret this as clear intention to give it more reach. (Example: An artist wants to show old art to new followers, but also to avoid duplicates. They boost their original post.)
² The post may have been authored without taking the additional context into account.
(Benign-ish example: someone posts "Mastodon is for nerds!", later reconsiders and opts in, and then someone else tries to start drama by boosting that post onto the Fediverse.)
Protocol-wise, I think dangling links are just fine, since they happen naturally anyway. I'm not aware of software that currently does this, but it would be possible to present a tombstone with an HTML link to the user where a resource is missing. You could (and probably should, also considering the "view in browser" button in most AP clients) blindly redirect to the web representation if you see a request for a resource that prefers HTML.
³ This is a possible failure mode of the protocols, where a cache became stale across an opt-out for whatever reason. You could also see this in the form of incoming likes (and replies, but I think those should be bridged rather than rejected).
Since the target state is removal of all bridged content, and you no longer hold or process data about the original author at this point, I think the best course of action is to send a synthetic deletion/undo-create towards wherever you saw the stale reference. You may also have to blindly respond to fetches of undo-creates regarding not-opted-in users' content, to deal with instances that don't rely on signatures to authenticate objects, and to reply with `410 Gone` to the respective content fetches.
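The table above can be read as a lookup from (original author's opt-in history, kind of boost) to an action. A minimal sketch of that reading in Python follows. The enum names and action labels are hypothetical stand-ins for the table's rows, columns, and cells, not anything a bridge actually implements.

```python
from enum import Enum
from typing import Optional


class OptIn(Enum):
    """Opt-in history of the original post's author (table rows)."""
    NEVER = "no opt-in"
    ALWAYS = "was and is opted-in"
    LATER = "wasn't opted-in, but is now"


class Boost(Enum):
    """Kind of boost (table columns)."""
    SELF = "self-boost"
    LOCAL = "local boost"
    OF_BRIDGED = "boost of bridged"


class Action(Enum):
    """Hypothetical labels for the table cells."""
    BRIDGE_BOOST = "bridge boost; post is already available"
    BRIDGE_POST_THEN_BOOST = "bridge post, then boost"
    BRIDGE_BOOST_DANGLING = "bridge only the boost, with a dangling reference"
    REJECT_AND_BOUNCE_DELETE = "reject and blindly bounce back a deletion"


# One entry per filled-in cell; the "n.a." cell is left out on purpose.
DECISIONS = {
    (OptIn.NEVER, Boost.LOCAL): Action.BRIDGE_BOOST_DANGLING,
    (OptIn.NEVER, Boost.OF_BRIDGED): Action.REJECT_AND_BOUNCE_DELETE,
    (OptIn.ALWAYS, Boost.SELF): Action.BRIDGE_BOOST,
    (OptIn.ALWAYS, Boost.LOCAL): Action.BRIDGE_BOOST,
    (OptIn.ALWAYS, Boost.OF_BRIDGED): Action.BRIDGE_BOOST,
    (OptIn.LATER, Boost.SELF): Action.BRIDGE_POST_THEN_BOOST,
    (OptIn.LATER, Boost.LOCAL): Action.BRIDGE_BOOST_DANGLING,
    (OptIn.LATER, Boost.OF_BRIDGED): Action.BRIDGE_BOOST_DANGLING,
}


def decide(opt_in: OptIn, boost: Boost) -> Optional[Action]:
    """Return the table's action, or None for the not-applicable cell."""
    return DECISIONS.get((opt_in, boost))
```

Keeping the cases in one dictionary makes it easy to compare the code against the table cell by cell.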
Interesting! Comprehensive design ideas. Thanks for writing them up!
I like the idea of drawing a bright line at the point in time when someone opts in. I already plan to do that by only bridging their posts going forward, not retroactively. I don't know if the v1 implementation will also consider it when other people repost their old posts, but it's a good idea.
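As a rough illustration of that bright line, here is a small sketch that bridges only posts published at or after the opt-in time. The function name and parameters are assumptions made for this example, not the actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional


def is_bridgeable(published: datetime, opted_in_at: Optional[datetime]) -> bool:
    """Bridge only posts published at or after the author's opt-in time.

    `opted_in_at` is None for authors who have not opted in.
    Both datetimes are assumed to be timezone-aware.
    """
    if opted_in_at is None:
        return False
    return published >= opted_in_at


# Example: an author who opted in on 1 March 2024 (hypothetical dates).
opt_in = datetime(2024, 3, 1, tzinfo=timezone.utc)
old_post = datetime(2024, 2, 1, tzinfo=timezone.utc)
new_post = datetime(2024, 3, 2, tzinfo=timezone.utc)
print(is_bridgeable(old_post, opt_in), is_bridgeable(new_post, opt_in))
```

The same check could be applied to the original post when someone else reposts it, which is the case the table above treats as a dangling reference.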
If by "dangling link" you mean bridge the repost but not the original post, that's not really possible in most social networks or protocols I know of, at least not with "native" reposts. If I deliver an AP `Announce` with an `object` that's not fetchable as AS2 via AP, I expect most/all fediverse servers will drop that and not try to render it. I definitely know Bluesky will drop it.
I could instead render it manually, as a normal post with some kind of extra text indicating that it's a repost and the original post is unavailable, or with a link to click through to see it, but I generally try to avoid alternative or "extra" text-based UI like that.
I am curious how the current native fediverse handles reposts when the original post itself gets deleted. I assume reposts are hidden, but I'm not sure. Do you know what different fediverse servers do?
> I am curious how the current native fediverse handles reposts when the original post itself gets deleted. I assume reposts are hidden, but I'm not sure. Do you know what different fediverse servers do?
I think the protocol-level answer is that the `Announce` activity still exists, and its object is replaced with a `Tombstone` object. But I don't know what happens at the UX level.
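To make that protocol-level reading concrete, here is a sketch of an `Announce` whose object has been swapped for a `Tombstone`. `Tombstone`, `formerType`, and `deleted` come from the ActivityStreams vocabulary. The helper name and the example URLs are made up for illustration, and nothing here claims that any particular server actually emits this.

```python
def tombstone_announce(announce: dict, deleted_at: str) -> dict:
    """Return a copy of `announce` with its object replaced by a Tombstone.

    `announce["object"]` may be an embedded object or a bare id string.
    `deleted_at` is an ISO 8601 timestamp string.
    """
    obj = announce["object"]
    if isinstance(obj, str):
        object_id, former_type = obj, None
    else:
        object_id, former_type = obj["id"], obj.get("type")

    tombstone = {"type": "Tombstone", "id": object_id, "deleted": deleted_at}
    if former_type is not None:
        # formerType records what the object was before deletion.
        tombstone["formerType"] = former_type

    # Shallow copy so the caller's activity is left untouched.
    return {**announce, "object": tombstone}


announce = {
    "type": "Announce",
    "id": "https://example.com/boosts/1",
    "actor": "https://example.com/users/alice",
    "object": {"type": "Note", "id": "https://example.org/notes/9", "content": "hi"},
}
result = tombstone_announce(announce, "2024-01-01T00:00:00Z")
```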
That's where it gets really messy :) In my opinion, though, it's messy in a relatively benign way that doesn't cause more work for a well-behaved, sustainable service, however it may look in detail.
The short answer is that most AP software will afaik in fact remove the content and boost completely from the UX without indication of missing content. (If you want to save bandwidth, then don't expose or at least don't push boosts that you know are dangling - filtering like that is well within spec.)
The long answer is that most AP software touches two to three, maybe four different protocols and/or representations each, and there's a broad range of allowed behaviour. I don't think `Tombstone`s are used by Mastodon at all though.
Essentially the typical server layers/protocols are:
Much of this is conjecture from e.g. bug reports, because I did not read the code or much specification, but aside from pure AP applications (like relays and multiplexers), application software will generally decompose incoming objects to keep a much-abridged representation in its database, then operate on that. I think there's a mechanism for replaying activities towards others if you kept enough to reconstruct the activity and signature in some form too, though.
Let's say an application receives an authentic "delete thing" activity via server-to-server AP, where thing has identity in each of server-to-server-AP, the internal representation and the client protocol(s). The possible behaviour can be categorised as follows, I think:
Expected behaviour: This is what admins would generally expect before considering an app well-behaved.
- Purge thing and all its directly attached data (like post content for statuses), excluding what's strictly necessary for operation. For example, I'm sure you could have abuse report functionality snap a copy of some data that's not for unprivileged eyes, and keep some aggregate metrics to recognise protocol spam.
- Record thing's opaque ID as deleted to make the deletion durable, at least for a reasonable time (> 1 month? <= 6 months?). "Things" that can be "restored" identically (like boosts, likes, follows) are instead created as new under a newly allocated ID.
- Make a best effort to undo automation that resulted from "create thing" (e.g. undo bridging, tick down aggregate counts, detach subscription...).
Optional behaviour: Server software could do any number of these or none of them. It may even opt for options that seem contradictory, by staggering them across time or between client APIs/protocols.
Encouraged behaviour: This is something that an instance should do, but that it (as far as outside observations over server-to-server AP go) effectively can't entirely guarantee due to race conditions in federation.
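The three "expected behaviour" steps above (purge, durably record the deleted ID, undo automation) could look roughly like the following sketch. The `Store` class, its fields, the `boost_of` key, and the 90-day retention are all assumptions chosen for illustration. The retention simply sits inside the "> 1 month, <= 6 months" range floated above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Assumed retention for deletion records; not taken from any real server.
RETENTION = timedelta(days=90)


@dataclass
class Store:
    objects: dict = field(default_factory=dict)       # id -> abridged object
    deleted: dict = field(default_factory=dict)       # id -> deletion time
    boost_counts: dict = field(default_factory=dict)  # post id -> boost count

    def handle_create(self, obj: dict, now: datetime) -> bool:
        """Store `obj` unless its id is known or was recently deleted."""
        oid = obj["id"]
        if oid in self.objects:
            return False
        deleted_at = self.deleted.get(oid)
        if deleted_at is not None and now - deleted_at < RETENTION:
            return False  # the deletion is still durable; ignore the replay
        self.deleted.pop(oid, None)
        self.objects[oid] = obj
        target = obj.get("boost_of")
        if target is not None:
            self.boost_counts[target] = self.boost_counts.get(target, 0) + 1
        return True

    def handle_delete(self, oid: str, now: datetime) -> None:
        """Purge the object, remember its id, and undo derived state."""
        obj = self.objects.pop(oid, None)   # step 1: purge
        self.deleted[oid] = now             # step 2: durable deletion record
        if obj is not None and obj.get("boost_of") is not None:
            target = obj["boost_of"]        # step 3: tick down aggregate count
            self.boost_counts[target] = max(0, self.boost_counts.get(target, 0) - 1)
```

A replayed "create" with the same ID is rejected while the deletion record is fresh, which matches the idea that restored boosts, likes, and follows get a newly allocated ID instead.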
One thing that's clear though is that servers must handle inconsistencies in server-to-server AP state gracefully, as that's something that can just happen randomly due to delays and desyncs, and the network state is not guaranteed to have a stepwise-consistent possible order of activities. So while you're free to filter out dangling references (to save performance/data transfer volume), you're equally free to broadcast them to save on database queries and complexity. (I am completely ignorant about the tradeoffs there, in terms of what's economical.)
There's actually a third option you could implement, which is to not push dangling objects but expose them on fetch, for example in the collection of activities by a given actor. This may behave better with software like Akkoma that unlike Mastodon does fairly thorough backfilling, and as far as I know is able to hold inconsistent state in its internal representation. (The latter may also be true of Mastodon, as mentioned before I just never checked what that does internally.)
Thanks! Great thinking and sleuthing.
> One thing that's clear though is that servers must handle inconsistencies in server-to-server AP state gracefully...So while you're free to filter out dangling references (to save performance/data transfer volume), you're equally free to broadcast them to save on database queries and complexity.
> There's actually a third option you could implement, which is to not push dangling objects but expose them on fetch, for example in the collection of activities by a given actor.
Kind of, but not really. Most protocols don't give us a choice between push and pull. IndieWeb is pull with thin pings (eg webmention), ATProto and Nostr are both push. AP is the only one that realistically lets us choose in some situations, and even then, it's mainly just the difference between setting `object` to an actual object vs a bare id in `Announce` and other similar activities, eg `Like`s.
And even then, we don't really get even that choice if we want wide interop. We started out putting full objects in `object` and `inReplyTo`, but eventually had to switch to bare ids for interop because many other servers (besides Mastodon) crashed on full objects. See many of the older "support X" issues here, where X is other fediverse servers.
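For readers unfamiliar with the distinction, this sketch shows what collapsing full objects to bare ids looks like on an AS2-style dictionary. The helper name and the example URLs are hypothetical, and this is only an illustration of the idea, not the bridge's actual code.

```python
def with_bare_ids(activity: dict, fields=("object", "inReplyTo")) -> dict:
    """Return a copy where embedded objects in `fields` become bare id strings.

    Values that are already strings, or that have no "id", are left as they are.
    """
    out = dict(activity)  # shallow copy; the input is not modified
    for name in fields:
        value = out.get(name)
        if isinstance(value, dict) and "id" in value:
            out[name] = value["id"]
    return out


full = {
    "type": "Announce",
    "id": "https://example.com/boosts/1",
    "object": {"type": "Note", "id": "https://example.org/notes/9", "content": "hi"},
}
print(with_bare_ids(full)["object"])  # prints the bare id string
```

With a bare id, the receiving server has to fetch the object itself, which is why an unfetchable original makes the repost effectively dangling.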
I meant that more as "expose the dangling `Announce` if an `Actor`'s `outbox` or the `Announce` itself is fetched, but don't push it to followers' `inbox` or `sharedInbox` eagerly", rather than inline objects vs. IDs.
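The three options discussed so far (filter dangling boosts, broadcast them, or serve them only on fetch) can be summarised in a small sketch. The policy names and the `Exposure` type are invented for this example and do not correspond to any server's real configuration.

```python
from enum import Enum
from typing import NamedTuple


class DanglingPolicy(Enum):
    FILTER = "filter"          # neither push nor serve dangling boosts
    BROADCAST = "broadcast"    # treat dangling boosts like any other
    FETCH_ONLY = "fetch_only"  # serve in outbox / by id, but never push


class Exposure(NamedTuple):
    push_to_inboxes: bool  # deliver eagerly to inbox / sharedInbox
    serve_on_fetch: bool   # include in outbox and answer direct fetches


def exposure_for(object_available: bool, policy: DanglingPolicy) -> Exposure:
    """Decide how to expose an Announce given whether its object is available."""
    if object_available or policy is DanglingPolicy.BROADCAST:
        return Exposure(push_to_inboxes=True, serve_on_fetch=True)
    if policy is DanglingPolicy.FETCH_ONLY:
        return Exposure(push_to_inboxes=False, serve_on_fetch=True)
    return Exposure(push_to_inboxes=False, serve_on_fetch=False)
```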
Better to be consistent across networks though, yes. I assume it's considered misbehaviour in ATProto to have objects change in the repository without streaming that change, since iirc the event stream there isn't sparse.
> And even then, we don't really get even that choice if we want wide interop. We started out putting full objects in `object` and `inReplyTo`, but eventually had to switch to bare ids for interop because many other servers (besides Mastodon) crashed on full objects. See many of the older "support X" issues here, where X is other fediverse servers.
This is one thing I really wish was documented better for AP/implementations. It's too difficult to find (clear) documentation on what is sent, and often downright impossible to find documentation on what is accepted.
I'm hopeful that dansup's upcoming https://pubkit.net will improve that situation at least somewhat, though, even if it risks becoming somewhat of an authority on the community standard.
The basic first pass here is complete: reposts are bridged if they're reposting an opted-in account or a bridged account, but not otherwise. https://mastodonsweden.se/@doktorzjivago 's timeline right now has examples of both cases. (Really fun to see!)
I don't have immediate plans to pursue "dangling" references as discussed, since protocol support for them is either missing or incomplete, but the details here will be hugely valuable in case we ever try in the future. Thanks again!
I reposted a Bluesky post just now, and it got delivered to my fediverse followers too. They couldn't fetch the original post via ActivityPub, so they ignored it, but still, we should only deliver reposts to the same protocol as the original post.
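One possible reading of that rule, combined with the first-pass behaviour described earlier, is sketched below. The function and parameter names are hypothetical, and the protocol labels are just strings chosen for the example. This is an interpretation, not the fix that was eventually shipped.

```python
def repost_delivery_protocols(
    original_protocol: str,
    original_is_bridged: bool,
    follower_protocols: set,
) -> set:
    """Pick which of the reposter's follower protocols should get a repost.

    `original_is_bridged` is assumed to be True when the original author has
    opted in or is itself a bridged account, so the post is fetchable elsewhere.
    Otherwise the repost only goes to the original post's own protocol.
    """
    if original_is_bridged:
        return set(follower_protocols)
    return {p for p in follower_protocols if p == original_protocol}


# A repost of a non-bridged Bluesky post stays off the fediverse.
targets = repost_delivery_protocols("atproto", False, {"atproto", "activitypub"})
```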