snarfed / bridgy-fed

🌉 A bridge between decentralized social network protocols
https://fed.brid.gy
Creative Commons Zero v1.0 Universal

reposts depend on whether the original post's author is opted in #937

Closed snarfed closed 5 months ago

snarfed commented 7 months ago

I reposted a Bluesky post just now, and it got delivered to my fediverse followers too. They couldn't fetch the original post via ActivityPub, so they ignored it, but still, we should only deliver reposts to the same protocol as the original post.

snarfed commented 7 months ago

Log: https://fed.brid.gy/log?path=%2Fqueue%2Fwebmention%2C%2Finbox&start_time=1712369007&key=https%3A%2F%2Fsnarfed.org%2F2024-04-05_52735

snarfed commented 7 months ago

...although, now that I think about it, we should just bridge the original post too, right? At least if its author is opted in.

qazmlp commented 7 months ago

I think there are a few different cases to consider here: (I'll just fill in the table according to what I would do.)

| orig. circumstance | self-boost | local boost | boost of bridged |
| --- | --- | --- | --- |
| no opt-in | n.a. | bridge only boost, with dangling reference² | reject and blindly bounce back deletion³ |
| was and is opted-in | bridge boost (post alr. avail.) | bridge boost (post alr. avail.) | bridge boost (post alr. avail. at retranslated URL) |
| wasn't opted-in, but is now | bridge post, then boost¹ | bridge only boost, with dangling reference² | bridge only boost, with dangling retranslated reference² |
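
Read as a lookup, the table above might be sketched like this. The enum names, the `ACTIONS` mapping, and `decide` are hypothetical labels for the table's rows, columns, and cells; none of them come from Bridgy Fed's code.

```python
from enum import Enum
from typing import Optional


class OptIn(Enum):
    """Rows: the original author's opt-in state (hypothetical names)."""
    NEVER = "no opt-in"
    ALWAYS = "was and is opted-in"
    LATER = "wasn't opted-in, but is now"


class Boost(Enum):
    """Columns: what kind of boost this is (hypothetical names)."""
    SELF = "self-boost"
    LOCAL = "local boost"
    OF_BRIDGED = "boost of bridged"


# Cells copied from the table; None marks the "n.a." cell.
ACTIONS = {
    (OptIn.NEVER, Boost.SELF): None,
    (OptIn.NEVER, Boost.LOCAL): "bridge only boost, with dangling reference",
    (OptIn.NEVER, Boost.OF_BRIDGED): "reject and blindly bounce back deletion",
    (OptIn.ALWAYS, Boost.SELF): "bridge boost",
    (OptIn.ALWAYS, Boost.LOCAL): "bridge boost",
    (OptIn.ALWAYS, Boost.OF_BRIDGED): "bridge boost",
    (OptIn.LATER, Boost.SELF): "bridge post, then boost",
    (OptIn.LATER, Boost.LOCAL): "bridge only boost, with dangling reference",
    (OptIn.LATER, Boost.OF_BRIDGED):
        "bridge only boost, with dangling retranslated reference",
}


def decide(opt_in: OptIn, boost: Boost) -> Optional[str]:
    """Return the table cell for this row and column."""
    return ACTIONS[(opt_in, boost)]
```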

¹ I would interpret this as clear intention to give it more reach. (Example: An artist wants to show old art to new followers, but also to avoid duplicates. They boost their original post.)

² The post may have been authored without taking the additional context into account.
(Benign-ish example: "Mastodon is for nerds!", later reconsiders and opts in, then someone tries to start drama by boosting that post onto the Fediverse.)

Protocol-wise, I think dangling links are just fine, since they happen naturally anyway. I'm not aware of software that currently does this, but it would be possible to present a tombstone with HTML link to the user where a resource is missing. You could (probably should, also considering the "view in browser" button in most AP clients) blindly redirect to the web representation if you see a request for a resource that prefers HTML.
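
A minimal sketch of the "redirect if the request prefers HTML" idea, assuming a deliberately simplified reading of the `Accept` header. The function names and the choice of a 302 status are illustrative assumptions, not something any particular server is known to do.

```python
from typing import Optional

AS2_TYPES = ("application/activity+json", "application/ld+json")


def parse_accept(accept: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    parsed = []
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        parsed.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(parsed, key=lambda pair: -pair[1])


def prefers_html(accept: str) -> bool:
    """True if text/html ranks ahead of every AS2 media type."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type == "text/html":
            return True
        if media_type in AS2_TYPES:
            return False
    return False


def html_redirect(accept: str, web_url: str) -> Optional[tuple[int, dict]]:
    """Return a redirect to the web page for browsers, else None."""
    if prefers_html(accept):
        return 302, {"Location": web_url}
    return None
```

A caller would fall through to its normal ActivityPub handling whenever `html_redirect` returns `None`.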

³ This is a possible failure mode of the protocols, where a cache became stale across an opt-out for whatever reason. You could also see this in the form of incoming likes (and replies, but I think those should be bridged rather than rejected). Since the target state is removal of all bridged content, and you don't hold or process data about the original author at this point (anymore), I think the best course of action is to send a synthetic deletion/undo-create back to wherever you saw the stale reference. You may also have to blindly respond to fetches of undo-creates regarding not-opted-in users' content to deal with instances that don't rely on signatures to authenticate objects, and to reply with 410 Gone to respective content fetches.
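
The synthetic deletion and the 410 response described in ³ could look roughly like this. `Delete` is a standard ActivityStreams 2.0 activity type; the `#delete` id scheme and both function names are assumptions made only for illustration.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def synthetic_delete(actor_id: str, object_id: str) -> dict:
    """Build a Delete activity for a stale bridged object."""
    return {
        "@context": AS2_CONTEXT,
        "id": f"{object_id}#delete",  # hypothetical id scheme
        "type": "Delete",
        "actor": actor_id,
        "object": object_id,
    }


def content_fetch_status(author_opted_in: bool) -> int:
    """HTTP status for a fetch of bridged content: 410 Gone once opted out."""
    return 200 if author_opted_in else 410
```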

snarfed commented 7 months ago

Interesting! Comprehensive design ideas. Thanks for writing them up!

I like the idea of drawing a bright line at the point in time when someone opts in. I already plan to do that by only bridging their posts going forward, not retroactively. I don't know if the v1 implementation will also consider it when other people repost their old posts, but it's a good idea.
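
As a sketch, the "bright line" amounts to a timestamp comparison. The function and parameter names here are hypothetical, and how a real implementation stores the opt-in time is not stated in this thread.

```python
from datetime import datetime, timezone
from typing import Optional


def should_bridge_post(published: datetime,
                       opted_in_at: Optional[datetime]) -> bool:
    """Bridge only posts published at or after the author's opt-in time."""
    if opted_in_at is None:
        return False  # the author never opted in
    return published >= opted_in_at


opt_in = datetime(2024, 4, 5, tzinfo=timezone.utc)
old_post = datetime(2024, 1, 1, tzinfo=timezone.utc)
new_post = datetime(2024, 5, 1, tzinfo=timezone.utc)
```

With these values, `should_bridge_post(old_post, opt_in)` is `False` and `should_bridge_post(new_post, opt_in)` is `True`.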

If by "dangling link" you mean bridge the repost but not the original post, that's not really possible in most social networks or protocols I know of, at least not with "native" reposts. If I deliver an AP Announce with an object that's not fetchable as AS2 via AP, I expect most/all fediverse servers will drop that and not try to render it. I definitely know Bluesky will drop it.

I could instead render it manually, as a normal post with some kind of extra text indicating that it's a repost and the original post is unavailable, or with a link to click through to see it, but I generally try to avoid alternative or "extra" text-based UI like that.

I am curious how the current native fediverse handles reposts when the original post itself gets deleted. I assume reposts are hidden, but I'm not sure. Do you know what different fediverse servers do?

snarfed commented 7 months ago

> I am curious how the current native fediverse handles reposts when the original post itself gets deleted. I assume reposts are hidden, but I'm not sure. Do you know what different fediverse servers do?

I think the protocol level answer is that the Announce activity still exists, and its object is replaced with a Tombstone object. But I don't know what happens at the UX level.
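
For illustration, here is what that replacement could look like in AS2 terms. `Tombstone`, `formerType`, and `deleted` are defined in the ActivityStreams 2.0 vocabulary; the helper names are hypothetical, and whether any given server actually embeds the Tombstone in the Announce is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def tombstone(object_id: str, former_type: Optional[str],
              deleted: datetime) -> dict:
    """Build an AS2 Tombstone. Expects a timezone-aware datetime."""
    stamp = deleted.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    obj = {"id": object_id, "type": "Tombstone", "deleted": stamp}
    if former_type is not None:
        obj["formerType"] = former_type  # only set when the old type is known
    return obj


def with_tombstone(announce: dict, deleted: datetime) -> dict:
    """Return a copy of an Announce whose object is replaced by a Tombstone."""
    original = announce["object"]
    if isinstance(original, dict):
        stone = tombstone(original["id"], original.get("type"), deleted)
    else:
        stone = tombstone(original, None, deleted)  # bare id: type unknown
    return {**announce, "object": stone}
```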

qazmlp commented 7 months ago

That's where it gets really messy :) Though in my opinion it's messy in a relatively benign way, one that doesn't cause more work for a well-behaved, sustainable service, however it may look in detail.

The short answer is that most AP software will afaik in fact remove the content and boost completely from the UX without indication of missing content. (If you want to save bandwidth, then don't expose or at least don't push boosts that you know are dangling - filtering like that is well within spec.)


The long answer is that most AP software touches two to three, maybe four different protocols and/or representations each and there's a broad range of allowed behaviour. I don't think Tombstones are used by Mastodon at all though.

Essentially the typical server layers/protocols are:

Much of this is conjecture from e.g. bug reports, because I did not read the code or much specification, but aside from pure AP applications (like relays and multiplexers), application software will generally decompose incoming objects to keep a much-abridged representation in its database, then operate on that. I think there's a mechanism for replaying activities towards others if you kept enough to reconstruct the activity and signature in some form too, though.

Let's say an application receives an authentic "delete thing" activity via server-to-server AP, where thing has identity in each of server-to-server-AP, the internal representation and the client protocol(s). The possible behaviour can be categorised as follows, I think:

  1. Expected behaviour: This is what admins would generally expect to consider an app well-behaved.

    • Purge thing and all its directly attached data (like post content for statuses), excluding what's strictly necessary for operation, i.e. I'm sure you could have abuse report functionality snap a copy of some data that's not for unprivileged eyes and keep some aggregate metrics to recognise protocol spam.

    • Record thing's opaque ID as deleted to make the deletion durable, at least for a reasonable time (> 1 month? <= 6 months?). "Things" that can be "restored" identically (like boosts, likes, follows) are instead created as new under a newly allocated ID.

    • Make a best effort to undo automation that resulted from "create thing". (i.e. undo bridging, tick down aggregate counts, detach subscription...)

  2. Optional behaviour: Server software could do any number of these or none of them. It may even opt for options that seem contradictory, by staggering them across time or between client APIs/protocols.

    • Turn dependent objects (or collection entries) internally into tombstones, e.g. by erasing the ID of the deleted pointee from them.
    • Purge dependent objects (or entries) entirely.
    • Notify client(s) to purge thing and/or dependent objects eagerly.
    • Cascade the deletion by broadcasting deletions for its owned thing-dependent objects.
    • Periodically vacuum its database to do any of the above in this category asynchronously.
    • Expose tombstones to clients. (This notably isn't possible in the Mastodon API, but possible towards AP-clients.)
    • Any other reasonable automation.

  3. Encouraged behaviour: This is something that an instance should do, but that it (as far as outside observations over server-to-server-AP go) effectively can't entirely guarantee due to race conditions in federation.

    • Not create new dependent objects of thing.
    • Reject client actions that would create dependent objects. (Pretty feasible with the Mastodon API since that's synchronous CRUD, but may not be feasible with AP clients!)
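
The three categories above can be sketched with a toy in-memory store. `Store` and its methods are hypothetical, it only understands bare-id `object` references, and it leaves out abuse-report snapshots, metrics, and expiry of the deleted-ID record.

```python
class Store:
    """Hypothetical in-memory stand-in for a server's database."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}  # id -> stored object
        self.deleted_ids: set[str] = set()  # durable record of deletions

    def create(self, obj: dict) -> bool:
        """Store obj unless it, or the thing it points at, was deleted."""
        if obj["id"] in self.deleted_ids:
            return False  # expected: a deleted id is not revived
        target = obj.get("object")
        if isinstance(target, str) and target in self.deleted_ids:
            return False  # encouraged: no new dependents of a deleted thing
        self.objects[obj["id"]] = obj
        return True

    def delete(self, thing_id: str, purge_dependents: bool = False) -> None:
        """Handle an authentic "delete thing" activity."""
        self.objects.pop(thing_id, None)  # expected: purge the thing
        self.deleted_ids.add(thing_id)    # expected: make deletion durable
        if purge_dependents:              # optional: purge dependents too
            dependents = [oid for oid, o in self.objects.items()
                          if o.get("object") == thing_id]
            for oid in dependents:
                del self.objects[oid]
```

Leaving `purge_dependents` off models a server that keeps dangling boosts around internally, which the optional category allows.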

One thing that's clear though is that servers must handle inconsistencies in server-to-server AP state gracefully, as that's something that can just happen randomly due to delays and desyncs, and the network state is not guaranteed to have a stepwise-consistent possible order of activities. So while you're free to filter out dangling references (to save performance/data transfer volume), you're equally free to broadcast them to save on database queries and complexity. (I am completely ignorant about the tradeoffs there, in terms of what's economical.)

There's actually a third option you could implement, which is to not push dangling objects but expose them on fetch, for example in the collection of activities by a given actor. This may behave better with software like Akkoma that unlike Mastodon does fairly thorough backfilling, and as far as I know is able to hold inconsistent state in its internal representation. (The latter may also be true of Mastodon, as mentioned before I just never checked what that does internally.)

snarfed commented 7 months ago

Thanks! Great thinking and sleuthing.

> One thing that's clear though is that servers must handle inconsistencies in server-to-server AP state gracefully...So while you're free to filter out dangling references (to save performance/data transfer volume), you're equally free to broadcast them to save on database queries and complexity.

> There's actually a third option you could implement, which is to not push dangling objects but expose them on fetch, for example in the collection of activities by a given actor.

Kind of, but not really. Most protocols don't give us a choice between push and pull. IndieWeb is pull with thin pings (eg webmention), ATProto and Nostr are both push. AP is the only one that realistically lets us choose in some situations, and even then, it's mainly just the difference between setting object to an actual object vs a bare id in Announce and other similar activities, eg Likes.

And even then, we don't really get even that choice if we want wide interop. We started out putting full objects in object and inReplyTo, but eventually had to switch to bare ids for interop because many other servers (besides Mastodon) crashed on full objects. See many of the older "support X" issues here, where X is other fediverse servers.
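
The difference between a full object and a bare id in `object` is small in code terms. This is only a sketch: the `announce` helper and its `inline` flag are hypothetical and are not Bridgy Fed's API.

```python
def announce(activity_id: str, actor_id: str, original: dict,
             inline: bool = False) -> dict:
    """Build an AS2 Announce whose object is a bare id or the full object."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": activity_id,
        "type": "Announce",
        "actor": actor_id,
        # A bare id makes the receiver fetch the post itself; an inline
        # object ships the whole post inside the activity.
        "object": original if inline else original["id"],
    }


note = {"id": "https://example.com/post/1", "type": "Note", "content": "hi"}
bare = announce("https://example.com/boost/1", "https://example.com/alice", note)
full = announce("https://example.com/boost/1", "https://example.com/alice",
                note, inline=True)
```

Here `bare["object"]` is the string `"https://example.com/post/1"`, while `full["object"]` is the whole `note` dict.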

qazmlp commented 7 months ago

I meant that more as "expose the dangling Announce if an Actor's 'outbox' or the Announce itself is fetched, but don't push it to followers' 'inbox' or 'sharedInbox' eagerly", rather than inline objects vs. IDs.
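
A rough sketch of that split, assuming activities carry bare-id `object` values and that the set of fetchable ids is known up front. All names here are hypothetical.

```python
def is_dangling(activity: dict, fetchable_ids: set) -> bool:
    """True for an Announce whose object can't be fetched over AP."""
    return (activity.get("type") == "Announce"
            and activity.get("object") not in fetchable_ids)


def activities_to_push(activities: list, fetchable_ids: set) -> list:
    """Eager delivery to inboxes: leave dangling Announces out."""
    return [a for a in activities if not is_dangling(a, fetchable_ids)]


def outbox_items(activities: list) -> list:
    """Outbox on fetch: expose everything, dangling Announces included."""
    return list(activities)


fetchable = {"https://example.com/post/1"}
acts = [
    {"id": "b1", "type": "Announce", "object": "https://example.com/post/1"},
    {"id": "b2", "type": "Announce", "object": "https://other.example/post/9"},
]
```

With this data, `activities_to_push(acts, fetchable)` contains only `b1`, while `outbox_items(acts)` still lists both.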

Better to be consistent across networks though, yes. I assume it's considered misbehaviour in ATProto to have objects change in the repository without streaming that change, since iirc the event stream there isn't sparse.

> And even then, we don't really get even that choice if we want wide interop. We started out putting full objects in object and inReplyTo, but eventually had to switch to bare ids for interop because many other servers (besides Mastodon) crashed on full objects. See many of the older "support X" issues here, where X is other fediverse servers.

This is one thing I really wish was documented better for AP/implementations. It's too difficult to find (clear) documentation on what is sent, and often downright impossible to find documentation on what is accepted.

I'm hopeful that dansup's upcoming https://pubkit.net will improve that situation at least somewhat, though, even if it risks becoming somewhat of an authority on the community standard.

snarfed commented 5 months ago

The basic first pass here is complete: reposts are bridged if they're reposting an opted in account or a bridged account, but not otherwise. https://mastodonsweden.se/@doktorzjivago 's timeline right now has examples of both cases. (Really fun to see!)
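
Expressed as code, the rule as stated might look like the following. This is a minimal sketch: `Account` and `should_bridge_repost` are hypothetical names, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical stand-in for the original post's author."""
    id: str
    opted_in: bool = False  # a native account that opted in to bridging
    bridged: bool = False   # an account bridged in from another network


def should_bridge_repost(original_author: Account) -> bool:
    """Bridge a repost only if the original author is opted in or bridged."""
    return original_author.opted_in or original_author.bridged
```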

I don't have immediate plans to pursue "dangling" references as discussed here, since protocol support for them is either missing or incomplete, but the details here will be hugely valuable in case we ever try in the future. Thanks again!