lbryio / cantina


Where things are stored #2

Open orblivion opened 2 years ago

orblivion commented 2 years ago

Types of data

Claims (streams and channels)

For data that exists on the blockchain, the blockchain will be the source of truth. This probably amounts to all claims and only claims, but there might be more. Claims include both streams and channels, posts and reposts. Cantina will only ever store a copy of these for caching purposes. In principle this data comes straight from the Hub and gets transformed into ActivityPub objects. We can think of the blockchain as just one of the databases that Cantina uses (RocksDB and ElasticSearch on the hub effectively being optimized caches as well).

https://cantina.example/stream/<claim-id>
https://cantina.example/channel/<claim-id>

When a stream claim is added, I think we need a Create activity, and it should be sent to the inbox of every follower of the channel.
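A minimal sketch of what that could look like, assuming delivery is pushed to follower inboxes (which is still an open question in this issue). The function names and the `/create` id suffix are made up for illustration; only the `/stream/` and `/channel/` URL shapes come from the text above.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_create(base_url, channel_claim_id, stream_claim_id):
    """Build a Create activity announcing a new stream claim."""
    actor = f"{base_url}/channel/{channel_claim_id}"
    stream = f"{base_url}/stream/{stream_claim_id}"
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "id": f"{stream}/create",  # hypothetical id layout, not decided anywhere
        "actor": actor,
        "object": stream,
    }


def plan_deliveries(activity, follower_inboxes):
    """Pair the activity with each distinct follower inbox, keeping order."""
    unique_inboxes = list(dict.fromkeys(follower_inboxes))
    return [(inbox, activity) for inbox in unique_inboxes]
```

Deduplicating inboxes matters if several followers live on the same remote server and that server exposes one inbox for all of them.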

Changes to data on the blockchain could also produce Update or Delete activities. Again the source of truth should be the blockchain. We should be able to create a hub endpoint, or access the hub database, to recreate the history on the fly and translate it into ActivityPub activities. We either query from the beginning every time we need the history, or process changes as they come in, accumulating new activities in a local cache.

Since the Update and Delete activities will be generated on the fly, I'd be a bit nervous about having them go by numerical index. I think we can use txId:

https://cantina.example/stream/<claim-id>/delete/<txn-id>
https://cantina.example/stream/<claim-id>/update/<txn-id>
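A rough sketch of the "accumulate as they come in" option with transaction-based ids. `ClaimEvent` and `ActivityCache` are hypothetical names, and the shape of what the hub would actually return is an assumption. The point is only that ids derived from the transaction make re-ingesting the same history harmless.

```python
from dataclasses import dataclass

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


@dataclass(frozen=True)
class ClaimEvent:
    """One change to a stream claim as read from the hub (assumed shape)."""
    claim_id: str
    txid: str
    kind: str  # "update" or "delete"


def to_activity(base_url, event):
    """Translate a claim change into an activity with a txid-based id."""
    if event.kind not in ("update", "delete"):
        raise ValueError(f"unsupported event kind: {event.kind}")
    stream = f"{base_url}/stream/{event.claim_id}"
    return {
        "@context": AS_CONTEXT,
        "id": f"{stream}/{event.kind}/{event.txid}",
        "type": event.kind.capitalize(),  # "Update" or "Delete"
        "object": stream,
    }


class ActivityCache:
    """Local cache of generated activities, keyed by activity id."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._by_id = {}

    def ingest(self, events):
        """Add activities for unseen events and return only the new ones."""
        new = []
        for event in events:
            activity = to_activity(self.base_url, event)
            if activity["id"] not in self._by_id:
                self._by_id[activity["id"]] = activity
                new.append(activity)
        return new

    def all(self):
        return list(self._by_id.values())
```

Because the id depends only on the claim id, the kind of change, and the transaction, replaying from the beginning and ingesting incrementally end up with the same set of activities.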

Since Cantinas can get this information any time they want straight from the blockchain, these activities are only for the benefit of Mastodons etc, for staying up to spec per se, and maybe to nudge remote Cantinas with followers of the channel to check the hub for the changes.

One interesting question that's still unclear to me - are these activities "pushed" or "pulled"? I.e. if I'm a Mastodon user and I follow a Cantina account, and it updates a video, will Cantina POST to my Mastodon or will my Mastodon periodically poll the Cantina for updates? I hope this question can be put off until implementation. At any rate, there is bound to be inconsistency in this data between servers.

Social actions (comments, views, and more)

For data that doesn't exist on the blockchain, which includes comments and views, a database hosted on the relevant channel's Cantina server will be the source of truth. The blockchain will be the source of the public key that is used to sign the activities associated with this data.

The Activities associated with these data will originate from the Cantina of the commenter/viewer/etc, and will be stored there in a database as well. This includes Create, Update and Delete.

If cantina-followers.example has followers of channels on cantina-actors.example, cantina-followers.example will end up receiving objects (comments, views, etc), and associated activities, from cantina-actors.example. cantina-followers.example will keep a copy of these things locally. Hopefully this can also be a dumb cache, i.e. cantina-actors.example will fully be the source of truth for that data, with no special state on cantina-followers.example related to it. Due to missed connections, this will likely become an inaccurate cache over time.
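To make "dumb cache" concrete, here is a small sketch under the assumption that the follower-side server just applies whatever Create, Update and Delete activities it receives, with no logic of its own. `RemoteObjectCache` is a hypothetical name.

```python
class RemoteObjectCache:
    """Follower-side copy of remote objects; the remote server stays authoritative."""

    def __init__(self):
        self.objects = {}

    def apply(self, activity):
        """Apply one received activity. Unknown activity types are ignored."""
        kind = activity.get("type")
        obj = activity.get("object")
        # The object may be embedded (a dict) or referenced by its id (a string).
        object_id = obj["id"] if isinstance(obj, dict) else obj
        if kind in ("Create", "Update") and isinstance(obj, dict):
            self.objects[object_id] = obj
        elif kind == "Delete":
            self.objects.pop(object_id, None)
```

If a Delete never arrives because of a missed connection, the stale object simply stays in `objects`, which is the kind of inaccuracy described above.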

This blog post backs up the idea that the copies on remote servers will be a cache.

Followers

I think Follow activities aren't too different from (let's say) Like activities. The new follower gets added to the local "following" collection. An activity gets sent to the followed account so that they can add the new follower to their own "followed" collection. The "followed" collection should be treated as a cache just like with Likes/Views, but again one that's going to get pretty inaccurate over time due to missed connections.

Mismatched followers between servers are more consequential than mismatched Likes. If a certain user un-follows me and my Cantina server didn't get the message, my new video posts will still get sent to their server. Mastodon deals with this problem with a special synchronization process.
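I haven't described how Mastodon's process works, so the following is not that. It is just one generic way two servers could detect that their follower lists have drifted: each computes an order-independent digest over the follower ids it knows about, and they compare digests. All names are hypothetical.

```python
import hashlib


def followers_digest(follower_ids):
    """XOR together the SHA-256 of each distinct follower id; return hex."""
    acc = bytes(32)
    for follower_id in set(follower_ids):
        digest = hashlib.sha256(follower_id.encode("utf-8")).digest()
        acc = bytes(a ^ b for a, b in zip(acc, digest))
    return acc.hex()


def in_sync(local_follower_ids, remote_follower_ids):
    """True when both sides hold the same set of follower ids."""
    return followers_digest(local_follower_ids) == followers_digest(remote_follower_ids)
```

A mismatch only says that the two lists differ, not how. The servers would still need to exchange the actual lists to repair the difference.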

Reposts

Reposts use the "Announce" activity.

A channel that re-posts a video will only do so via a blockchain action via the hub. This is true even if the originating channel and the reposting channel have different Cantina servers. As stated above, the Cantina server(s) will merely echo the blockchain data in their representation of the repost.

It gets a little different with external services. A Mastodon server talking to a Cantina server won't know the difference between blockchain-native data and Cantina-native data. A Mastodon user could repost a video (blockchain-based) or a comment (not blockchain-based), and it will be treated basically the same way in their Mastodon server's database. In this particular case, the (non-blockchain) Mastodon database would be the source of truth of this repost of a claim.

Reposting comments probably makes no sense for Cantina's interface. However it does make sense for Mastodon, since it's a feed, and we wouldn't be able to stop a Mastodon server from "re-tooting" (or whatever) a Cantina comment. Whereas Likes from Mastodon will come back to us and get stored in our "likes" collection, we won't have a "shares" collection so we will probably ignore the "Announce" that comes from Mastodon. BTW this may mean that a deleted comment may end up staying alive forever, "re-tooted" on a Mastodon server, since the Cantina won't have a way to reach out to tell it that it's been deleted. (Though if the Mastodon polls the Cantina, it could find out)
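A sketch of the inbox behavior described here, assuming the only decision is "store Likes, drop Announces". The function name, the return strings, and the in-memory `likes` mapping are all made up for illustration.

```python
def handle_inbound(activity, likes):
    """Route one activity received from a remote server such as Mastodon.

    `likes` maps an object id to the set of Like activity ids targeting it.
    """
    kind = activity.get("type")
    if kind == "Like":
        target = activity["object"]
        # The target may be embedded (a dict) or referenced by its id.
        target_id = target["id"] if isinstance(target, dict) else target
        likes.setdefault(target_id, set()).add(activity["id"])
        return "stored"
    if kind == "Announce":
        # No "shares" collection is planned, so there is nowhere to put this.
        return "ignored"
    return "unhandled"
```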

Aggregates (counts)

Grin asked me to consider whether counts of actions (Likes, etc) should be stored on the server originating the action or the one receiving it. I couldn't find anything in the activitypub spec about conveying Like/Comment/Follower/etc counts per se across servers.

However, looking at the output from the Mastodon and Peertube ActivityPub APIs, on the receiving end of these activities I see not mere numbers but OrderedCollections of explicitly collected activities, including activities from other servers, with an aggregate count included in the JSON. This includes counts of followers on Peertube and Mastodon, and likes and comments on Peertube.

My conclusion is that the servers receiving the actions are holding every individual action, so clearly they're in a position to count them.

I can give some examples.

A Mastodon "following" list:

This is a "following" list linked from an Actor object:

{"@context": "https://www.w3.org/ns/activitystreams",
 "first": "https://mastodon.social/users/Gargron/following?page=1",
 "id": "https://mastodon.social/users/Gargron/following",
 "totalItems": 310,
 "type": "OrderedCollection"}

It's paginated. The "first" link is where you start to see the actual items, but totalItems (the aggregate we're discussing) is already here. The first page looks like this:

{'@context': 'https://www.w3.org/ns/activitystreams',
 'id': 'https://mastodon.social/users/Gargron/following?page=1',
 'next': 'https://mastodon.social/users/Gargron/following?page=2',
 'orderedItems': ['https://bsd.network/users/ellotheth',
                  'https://masto.ashfurrow.com/users/ashfurrow',
                  'https://fosstodon.org/users/danarel',
                  'https://mastodon.sdf.org/users/taviso',
                  'https://chaos.social/users/ThomasWaldmann',
                  'https://mastodon.social/users/akryum',
                  'https://mastodon.online/users/vivaldibrowser',
                  'https://mastodon.social/users/NotFrauKadse',
                  'https://mastodon.lol/users/muffinista',
                  'https://mastodon.art/users/camilabrun',
                  'https://charade.social/users/jonathan',
                  'https://mastodon.social/users/a2_4am'],
 'partOf': 'https://mastodon.social/users/Gargron/following',
 'totalItems': 310,
 'type': 'OrderedCollectionPage'}

You can see some accounts from other servers collected here.
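For reference, documents shaped like the two above could be produced from a stored list of item URLs roughly like this. It is a sketch only: the function names and the page size of 12 are assumptions, chosen to match the number of items on the page shown.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def collection_summary(collection_id, items):
    """The top-level document: no items, just the count and the first page link."""
    return {
        "@context": AS_CONTEXT,
        "id": collection_id,
        "type": "OrderedCollection",
        "totalItems": len(items),
        "first": f"{collection_id}?page=1",
    }


def collection_page(collection_id, items, page, page_size=12):
    """One page of items. 'next' is present only when more items follow."""
    start = (page - 1) * page_size
    doc = {
        "@context": AS_CONTEXT,
        "id": f"{collection_id}?page={page}",
        "type": "OrderedCollectionPage",
        "partOf": collection_id,
        "totalItems": len(items),
        "orderedItems": items[start:start + page_size],
    }
    if start + page_size < len(items):
        doc["next"] = f"{collection_id}?page={page + 1}"
    return doc
```

The aggregate is just `len(items)` on whichever server holds the collection, which is why that server is the natural place to count.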

A Peertube "likes" collection:

This is linked from a video object.

{'@context': ['https://www.w3.org/ns/activitystreams',
              'https://w3id.org/security/v1',
              {'RsaSignature2017': 'https://w3id.org/security#RsaSignature2017'}],
 'first': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/likes?page=1',
 'id': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/likes',
 'totalItems': 5,
 'type': 'OrderedCollection'}

The first (and only) page gives you:

{'@context': ['https://www.w3.org/ns/activitystreams',
              'https://w3id.org/security/v1',
              {'RsaSignature2017': 'https://w3id.org/security#RsaSignature2017'}],
 'id': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/likes?page=1',
 'orderedItems': ['https://videos.grafo.zone/accounts/grafo/likes/21187',
                  'https://tilvids.com/accounts/liira/likes/40997',
                  'https://mstdn.io/users/mikefordays#likes/1825392',
                  'https://mas.to/users/lucie#likes/1282855',
                  'https://mastodon.online/users/pennywhether#likes/2896615'],
 'partOf': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/likes',
 'totalItems': 5,
 'type': 'OrderedCollectionPage'}

So you can see the like activities coming from other servers.

A Peertube "comments" collection:

This is linked from the same video object as the likes above:

{'@context': ['https://www.w3.org/ns/activitystreams',
              'https://w3id.org/security/v1',
              {'RsaSignature2017': 'https://w3id.org/security#RsaSignature2017'}],
 'first': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments?page=1',
 'id': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments',
 'totalItems': 7,
 'type': 'OrderedCollection'}

And the first (and only) page expanded:

{'@context': ['https://www.w3.org/ns/activitystreams',
              'https://w3id.org/security/v1',
              {'RsaSignature2017': 'https://w3id.org/security#RsaSignature2017'}],
 'id': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments?page=1',
 'orderedItems': ['https://libretooth.gr/users/TheFrenchGhosty/statuses/109195503664715301',
                  'https://mastodon.online/users/pennywhether/statuses/109195597592156126',
                  'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments/14638',
                  'https://libretooth.gr/users/TheFrenchGhosty/statuses/109195711966268018',
                  'https://mstdn.io/users/mikefordays/statuses/109196263089384870',
                  'https://cheeseburger.social/users/shortwavesurfer2009/statuses/109198237963967425',
                  'https://tilvids.com/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments/18731'],
 'partOf': 'https://share.tube/videos/watch/9229655a-e676-4288-8ff1-f9c175174ec0/comments',
 'totalItems': 7,
 'type': 'OrderedCollectionPage'}

And you can see what appears to be a normal Mastodon status as a reply.
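Going the other way, a consumer could walk the pages of any of these collections and check where the items came from. This is a sketch: `fetch` is a hypothetical callable that takes a URL and returns the parsed JSON document, so no particular HTTP library is assumed.

```python
from collections import Counter
from urllib.parse import urlsplit


def iter_collection_items(collection, fetch):
    """Yield every item by following 'first' and then each page's 'next' link."""
    url = collection.get("first")
    visited = set()
    while url and url not in visited:  # guard against a page linking back to itself
        visited.add(url)
        page = fetch(url)
        yield from page.get("orderedItems", [])
        url = page.get("next")


def count_by_host(item_urls):
    """Count items per originating server, based on the host in each item URL."""
    return Counter(urlsplit(url).netloc for url in item_urls)
```

Comparing the number of items actually yielded against `totalItems` is a cheap way to check whether the advertised aggregate matches what the server is willing to list.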

Caveat: I'm not sure how this affects "hidden" accounts doing these activities. In such a case, does the recipient server still receive those links? If not, does it somehow offer a total number of anonymized followers or likes? I thought I read something about this but I can't find anything about it anymore.

Corollary: This tells me that the "Move" activity for accounts is valuable here. Likes and such from remote servers should probably move the target of their affection to the respective object on the new service. (Though, I wonder if Mastodon goes this far. I'll have to figure that out).

The alternative would be to address likes and comments by claim ID but:

BTW: Unlike Peertube, Mastodon doesn't expose Likes on ActivityPub for whatever reason. I think it must do so via another API (!). https://socialhub.activitypub.rocks/t/likes-on-federated-posts/245 See also, may be unrelated: https://github.com/mastodon/mastodon/issues/3307

Channels with no Cantina server

What if you're looking at a stream connected to a channel that has no Cantina? Or what if the stream claim isn't even connected to a channel? Can you View, Like or Comment on this media?

We could have the server hosting the comment also host a placeholder Video item. The problem there is that this will be repeated on multiple Cantinas, and there isn't a clear way to collect all of the comments on the video across servers. If the Video had a designated Cantina, it would collect all of the comment activities.

Maybe we could keep it disjoint, but then if the channel (assuming there is one) ever joins Cantina, the servers will see this fact and change the comments to point to the comment hosted on that Cantina, send it the update events, and that server will then collect all of the comments in one place.

But I'm not confident that I understand how these updates really work, where to send them, who's going to listen, etc.

Related issue: if an existing Cantina goes down. Comments on other servers responding to items on that Cantina will probably just work with the latest cache of those videos, but there won't be one place to get every comment on the video. This must be a regular condition in the Fediverse, though.

Stages

MVP - Cantina Only

Don't target interop with Mastodon (though we can't stop them from trying and getting a degraded experience).

We need the lbry:Video et al objects because commenters need something to comment on. But we don't need any fields other than claimId in lbry:Channel and lbry:Video objects because Cantina knows how to get that info from the hubs. Thus there is nothing to update, thus we will not target Update events for now. Claims can however be released, so we should have Delete activities.
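As a sketch of how little the MVP objects would need to carry: the object holds only its id, type and claimId, and everything else is looked up on demand. The mapping of lbry:Video to `/stream/` URLs and the `hub_lookup` callable are assumptions for illustration, not a real hub API.

```python
OBJECT_PATHS = {"lbry:Video": "stream", "lbry:Channel": "channel"}  # assumed mapping


def minimal_object(base_url, kind, claim_id):
    """An object that exists only so that comments have something to reply to."""
    if kind not in OBJECT_PATHS:
        raise ValueError(f"unsupported object type: {kind}")
    return {
        "id": f"{base_url}/{OBJECT_PATHS[kind]}/{claim_id}",
        "type": kind,
        "claimId": claim_id,
    }


def hydrate(obj, hub_lookup):
    """Fill in display metadata from the hub without overriding the object's own fields.

    `hub_lookup` is a hypothetical callable: claim id in, metadata dict out.
    """
    return {**hub_lookup(obj["claimId"]), **obj}
```

Since nothing but the claim id lives in the object, a change on the blockchain changes what `hydrate` returns without any Update activity being needed.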

Next - Other Activities

Next - Interop with Mastodon

Requires a couple sizable hoops:

Odysee is the obvious choice for a media host, but that's the problem. We don't want to centralize. Anyway, perhaps we could have an instance-default, and let it be overridden per channel, probably as a setting on Cantina (we'll probably need settings on Cantina for this, actually).
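The instance-default-with-override idea is simple enough to sketch. `MediaHostSettings` and its field names are hypothetical, and the host URLs in the usage example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MediaHostSettings:
    """Per-instance media host, optionally overridden per channel claim id."""
    instance_default: str
    per_channel: dict = field(default_factory=dict)

    def host_for(self, channel_claim_id):
        """Return the channel's override if one is set, else the instance default."""
        return self.per_channel.get(channel_claim_id, self.instance_default)
```

For example:

```python
settings = MediaHostSettings("https://media-default.example")
settings.per_channel["chan1"] = "https://media-other.example"
settings.host_for("chan1")   # the channel's override
settings.host_for("chan2")   # falls back to the instance default
settings.host_for(None)      # a stream with no channel also gets the default
```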

orblivion commented 2 years ago

cc @lyoshenka