lbryio / cantina

IDs #3

Open orblivion opened 1 year ago

orblivion commented 1 year ago

This is what ActivityPub wants from an ID: https://www.w3.org/TR/activitypub/#obj-id

1) Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
2) An ID explicitly specified as the JSON null object, which implies an anonymous object (a part of its parent context)

If we go with a lbry:// ID, would that satisfy the first option? I would guess not, because of the reference to an "originating server". We could do it anyway and choose not to actually implement the ActivityPub spec, but I have an alternate idea as well. I'll lay out both options, as well as a third one that won't work but may be interesting enough to be worth mentioning.
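To make the question concrete, here is a rough Python sketch of how a strict reader of the first option might test an "id". The function name `satisfies_obj_id` and the `origin_host` parameter are hypothetical, and treating "authority belonging to the originating server" as a plain hostname comparison is an assumption on my part, not something the spec spells out.

```python
from urllib.parse import urlparse


def satisfies_obj_id(value, origin_host):
    """Rough check of an "id" against the two options quoted above.

    Assumption: "authority belonging to the originating server" is read
    as "the URI's host equals the host that served the object".
    """
    if value is None:
        # Option 2: an explicit JSON null means an anonymous object.
        return True
    # Option 1: compare the URI's authority with the serving host.
    return urlparse(value).hostname == origin_host
```

Under this reading, `satisfies_obj_id("lbry://abcd123", "cantina.example")` is false because the authority of the lbry:// URI is the claimId, not a server.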

Option 1: Throw out ActivityPub spec; "id" can be claimId

Repurpose the "id" field. Put in lbry://claimId for streams and channels, https://url for comments etc. We'd make it a lbry:// URL to cover the "publicly dereferencable URI" requirement, but I think we still miss the "originating server" part:

{
  "@context": "https://www.w3.org/ns/activitystreams",

  ...
  "id": "lbry://abcd123",
  ...
}

Interoperability with other ActivityPub services would require a PR on their projects. (We might also consider making a library for them and other services to use, to avoid replicating work.)

This would fully commit us to not having web URLs, and we would only have one ID to think about. However, other services would probably never work without custom support for the LBRY network. With this path, we commit to requiring Mastodon, Peertube, etc. to implement LBRY support. We can't decide to give them a normal ActivityPub interface later because we already repurposed the "id" field.

But again I think this breaks the spec, and IMHO convincing them to accept a break in the spec seems very unlikely, even putting aside their feelings about custom-supporting any service.

Option 2: Bypass ActivityPub spec: Separate "id" and "lbry:id"

Keep "id" as a web URL as per the ActivityPub spec. Have a separate "lbry:id" which holds lbry://claimId for streams and channels, https://url for comments etc. This field will be defined in a separate json-ld @context. In this approach, LBRY is hanging out on the sidelines of ActivityPub. We get the claimIds as we need them, and I don't think we violate the spec. (Maybe we could run afoul of json-ld per se, if we have a situation where a full object optionally collapses into a lbry:// link, whereas I would guess json-ld only allows that for https:// links [though I'm just guessing]. But it seems less likely that we'll need to expand lbry:// links in the first place, since the Cantina server will usually be requesting those objects from the LBRY network anyway.)

This approach comes at the expense of a little more verbosity as we'll see.

An object based on a claim will have the following:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "lbry": "http://lbry.example/activitystreams-extensions/v1#"
    }
  ],

  ...
  "lbry:id": "lbry://abcd123",
  "id": "https://cantina.example/stream/abcd123",
  ...
}

Cantina servers can ignore "id" for many if not all cases, and get all necessary data from the LBRY network using "lbry:id". Similarly, fields such as "inReplyTo" or "attributedTo" would have a "lbry:*" counterpart that could point to lbry:// IDs for replying to videos or https:// IDs for replying to comments:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "lbry": "http://lbry.example/activitystreams-extensions/v1#"
    }
  ],
  "id": "https://cantina-alpha.example/comment/efab123/2134728",
  "lbry:id": "https://cantina-alpha.example/comment/efab123/2134728",
  "attributedTo": "https://cantina-alpha.example/channel/efab123",
  "lbry:attributedTo": "lbry://efab123",
  "inReplyTo": "https://cantina-alpha.example/stream/dcba456",
  "lbry:inReplyTo": "lbry://dcba456",
  "type": ["Note", "lbry:Comment"]
}
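The pairing of each standard field with a "lbry:*" counterpart could be generated mechanically. The sketch below is only an illustration: the helper names (`dual_ref`, `build_comment`) and the URL layout are assumptions taken from the example above, and the "@context" is left for the caller to add.

```python
def lbry_uri(claim_id):
    """Format a claimId as a lbry:// URI, as in the examples above."""
    return f"lbry://{claim_id}"


def dual_ref(field, web_url, lbry_value):
    """Return the standard field plus its "lbry:"-prefixed counterpart."""
    return {field: web_url, f"lbry:{field}": lbry_value}


def build_comment(base_url, channel_claim_id, stream_claim_id, comment_id):
    """Build a comment object carrying both kinds of reference.

    Assumed URL layout: /comment/<channel>/<n>, /channel/<id>, /stream/<id>.
    """
    comment_url = f"{base_url}/comment/{channel_claim_id}/{comment_id}"
    obj = {"type": ["Note", "lbry:Comment"]}
    # Comments have no claimId, so both IDs are the same web URL.
    obj.update(dual_ref("id", comment_url, comment_url))
    obj.update(dual_ref("attributedTo",
                        f"{base_url}/channel/{channel_claim_id}",
                        lbry_uri(channel_claim_id)))
    obj.update(dual_ref("inReplyTo",
                        f"{base_url}/stream/{stream_claim_id}",
                        lbry_uri(stream_claim_id)))
    return obj


comment = build_comment("https://cantina-alpha.example", "efab123", "dcba456", "2134728")
```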

This gets a little bit messy, but still manageable, with follower lists. An ActivityPub follower list is (at least usually) a paginated list that looks like this:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://cantina-alpha.example/channel/abcd123/followers?page=1",
  "next": "https://cantina-alpha.example/channel/abcd123/followers?page=2",
  "orderedItems": [
    "https://cantina-alpha.example/channel/abcd456",
    "https://cantina-beta.example/channel/abcd789",
    "https://fosstodon.example/channel/danarel",
    "https://mastodon.sdf.example/channel/taviso"
  ],
  "partOf": "https://cantina-alpha.example/channel/abcd123/followers",
  "totalItems": 310,
  "type": "OrderedCollectionPage"
}

The above is a combination of Mastodon and Cantina followers, which a Mastodon server could make use of when spidering the ActivityPub network. Because the items are bare links rather than objects, we can't have the "lbry:id" fields alongside the "id" fields. We need a separate follower collection object that meets our needs:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "lbry": "http://lbry.example/activitystreams-extensions/v1#"
    }
  ],
  "id": "https://cantina-alpha.example/channel/abcd123/followers/lbry?page=1",
  "next": "https://cantina-alpha.example/channel/abcd123/followers/lbry?page=2",
  "orderedItems": [
    "lbry://abcd456",
    "lbry://abcd789",
    "https://fosstodon.example/channel/danarel",
    "https://mastodon.sdf.example/channel/taviso"
  ],
  "partOf": "https://cantina-alpha.example/channel/abcd123/followers/lbry",
  "totalItems": 310,
  "type": "OrderedCollectionPage"
}
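Both "orderedItems" lists could come from the same internal follower records. A minimal sketch, assuming a hypothetical record shape with a "url" and an optional validated "claim_id" (Mastodon followers simply have none):

```python
def follower_items(followers):
    """Return (standard_items, lbry_items) for the two collections.

    Each follower is a dict with "url" and, for Cantina users with a
    validated claim, "claim_id". This record shape is an assumption.
    """
    standard = [f["url"] for f in followers]
    lbry = [
        f"lbry://{f['claim_id']}" if f.get("claim_id") else f["url"]
        for f in followers
    ]
    return standard, lbry


followers = [
    {"url": "https://cantina-alpha.example/channel/abcd456", "claim_id": "abcd456"},
    {"url": "https://cantina-beta.example/channel/abcd789", "claim_id": "abcd789"},
    {"url": "https://fosstodon.example/channel/danarel"},
    {"url": "https://mastodon.sdf.example/channel/taviso"},
]
standard_items, lbry_items = follower_items(followers)
```

The two lists always have the same length, so "totalItems" and paging can stay identical between the two collections.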

It may seem like it bloats the payloads, but a given client will probably only want one version or the other. The follower/following lists are usually collapsed as links in the Actor objects, thanks to json-ld. So we'd just add a new field to the actor object for each list:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "lbry": "http://lbry.example/activitystreams-extensions/v1#"
    }
  ],

  ...
  "id": "https://cantina.example/users/abcd123",
  "lbry:id": "lbry://abcd123",
  "type": "Person",
  "following": "https://cantina.example/users/abcd123/following",
  "followers": "https://cantina.example/users/abcd123/followers",
  "lbry:following": "https://cantina.example/users/abcd123/following/lbry",
  "lbry:followers": "https://cantina.example/users/abcd123/followers/lbry",
  "inbox": "https://cantina.example/users/abcd123/inbox",
  "outbox": "https://cantina.example/users/abcd123/outbox",
  ...
}

We may want to do the same thing to inbox and outbox, but it may not be necessary. Looking at Mastodon, it looks like they include full objects rather than links. If we follow suit, we won't have the same compatibility problem with ActivityPub, since each object can have both types of ID (whereas for links we have to choose one or the other). Doing full objects for following/follower collections, on the other hand, could get pretty large.

There will probably be other cases where we need to have two different types of IDs, but hopefully they'll be as straightforward as this.

Simpler alternative, if it works

I'm not sure in which cases you need the full object vs a partial object. Looking at the docs for the attributedTo field it seems like you can get away with just id and type.

If this is generally true, we could probably simplify the above: Instead of creating a new lbry:inReplyTo field, have the existing inReplyTo point to an incomplete object with both kinds of ID:

  "inReplyTo": {
    "id": "https://cantina.example/stream/abcd123",
    "lbry:id": "lbry://abcd123",
    "type": "Video"
  }

And instead of having two followers collections, we'd do the same thing: have a list of incomplete objects that hold both kinds of id:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "lbry": "http://lbry.example/activitystreams-extensions/v1#"
    }
  ],
  "id": "https://cantina-alpha.example/channel/abcd123/followers?page=1",
  "next": "https://cantina-alpha.example/channel/abcd123/followers?page=2",
  "orderedItems": [
    {
      "lbry:id": "lbry://abcd456",
      "id": "https://cantina-alpha.example/channel/abcd456",
      "type": "Person"
    }, {
      "lbry:id": "lbry://abcd789",
      "id": "https://cantina-beta.example/channel/abcd789",
      "type": "Person"
    }, {
      "lbry:id": "https://fosstodon.example/channel/danarel",
      "id": "https://fosstodon.example/channel/danarel",
      "type": "Person"
    }, {
      "lbry:id": "https://mastodon.sdf.example/channel/taviso",
      "id": "https://mastodon.sdf.example/channel/taviso",
      "type": "Person"
    }
  ],
  "partOf": "https://cantina-alpha.example/channel/abcd123/followers",
  "totalItems": 310,
  "type": "OrderedCollectionPage"
}

But again this depends on whether partial objects are acceptable in this way. I can't find anything explicit about it one way or another.
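If partial objects do turn out to be acceptable, a consumer would need to handle both shapes: a bare link or a partial object with both IDs. A small sketch of that, where the function name `preferred_id` and the `prefer_lbry` flag are hypothetical:

```python
def preferred_id(item, prefer_lbry=True):
    """Pick an ID from a collection item or reference field.

    The item may be a bare link (a string) or a partial object holding
    "id" and optionally "lbry:id". A Cantina server would presumably
    prefer "lbry:id"; other ActivityPub servers would use "id".
    """
    if isinstance(item, str):
        return item
    if prefer_lbry and "lbry:id" in item:
        return item["lbry:id"]
    return item["id"]


partial = {
    "lbry:id": "lbry://abcd456",
    "id": "https://cantina-alpha.example/channel/abcd456",
    "type": "Person",
}
```

A server that knows nothing about LBRY would just call this with `prefer_lbry=False` (or, more realistically, read "id" directly and ignore the field it doesn't recognize).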

MVP - No interop

With all this in mind, we can omit "id" and only have "lbry:id", at least on day one. Likewise for "followers", "inReplyTo", etc. It would be just like the ActivityPub-spec-breaking approach, but with different field names. Even for comments, we could use "lbry:id", since it can hold both kinds of URL.

What's the advantage? It will leave our options open. We could leave it as-is, or later, if we decide to be friendly to Mastodon, we can add the standard ActivityPub fields. If we don't care about interop, well, we were prepared to break the ActivityPub spec anyway, so we could just as well use different field names.

Downsides:

Failed option: claimId parsed from enforced URL scheme

Another idea crossed my mind, which I ultimately decided would not work for reasons I describe below. I thought it would be worth laying it out for completeness.

We could use a normal HTTPS url that includes the claimId, and fellow Cantina servers can just parse the claimId out of it:

{
  "@context": "https://www.w3.org/ns/activitystreams",

  ...
  "id": "https://cantina.example/stream/abcd123",
  ...
}

Fatal flaw: The Cantina hosting the channel may have an internal representation of each listed follower that separates Mastodon users from Cantina users with validated claim IDs. However, a remote Cantina server requesting this follower list will see only the URLs. Some will be formatted like Cantina URLs, but those may come from an impostor Cantina trying to make it look like someone is following someone else. To prevent this, the requesting Cantina would have to check each parsed-out claimId against the LBRY network separately and see whether it is actually connected to the same domain as the given ID, which would imply that the host Cantina validated it. But then, maybe the user changed domains in the interim, and so on. Basically, dealing with this gets way messier than it's worth.
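A sketch of the extra check this would force on every requesting Cantina. Everything here is hypothetical: the /stream/ and /channel/ path layout is taken from the examples in this issue, and `lookup_domain` stands in for whatever LBRY network query would map a claimId to the domain it is associated with (no such API is specified here).

```python
from urllib.parse import urlparse


def parse_claim_id(url):
    """Return (host, claim_id) if the URL looks like a Cantina URL, else None."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 2 and segments[0] in ("stream", "channel"):
        return parts.hostname, segments[1]
    return None


def follower_is_verified(url, lookup_domain):
    """Check that the parsed claimId is tied to the URL's own domain.

    lookup_domain is a hypothetical callable: claim_id -> domain or None.
    """
    parsed = parse_claim_id(url)
    if parsed is None:
        return False
    host, claim_id = parsed
    return lookup_domain(claim_id) == host


# Stand-in for the network lookup, for illustration only.
known_domains = {"abcd456": "cantina-alpha.example"}
```

Note that `parse_claim_id` happily "succeeds" on any URL with the right shape, including a Mastodon one, which is exactly why the per-follower lookup can't be skipped.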

And if the response to this is "in that case, don't interop", then we might as well go back to "lbry://" ids as described above.

orblivion commented 1 year ago

cc @lyoshenka Our last conversation got me thinking about IDs. This was originally going to be part of the Grammar issue but this became pivotal and interesting enough to pull out on its own.

orblivion commented 1 year ago

Just added the "Simpler alternative, if it works" section.