w3c / activitypub

http://w3c.github.io/activitypub/

Community notes #464

Open evanp opened 2 months ago

evanp commented 2 months ago

A mechanism to attach disinformation or other warnings to controversial posts.

trwnh commented 2 months ago

prior art: https://www.w3.org/TR/annotation-protocol/

nightpool commented 2 months ago

@trwnh I took a look through the Web Annotation protocol, since I assumed it would have something related to e.g. looking up annotations by target—I would expect that the "community notes" feature would take the form of a separate "community notes service" or server or actor that would allow users to look up notes for any post, without the post itself having to reference them (since we can't rely on the person providing misinformation to cooperate in publishing the note).

But I couldn't find anything in the Web Annotation protocol that addressed the discovery of annotations without relying on the target being annotated to provide the IRI. Maybe I missed something? But I can't see how this would help. I also didn't see anything especially relevant in the data model, except at the highest level of abstraction ("this Note has a target"), which doesn't feel super useful to reuse.

trwnh commented 2 months ago

@nightpool i think that's a reasonable expectation to have. i don't understand the second paragraph, though -- you don't need to rely on the target being annotated to provide anything, in the same way that many current fediverse implementations don't rely on the contents of the likes or shares or replies collections to present likes or shares or replies to local users. arguably those cases should rely on the contents of those collections, but as you point out in this particular case of misinfo or disinfo, you are not guaranteed to have a linkback to any annotation. i don't think that's a problem, though -- it's "enough" to discover the annotations via a dedicated annotation service that supports querying annotations by target.

and this is precisely what the "web annotation protocol" describes. you have Annotation Servers and Annotation Clients which can run parallel to ActivityPub Servers and ActivityPub Clients. an HTTP resource can optionally declare a preferred annotation service, but it probably makes more sense to allow fediverse service providers to negotiate preferred annotation services instead.

ThisIsMissEm commented 2 months ago

I think there might be room here for there to be the ability to distribute Web Annotations via ActivityPub (Create(Annotation)) and to be able to "follow" an annotation server and have new annotations pushed to you; which would be kind of combining or unifying the two specifications.
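
A minimal sketch of what a Create(Annotation) activity might look like. The helper name, the actor and annotation IRIs, and the choice to give the embedded annotation its own @context are all assumptions for illustration; neither spec defines this combination.

```javascript
// Hypothetical helper: wraps a Web Annotation in an AS2 Create activity.
// All IRIs used below are made up for illustration.
function wrapInCreate(actorId, annotation) {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Create",
    actor: actorId,
    to: ["https://www.w3.org/ns/activitystreams#Public"],
    // The annotation carries its own @context so that "target" is read
    // with Web Annotation semantics rather than as the AS2 term.
    object: {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      ...annotation,
    },
  };
}

const activity = wrapInCreate("https://notes.example/actor", {
  id: "https://notes.example/annotation/1",
  type: "Annotation",
  body: { type: "TextualBody", value: "This claim lacks a source." },
  target: "https://social.example/posts/42",
});
console.log(JSON.stringify(activity, null, 2));
```

A server following the annotation actor would receive activities of this shape in its inbox; how it stores or displays them is left open here.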

nightpool commented 2 months ago

it's "enough" to discover the annotations via a dedicated annotation service that supports querying annotations by target.

Yes, I agree that would be enough, if it existed, I just literally couldn't find the spec language in the Web Annotation protocol that would provide for it. Maybe I'm failing to understand the spec though!

trwnh commented 2 months ago

I just literally couldn't find the spec language in the Web Annotation protocol that would provide for it. Maybe I'm failing to understand the spec though!

https://www.w3.org/TR/annotation-protocol/#annotation-pages

in general an annotation server/service provides an LDP Container / AS2 Collection that can be paged through. so you could fetch/page/cache the collection then query it locally, or otherwise this is tangentially related to the myriad other issues recently where being able to query/filter Collections would solve a slew of issues of the same class

the publishing of Create Annotation exposed via an activitypub actor is also a good idea though, if you want to receive notifications of every single annotation. i think there is also value in being able to query for annotations against a specific target, though -- something something SPARQL endpoint
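
The "fetch/page/cache the collection then query it locally" approach could be sketched roughly as follows. It assumes the pages have already been fetched into memory; the function names and example IRIs are hypothetical.

```javascript
// Returns the IRIs an annotation points at. A Web Annotation target may be
// a plain IRI string, an object with "source" or "id", or an array of those.
function targetIris(annotation) {
  const targets = [].concat(annotation.target ?? []);
  return targets
    .map((t) => (typeof t === "string" ? t : t.source ?? t.id))
    .filter(Boolean);
}

// Builds a Map from target IRI to the annotations that reference it.
// `pages` is assumed to be an array of already-fetched AnnotationPage
// objects, each with an "items" array.
function indexByTarget(pages) {
  const index = new Map();
  for (const page of pages) {
    for (const annotation of page.items ?? []) {
      for (const iri of targetIris(annotation)) {
        if (!index.has(iri)) index.set(iri, []);
        index.get(iri).push(annotation);
      }
    }
  }
  return index;
}

const pages = [
  {
    type: "AnnotationPage",
    items: [
      { id: "https://notes.example/anno/1", target: "https://social.example/posts/42" },
      { id: "https://notes.example/anno/2", target: { source: "https://social.example/posts/7" } },
    ],
  },
  {
    type: "AnnotationPage",
    items: [
      { id: "https://notes.example/anno/3", target: ["https://social.example/posts/42"] },
    ],
  },
];

const index = indexByTarget(pages);
console.log(index.get("https://social.example/posts/42").map((a) => a.id));
```

The lookup itself is cheap once the index exists; the cost is that every page of the container has to be downloaded first.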

hadilq commented 2 months ago

I like the ActivityPub protocol, and the people around it, as you may have guessed! I also like the question and the answers, especially the annotation servers, which I think are the way to implement it. Unfortunately, I am not familiar with the annotation servers' protocol, and in my mind there's no way you could do it properly without signing users' interactions (reshare, misinfo, disinfo, etc.) with private keys. I understand that attaching public/private keys to users' accounts and signing their interactions will cost a lot in terms of performance, implementation, and running the network, which I think is why the community avoided that route, but it's inevitable. Those keys could also pave the way for proper end-to-end messaging. I hope the community recognizes that this will be necessary at some point.
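
To make the signing idea concrete, here is a minimal sketch using Node.js's built-in crypto module. The Ed25519 key type, the payload shape, and the use of JSON.stringify in place of a canonical serialization are all assumptions for illustration, not a proposal for how keys should be managed.

```javascript
const crypto = require("crypto");

// Hypothetical per-user key pair; a real deployment would have to decide
// where private keys live and how public keys are published.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Hypothetical interaction payload. JSON.stringify is a stand-in for a
// proper canonical serialization.
const interaction = JSON.stringify({
  type: "Flag",
  actor: "https://social.example/users/alice",
  object: "https://social.example/posts/42",
});

// For Ed25519 keys the algorithm argument must be null.
const signature = crypto.sign(null, Buffer.from(interaction), privateKey);

// Verification succeeds for the original payload and fails if it changes.
const ok = crypto.verify(null, Buffer.from(interaction), publicKey, signature);
const tampered = crypto.verify(
  null,
  Buffer.from(interaction + " "),
  publicKey,
  signature
);
console.log(ok, tampered); // true false
```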

nightpool commented 2 months ago

in general an annotation server/service provides an LDP Container / AS2 Collection that can be paged through. so you could fetch/page/cache the collection then query it locally

Right, so as I understand it the only way to get a community note for a given post is to download all community notes for the entire network. I don't think that's fit for purpose; it just feels like a very inflexible / unscalable way to build the system. Reminds me exactly of relays and how bad of an idea those were.

I don't think Create CommunityNote is a good path forward either for the same reasons. We need a service that can take in an AS2 id as an input and return a (list of) community notes as an output.

trwnh commented 2 months ago

it just feels like a very inflexible / unscalable way to build the system

it's just pulling from web architecture. in the same way you can't query the entire web (and why indexers exist), you will have to at present index the container/collection. but this is no different than paging through an outbox collection of Create Annotation activities.

We need a service that can take in an AS2 id as an input and return a (list of) community notes as an output.

exactly. at the most generic level (and this sure has come up a lot lately!), we can use SPARQL (or some other query language, but SPARQL is the best fitting one for this purpose). this can also be accomplished via simpler means with something like #462 that is a very limited-purpose endpoint working exactly as you describe. this is currently up to the implementation, and so we need at least a standardized interface. something like GET /annotationContainer/?target=something would probably be sufficient. maybe with one layer of indirection.
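
From the client side, a request of the form GET /annotationContainer/?target=something might be built like this. The "target" query parameter is not defined by the Web Annotation Protocol; it is the hypothetical extension being discussed, and the host names are made up.

```javascript
// Builds a lookup URL for the hypothetical "target" query parameter.
// Using URL/searchParams ensures the target IRI is percent-encoded, which
// matters because the IRI may itself contain "?", "&", or "=".
function annotationLookupUrl(containerUrl, targetIri) {
  const url = new URL(containerUrl);
  url.searchParams.set("target", targetIri);
  return url.toString();
}

const lookupUrl = annotationLookupUrl(
  "https://notes.example/annotationContainer/",
  "https://social.example/posts/42?page=1"
);
console.log(lookupUrl);

// The server can recover the original IRI unchanged.
console.log(new URL(lookupUrl).searchParams.get("target"));
```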

LDP Containers don't really have a mechanism for querying through them because the expectation is that you are going to use SPARQL, as that is the W3C standard for querying linked data and RDF (and also, specs should be orthogonal to each other, there is a separation of concerns between putting things into containers vs. pulling things out of that container). AS2 Collections have the same issue, but no recommended solution -- there is no equivalent statement anywhere in the specs or made by the WG/CG to the effect of "use SPARQL". Web Annotations likewise doesn't seem to explicitly declare a preferred mechanism, but since Annotation Containers inherit from LDP Containers (and AS2 Collections as well), there is an implied "use SPARQL" buried in there. And honestly, there aren't really that many differences between Containers and Collections, so you can somewhat naturally unite the two models. indeed, even AS2 describes Collections as such:

Collection objects are a specialization of the base Object that serve as a container for other Objects or Links.

emphasis mine.

worth noting that LDP has its own paging mechanism: https://www.w3.org/TR/ldp-paging/#ldpc-general

dudleyinnocent commented 1 month ago

I don't think Create CommunityNote is a good path forward either for the same reasons. We need a service that can take in an AS2 id as an input and return a (list of) community notes as an output.

30,000 foot idea: Create a Community Note-like node system that users may subscribe to. Give the administrator and user alike an option to select their Community Note servicing node. Said servicing node holds a note hash table and furnishes relevant information.

ThisIsMissEm commented 1 month ago

@dudleyinnocent that'd be what modelling off of the Web Annotations protocol would be.

nightpool commented 1 month ago

it's just pulling from web architecture. in the same way you can't query the entire web (and why indexers exist), you will have to at present index the container/collection. but this is no different than paging through an outbox collection of Create Annotation activities.

Right, and I don't think that solves the use-case described in this issue. To solve the user problem here we need more protocol than Web Annotation is giving us. And that's why I said an outbox of Create Annotation activities was also a bad idea without anything extra on top of it.

LDP Containers don't really have a mechanism for querying through them because the expectation is that you are going to use SPARQL, as that is the W3C standard for querying linked data and RDF

Okay, so say "SPARQL solves this problem" instead of "Web Annotations solves this problem". Web Annotations, as a spec/protocol, is providing no additional value here for this use-case except the concept of a "target" (which ActivityStreams already has) and a "page" (which ActivityPub already has, in the form of outbox).

And the spec doesn't even mention SPARQL or give any examples or normative guidance on how to integrate or implement with it, so I don't know why people are bringing it up when we're trying to answer the question of "How do we build community notes for the fediverse?". The web annotation spec doesn't solve ~any of our problems, it's just completely duplicative with the existing AS2/AP specs. And I think it goes without saying that most implementors would consider implementing SPARQL for their servers or clients an insane amount of overkill to solve this single problem of "what community notes do I show for this message"? (not even mentioning the sanitization and security issues SPARQL would bring with it).

nightpool commented 1 month ago

that'd be what modelling off of the Web Annotations protocol would be.

@ThisIsMissEm Where in the Annotation server protocol is a notion of "subscribing" to an Annotation server, or the idea of an annotation server "hold[ing] a note hash table" and "furnishing relevant information" (in response to a query for an IRI), as dudley describes?

People keep making claims about the Web Annotation protocol that AFAICT are completely unsupported by the spec.

trwnh commented 1 month ago

Okay, so say "SPARQL solves this problem" instead of "Web Annotations solves this problem".

that's not entirely accurate, though. what Web Annotations provides is a data model for annotating other Web resources, and a protocol for doing RESTful CRUD against LDP containers that support AS2 Collection model as well with paging. i wouldn't say this is "no additional value", because the alternative is reinventing some AS2 representation of an annotation. "how would you represent an annotation in AS2?" is a question that doesn't have an immediate or obvious answer. "how would you represent an annotation in Web Annotation?" is a question that has been answered already.

if we're trying to come up with an AS2 repr of an annotation, then as:target is not the same as oa:hasTarget. the "target" of an Annotation and the "target" of an Activity aren't the same. i'm sure you could come up with something that worked, but it would work less effectively than just reusing what already exists.

if we still want to go with some bespoke service that lets us query annotations in whatever AS2 format we invent, then we can do that. it doesn't have to be SPARQL, it could be as "simple" as just extending the Web Annotation Protocol to accept a new query parameter target pointing to the object id you want annotations against, and have the server perform the query internally however it wants to.

i just think that we shouldn't invent/reinvent a new format when we already have one. nor should we reinvent a protocol when this one mostly works. surely it's not too much of a leap to go from GET /annotations to GET /annotations?target=foo
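
On the server side, going from GET /annotations to GET /annotations?target=foo could be as small as the sketch below. The handler name, the in-memory store, and the response shape (an AnnotationCollection with an embedded first page) are assumptions for illustration; the "target" parameter itself is not part of the Web Annotation Protocol.

```javascript
// Returns true when an annotation targets the given IRI. Handles a plain
// IRI string, an object with "source" or "id", or an array of either.
function targetsIri(annotation, iri) {
  return [].concat(annotation.target ?? []).some(
    (t) => (typeof t === "string" ? t : t.source ?? t.id) === iri
  );
}

// Hypothetical handler for GET /annotations and GET /annotations?target=foo.
// `store` stands in for whatever storage the server really uses.
function handleGetAnnotations(requestUrl, store) {
  const target = new URL(requestUrl).searchParams.get("target");
  const items =
    target === null ? store : store.filter((a) => targetsIri(a, target));
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "AnnotationCollection",
    total: items.length,
    first: { type: "AnnotationPage", items },
  };
}

const store = [
  { id: "https://notes.example/anno/1", type: "Annotation", target: "https://social.example/posts/42" },
  { id: "https://notes.example/anno/2", type: "Annotation", target: "https://social.example/posts/7" },
];

const response = handleGetAnnotations(
  "https://notes.example/annotations?target=" +
    encodeURIComponent("https://social.example/posts/42"),
  store
);
console.log(response.total, response.first.items[0].id);
```

Without the parameter the handler returns the whole collection, so existing clients that only know GET /annotations would keep working.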

dudleyinnocent commented 1 month ago

Looking to pull both sides together with this. I think there is merit in viewing it from this angle, incorporating ideas from both perspectives. Here is my understanding, fleshed out by ChatGPT 4o from a hand-typed outline and reviewed by me. I'm not authoritative, but I'm looking to find holes in what is presented here.

Here are some simplified code examples demonstrating how one might extend the W3C Web Annotation Protocol to mimic features of Community Notes. For the sake of clarity, I'll use JSON-LD, as it is the serialization used by the Web Annotation Data Model.

1. User-Generated Annotations

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.com/annotation/1",
  "type": "Annotation",
  "body": [
    {
      "type": "TextualBody",
      "value": "This tweet may contain misleading information regarding the event.",
      "format": "text/plain",
      "creator": {
        "id": "https://example.com/user/123",
        "name": "User123"
      }
    }
  ],
  "target": {
    "source": "https://twitter.com/example/status/1234567890",
    "type": "Text",
    "label": "Example Tweet"
  },
  "created": "2024-09-23T12:00:00Z"
}

2. User Ratings

One can incorporate user ratings by adding a ratings array to the annotation:

{
  "id": "https://example.com/annotation/1",
  "ratings": [
    {
      "user": "https://example.com/user/456",
      "value": 5,
      "created": "2024-09-23T12:05:00Z"
    },
    {
      "user": "https://example.com/user/789",
      "value": 4,
      "created": "2024-09-23T12:10:00Z"
    }
  ]
}
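
A minimal sketch of how such a ratings array might be summarized. The function name is hypothetical, and both the plain average and the rule of counting only each user's most recent rating are placeholder assumptions, not a description of how any existing system scores notes.

```javascript
// Summarizes a ratings array of the shape shown above, counting at most
// one rating per user (the most recent one). Timestamps are assumed to be
// ISO 8601 strings in UTC, which compare correctly as plain strings.
function summarizeRatings(ratings) {
  const latestByUser = new Map();
  for (const rating of ratings) {
    const previous = latestByUser.get(rating.user);
    if (!previous || rating.created > previous.created) {
      latestByUser.set(rating.user, rating);
    }
  }
  const values = [...latestByUser.values()].map((r) => r.value);
  if (values.length === 0) return { count: 0, average: null };
  const sum = values.reduce((a, b) => a + b, 0);
  return { count: values.length, average: sum / values.length };
}

const summary = summarizeRatings([
  { user: "https://example.com/user/456", value: 5, created: "2024-09-23T12:05:00Z" },
  { user: "https://example.com/user/789", value: 4, created: "2024-09-23T12:10:00Z" },
  { user: "https://example.com/user/456", value: 3, created: "2024-09-23T12:20:00Z" },
]);
console.log(summary); // { count: 2, average: 3.5 }
```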

3. Community Moderation

One might add a flags section to keep track of user-reported issues:

{
  "id": "https://example.com/annotation/1",
  "flags": [
    {
      "user": "https://example.com/user/101",
      "reason": "Inaccurate",
      "created": "2024-09-23T12:15:00Z"
    }
  ]
}

4. Annotation Versioning

To support versioning, one could maintain an array of versions:

{
  "id": "https://example.com/annotation/1",
  "versions": [
    {
      "version": 1,
      "body": {
        "value": "Initial annotation content.",
        "created": "2024-09-23T12:00:00Z"
      }
    },
    {
      "version": 2,
      "body": {
        "value": "Updated annotation with more context.",
        "created": "2024-09-23T12:30:00Z"
      }
    }
  ]
}
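
Reading the current text out of such a versions array could look like the sketch below. The function name is hypothetical, and it assumes that a higher "version" number always means a newer revision.

```javascript
// Returns the entry with the highest version number, or null when the
// annotation has no versions array or the array is empty.
function latestVersion(annotation) {
  const versions = annotation.versions ?? [];
  return versions.reduce(
    (latest, v) => (latest === null || v.version > latest.version ? v : latest),
    null
  );
}

const versioned = {
  id: "https://example.com/annotation/1",
  versions: [
    { version: 1, body: { value: "Initial annotation content." } },
    { version: 2, body: { value: "Updated annotation with more context." } },
  ],
};
console.log(latestVersion(versioned).body.value);
```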

5. Enhanced Search Capabilities

Implementing a search feature, one could structure data to facilitate this:

{
  "search": {
    "query": "misleading information",
    "results": [
      {
        "id": "https://example.com/annotation/1",
        "snippet": "This tweet may contain misleading information regarding the event.",
        "creator": "User123"
      },
      {
        "id": "https://example.com/annotation/2",
        "snippet": "Check this fact before sharing.",
        "creator": "User456"
      }
    ]
  }
}
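
A naive way to produce results of that shape is a case-insensitive substring match, sketched below. The function name and the flat annotation shape (string body, string creator) are assumptions for illustration; a real service would more likely use a proper search index.

```javascript
// Returns { id, snippet, creator } entries for annotations whose body text
// contains the query, ignoring case. An empty query matches nothing.
function searchAnnotations(annotations, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return annotations
    .filter((a) => a.body.toLowerCase().includes(needle))
    .map((a) => ({ id: a.id, snippet: a.body, creator: a.creator }));
}

const corpus = [
  {
    id: "https://example.com/annotation/1",
    body: "This tweet may contain misleading information regarding the event.",
    creator: "User123",
  },
  {
    id: "https://example.com/annotation/2",
    body: "Check this fact before sharing.",
    creator: "User456",
  },
];
console.log(searchAnnotations(corpus, "Misleading Information"));
```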

And not to forget the hash table concept... Here’s how one can implement this concept, along with a code example.

Concept of a Hash Table

A hash table allows one to store key-value pairs, where the key can be a unique identifier (like a hash of the post's content or its URL) and the value can be the associated annotation or metadata.

1. Hash Function

One can use a hash function to derive keys from the content or URL of each post. A standard cryptographic hash such as SHA-256, applied to the post’s URL or content, would produce keys that are unique for practical purposes.

2. Data Structure

Here's a basic structure of how a hash table could be implemented in JavaScript:

class HashTable {
    constructor(size) {
        this.table = new Array(size);
    }

    hash(key) {
        let hash = 0;
        for (let char of key) {
            hash += char.charCodeAt(0);
        }
        return hash % this.table.length;
    }

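    set(key, value) {
        const index = this.hash(key);
        if (!this.table[index]) {
            this.table[index] = [];
        }
        // Overwrite an existing entry instead of storing a duplicate key
        const existing = this.table[index].find((entry) => entry.key === key);
        if (existing) {
            existing.value = value;
        } else {
            this.table[index].push({ key, value });
        }
    }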

    get(key) {
        const index = this.hash(key);
        if (!this.table[index]) return undefined;
        for (let entry of this.table[index]) {
            if (entry.key === key) return entry.value;
        }
        return undefined;
    }
}

3. Example Usage

Here’s how one might use the hash table to manage annotations for posts:

const postAnnotations = new HashTable(50);

// Example post data
const postUrl = "https://twitter.com/example/status/1234567890";
const annotation = {
    id: "https://example.com/annotation/1",
    body: "This tweet may contain misleading information.",
    creator: "User123",
    created: "2024-09-23T12:00:00Z"
};

// Use the post URL itself as the key (it could be hashed first, as in step 4)
const postKey = postUrl;

// Add an annotation to the hash table
postAnnotations.set(postKey, annotation);

// Retrieve the annotation
const retrievedAnnotation = postAnnotations.get(postKey);
console.log(retrievedAnnotation);

4. Hashing Function for URLs

For a more robust implementation, one can use a cryptographic hash function. Here's how one could do it using the built-in crypto module in Node.js:

const crypto = require('crypto');

function hashString(str) {
    return crypto.createHash('sha256').update(str).digest('hex');
}

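// Example usage (continues the example above, which defines postUrl,
// annotation, and postAnnotations)
const hashedPostKey = hashString(postUrl);
postAnnotations.set(hashedPostKey, annotation);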
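
As an alternative sketch, JavaScript's built-in Map already provides hash-table behavior, so a hand-written table is not strictly needed. The version below also assumes, unlike the example above, that one post may carry several notes; the function names are hypothetical.

```javascript
// Maps a post URL to the list of notes attached to it.
const notesByPost = new Map();

// Appends a note to the list for a post, creating the list on first use.
function addNote(postUrl, note) {
  const notes = notesByPost.get(postUrl) ?? [];
  notes.push(note);
  notesByPost.set(postUrl, notes);
}

// Returns all notes for a post, or an empty array when there are none.
function notesFor(postUrl) {
  return notesByPost.get(postUrl) ?? [];
}

const examplePost = "https://twitter.com/example/status/1234567890";
addNote(examplePost, { id: "https://example.com/annotation/1", body: "This tweet may contain misleading information." });
addNote(examplePost, { id: "https://example.com/annotation/2", body: "Check this fact before sharing." });

console.log(notesFor(examplePost).map((n) => n.id));
console.log(notesFor("https://twitter.com/example/status/0").length); // 0
```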

These examples provide a starting point for how to extend the W3C Web Annotation Protocol to include features inspired by Community Notes. Each piece of data adds a layer of functionality for community engagement, moderation, and information validation. One can adapt and expand upon these examples based on the specific use case and requirements.

Again, please find gaps in the concepts. View this code as pseudocode rather than copy-and-paste ready.

ThisIsMissEm commented 1 month ago

I should note that my response was curt because I was dealing with my heart playing up last night (I was in hospital later in the night).

I would advise against implementing something completely divorced from an existing spec. That said, there is no reason we can't have actors publishing Create/Update activities for objects defined in the Web Annotations Data Model. I'm not arguing for SPARQL to be implemented anywhere.

However, before we get to that point we need to first look at the use cases and expectations for users and server operators. Protocols should not be designed in abstraction from end-user needs.