lbryio / proposals

Discussion of large projects

Tagging and Data Overlays #1

Open kauffj opened 6 years ago

kauffj commented 6 years ago

Problem Statement

There are times when the community would like to attach additional information to claims. Some examples:

This problem can be generalized into the (admittedly ill-defined) questions of:

Requirements

Owner

TBD (Jeremy for now)

eukreign commented 6 years ago

Brownie (or beer) points if the solution is generic enough to also support comments?

tiger5226 commented 6 years ago

I am super invested in this idea and am a huge fan of a decentralized mechanism for handling metadata. I think calling it metadata is important because we do not know what the use cases CAN be in the future.

The idea is that we have a field on the blockchain similar to the value stored in lbrycrd. However, this value (let's call it the meta field), which is an array of byte arrays, is owned not by the publisher but by the community, meaning whoever is willing to pay for a transaction that adds metadata to a claim.

Solution

LbryCRD

So far this is a pretty familiar solution, one we also use for publisher-controlled claim data. Keeping it as an array is important because we want different transactions to be able to adjust it. I propose that we make this an append-only array, but my example also provides a mechanism via a signature so updates can be supported. New transactions append an array of bytes to the end of the array, so the meta field becomes an array of individual user-submitted metadata. This is important because someone may facetiously submit bad or ill-formed data. When this happens we do not want the whole meta field to be corrupted. We want to be able to "throw away an element" during handling.

For the transaction, this can be a new type like Claim_Meta, which would need just the claimid and the meta binary. There are many ways this could be leveraged; one is to keep it in the claimtrie as a second value that can be modified by someone other than the publisher.
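A minimal sketch of the append-only behavior described above, in Python. It assumes each entry in the meta field is a UTF-8 JSON document with a `metadata` list; the function names `append_meta` and `decode_meta` are hypothetical and not part of lbrycrd.

```python
import json
from typing import Any, Dict, List


def append_meta(meta_field: List[bytes], element: bytes) -> List[bytes]:
    """Return a new meta field with `element` appended.

    Existing entries are never modified, which models the append-only rule.
    """
    return meta_field + [element]


def decode_meta(meta_field: List[bytes]) -> List[Dict[str, Any]]:
    """Decode each entry on its own and throw away malformed ones."""
    decoded = []
    for raw in meta_field:
        try:
            doc = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # one bad entry must not corrupt the rest of the field
        # Assumed shape: a JSON object that carries a "metadata" list.
        if isinstance(doc, dict) and isinstance(doc.get("metadata"), list):
            decoded.append(doc)
    return decoded
```

Because every entry is decoded independently, a garbage submission costs its sender a transaction but is simply skipped by readers.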

So how do we leverage this field for comments, tags, categories, likes etc?

Lbry Protocol

As with publisher-owned claim data, we apply a structure to this binary data, and the LBRY protocol enforces it. Currently we use a JSON structure managed via protobuf. The same approach can be extended to a community-controlled meta field.

Structure

Collecting variable or unstructured data is a very common problem nowadays, and it is common to use NoSQL to handle it. The crux of many of these solutions is to handle data as a document instead of a set of structured tables. I would recommend that we use this JSON document structure to handle our metadata. There would be a set of document types enforced by the protocol. A lbrycrd transaction would add (or possibly edit) an element of the meta field.

Structure - Sample

This example is a user submitting a metadata element for a specific claim. They have added a comment to the content, shared a like with properties, and added tags and a category for the content. In addition, they have submitted an update to the claim description and title to fix grammar. Seeing another user's edits, they upvoted one edit and downvoted another user's comment so that it is removed for a community violation.

{
  "Signature": "5e52fee47e6b070565f74372468cdc699de89107",
  "metadata": [
    {
      "MetaId": "Comment",
      "comment": {
        "userName": "tiger5526",
        "message": "this is an excellent video...highly recommend"
      }
    },
    {
      "MetaId": "Like",
      "like": {
        "islike": "true",
        "emoji": "U+1F602",
        "size": "large"
      }
    },
    {
      "MetaId": "Tag",
      "Tags": [
        "baby",
        "eating",
        "funny"
      ]
    },
    {
      "MetaId": "Category",
      "Categories": [
        "Funny",
        "Babies"
      ]
    },
    {
      "MetaId": "ClaimMod",
      "claimmod": {
        "Description": "Aubrey tries lemon for the first time!",
        "Title": "Aubrey & lemon"
      }
    },
    {
      "MetaId": "MetaVote",
      "vote": [
        {
          "Vote": "Up",
          "Hash": "3ee246c42ed3247dc49eed3f6969032e0d9a1d59"
        },
        {
          "Vote": "Down",
          "Hash": "d4ddb31a22b9db7c4b1623aebc8a01097decc002"
        }
      ]
    }
  ]
}

LBRYNet

How does it work? The daemon would query lbrycrd for the metadata set for a claim. Using a document structure like the one shown above, we can then validate the data according to the protocol specification, as we do for claims, and collate the information by MetaId.
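Collating by MetaId could look roughly like this. It is a sketch that assumes the documents have already been decoded into Python dicts shaped like the sample above; `collate_by_meta_id` is a hypothetical helper name, not an existing daemon function.

```python
from collections import defaultdict
from typing import Any, Dict, List


def collate_by_meta_id(documents: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group every meta element from every submitted document by its MetaId."""
    grouped = defaultdict(list)
    for doc in documents:
        for element in doc.get("metadata", []):
            # Elements without a string MetaId cannot be validated, so skip them.
            if isinstance(element, dict) and isinstance(element.get("MetaId"), str):
                grouped[element["MetaId"]].append(element)
    return dict(grouped)
```

Each handler (comments, likes, tags and so on) then only needs to look at its own bucket.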

How do we handle an update, for example? The signature identifies the permission or ownership of a metadata element. If a signature can be validated, the transaction can update the metadata element: if an element with that signature already exists, an update is performed. This is just an example of how we could allow updates; my preference would be to make it append only.

Thinking of things generically, we might want the community to be able to have some level of control over the meta elements presented in a decentralized way. So we can easily have a MetaId for removal that contains the hash of the element that the community would like removed. If enough of these removals for an element are collected from the meta set then the daemon removes or ignores that element when processing as part of the protocol definition.
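A sketch of the removal rule, under several assumptions that the proposal leaves open: a hypothetical `"Remove"` MetaId carrying a `"Hash"` field, a hypothetical threshold of 3, and an element hash computed as SHA-1 over canonical JSON (chosen only because the sample hashes are 40 hex characters long).

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, List

REMOVAL_THRESHOLD = 3  # hypothetical; the real value would be set by the protocol


def element_hash(element: Dict[str, Any]) -> str:
    """Hash an element via canonical JSON (assumed scheme, not specified above)."""
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def visible_elements(elements: List[Dict[str, Any]],
                     threshold: int = REMOVAL_THRESHOLD) -> List[Dict[str, Any]]:
    """Return non-removal elements that have fewer than `threshold` removal votes."""
    removal_votes = Counter(
        e["Hash"] for e in elements
        if e.get("MetaId") == "Remove" and isinstance(e.get("Hash"), str)
    )
    return [
        e for e in elements
        if e.get("MetaId") != "Remove"
        and removal_votes[element_hash(e)] < threshold
    ]
```

Nothing is deleted from the chain here; the element is only ignored at read time. A real implementation would also need to count at most one removal per signer, which this sketch does not do.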

Comments

This becomes handling logic for the daemon now. Comments are easy: we show all the comments in the app in a nice way. They can have special requirements; for example, the community might want to remove a comment for some reason. The generic handling of element sets applies here: such comments are ignored but never removed. See the Community Edits description below.

Likes

Similarly, likes can be collated by MetaId, and in the case of the app the sum can be used. We can even show emojis with counts, as Slack does.
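For illustration, summing likes and counting emojis might look like the following. It assumes `Like` elements shaped as in the sample, where `islike` is stored as the string `"true"`; `summarize_likes` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_likes(like_elements: List[Dict[str, Any]]) -> Tuple[int, Dict[str, int]]:
    """Return (number of likes, per-emoji counts) for a claim's Like elements."""
    total = 0
    emoji_counts = Counter()
    for element in like_elements:
        like = element.get("like", {})
        # The sample stores the flag as a string; accept a real boolean too.
        if like.get("islike") in (True, "true"):
            total += 1
            if "emoji" in like:
                emoji_counts[like["emoji"]] += 1
    return total, dict(emoji_counts)
```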

Tags

Tags are a great feature. The more common a tag is for a claim, the more often it shows up or the more prominent it is. This means we can also apply dynamic weights for search if we want to leverage tags as a community point of reference rather than as SEO from publishers. This is along the lines of PageRank, where it is not just about the content but about the importance of the content. Instead of backlinks, we leverage community tagging of content.
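One possible weighting, sketched under the assumption that a tag's weight is the fraction of `Tag` elements on the claim that mention it. The normalization (lowercasing, trimming) and the name `tag_weights` are illustrative choices, not part of the proposal.

```python
from collections import Counter
from typing import Any, Dict, List


def tag_weights(tag_elements: List[Dict[str, Any]]) -> Dict[str, float]:
    """Weight each tag by the share of Tag elements that include it."""
    counts = Counter()
    for element in tag_elements:
        # A set keeps one element from counting the same tag more than once.
        counts.update({
            tag.strip().lower()
            for tag in element.get("Tags", [])
            if isinstance(tag, str)
        })
    total = len(tag_elements)
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}
```

A search index could multiply its relevance score by this weight, so a tag applied by most taggers ranks above one applied by a single user.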

Categories

This is similar to tags; however, categories can be more closely controlled and more global. Say, for example, we want to only include categories that are referenced globally by more than 10,000 claims. This can now be done and made part of the protocol.
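The global threshold could be applied along these lines. This is a sketch that assumes the daemon already has a mapping from claim id to the categories submitted for that claim; `allowed_categories` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def allowed_categories(claim_categories: Dict[str, Iterable[str]],
                       min_claims: int = 10_000) -> Set[str]:
    """Return categories referenced by more than `min_claims` distinct claims."""
    counts = Counter()
    for categories in claim_categories.values():
        # Count each category at most once per claim.
        counts.update(set(categories))
    return {category for category, n in counts.items() if n > min_claims}
```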

Publisher Controls

The publisher can have extra weight over their claim. For example, if a community member submits a claim mod to change something, the publisher can submit a meta element that upvotes the mod. Because its signature verifies against the claim, this upvote holds much more weight. This allows the publisher to approve or reject an edit while still allowing the community to collectively overrule the publisher. Alternatively, the publisher can admit defeat and just update the claim to match the edit.

Playlists

This is not metadata of a claim or channel. It more closely aligns with content and should have an exclusive owner; it is in fact user-generated content. Right now we have certificate for Channels, stream for Content and now claimlist for Playlists. Claims are for user-owned content. Metadata can also be global (unrelated to a specific claim), but I don't see use cases right now beyond reputation. Maybe there are other examples?

Community Edits

This is also an important topic. Since we are dealing with a dynamic document structure for these elements, we can leverage a MetaId for things like ignoring meta elements or, as in this case, editing the publisher's claims. In the above example there is an upvote. This upvote is for a meta element, and the hash is the hash of the meta element itself. If enough upvotes are collected, LBRYNet returns the field as modified by the ClaimMod in the meta element and shows that instead of what the publisher stated. Below are use cases.
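A sketch of how an accepted ClaimMod could be overlaid on the publisher's metadata at read time. It assumes the claim metadata uses lowercase keys while the ClaimMod uses capitalized ones, as in the sample, and that acceptance has already been decided elsewhere; `overlay_claim` is a hypothetical helper.

```python
from typing import Any, Dict, List


def overlay_claim(claim_metadata: Dict[str, Any],
                  accepted_mods: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return what the app should show, without changing the publisher's claim."""
    shown = dict(claim_metadata)  # copy, so the original stays untouched
    for mod in accepted_mods:
        for field, value in mod.get("claimmod", {}).items():
            key = field.lower()
            # Only replace fields the claim already has; ignore unknown ones.
            if key in shown:
                shown[key] = value
    return shown
```

The publisher's on-chain value is never rewritten; the overlay is applied only to what is returned to the app.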

Claim Modification (NSFW, Virus, etc.)

I am not certain what those weights are. If a ClaimMod exists for the claim in the meta elements of lbrycrd, we must determine whether or not that ClaimMod is allowed to replace a field of a stream claim. Since we do not want one person to have the ability to change a claim unilaterally, it must be a collective decision. Weights can be difficult here because there are differing scales. If a claim is popular and gets 100 comments per day, then we don't want 10 people being able to control claim metadata that is intentionally owned by the publisher. Alternatively, for a low-volume claim containing a virus, we don't want to wait for 10 people to agree because that may never happen.

The middle ground is a public comment period. If we have potential shared-ownership conflicts, such as between a content publisher and a content consumer, we need a way to mitigate them. There is also the idea of reputation: the higher the reputation of the editor, the shorter the comment period. Tracking reputation is probably a performance nightmare, so I would shy away from that.

The comment period can be combined with a weight, so the publisher has more weight for claims that they own as far as ModVotes go. Additionally, a content consumer can leverage LBRY Credits against a claim mod if they feel strongly about it. This requires certificated meta elements. For example, if I really have a problem with a claim, I should be able to outweigh the publisher with LBRY Credits if needed.

The formula needs to be defined, but let's say a publisher has a weight of 10 and LBRY Credits have a weight of 1. If I want to overrule the publisher, I can just edit with a ClaimMod. The ClaimMod has a default weight of 1. Depending on the age of the claim (we can use the max 7 days for consistency), the comment period prevents it from taking effect. If no one posts a conflicting ClaimMod, it is accepted and overrides the claim metadata (as described under Community Edits). Now say I own this claim and I disagree. I have the 7 days to reject it. As the content publisher, my MetaVote holds a weight of 10x, so it handily wins. Now the community can upvote the ClaimMod via MetaVotes, each worth 1. Additionally, someone could put LBRY Credits behind the vote, amplifying its impact. This makes overriding the publisher require 10 people or 1 person with 10 LBRY Credits. If the publisher wants to defend their claim, they have a 10x advantage.

Similar to a support, this can be spent, and it would then no longer count. So meta elements can either produce spendable outputs or not. This now requires finding the meta element, so an array might no longer work well.
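The example numbers above can be turned into a small scoring sketch. The formula is explicitly undefined in the proposal, so everything here is an assumption: a vote's weight is its base weight (10 for the publisher, 1 otherwise) plus 1 per unspent LBRY Credit staked, the ClaimMod itself contributes 1, and the mod takes effect only after the 7-day comment period if the net score is positive.

```python
from dataclasses import dataclass
from typing import List

PUBLISHER_WEIGHT = 10    # from the example in the text
COMMUNITY_WEIGHT = 1     # each community MetaVote is worth 1
CREDIT_WEIGHT = 1        # each staked, unspent LBRY Credit is worth 1
CLAIM_MOD_WEIGHT = 1     # default weight of the ClaimMod itself
COMMENT_PERIOD_DAYS = 7  # "max 7 days" from the text


@dataclass
class MetaVote:
    up: bool
    by_publisher: bool = False
    credits: int = 0  # unspent credits behind the vote (assumed integer)


def vote_weight(vote: MetaVote) -> int:
    base = PUBLISHER_WEIGHT if vote.by_publisher else COMMUNITY_WEIGHT
    return base + CREDIT_WEIGHT * vote.credits


def claim_mod_accepted(votes: List[MetaVote], age_days: float) -> bool:
    """Decide whether a ClaimMod overrides the publisher's metadata."""
    if age_days < COMMENT_PERIOD_DAYS:
        return False  # still inside the public comment period
    score = CLAIM_MOD_WEIGHT
    for vote in votes:
        score += vote_weight(vote) if vote.up else -vote_weight(vote)
    return score > 0
```

Under these assumptions an uncontested mod is accepted after 7 days, a publisher downvote blocks it, and 10 community upvotes (or one upvote backed by 10 credits) are enough to overrule the publisher.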

For NSFW, the community can override what is shown in the app by pulling together. Given the previous use cases, something like a safe-file warning becomes another ClaimMod that can be committed via a meta element and up/down voted with MetaVote, with or without the comment period mentioned above.

Lbry App

Now that we have all this raw data from lbrycrd, logic in the daemon and structure defined by the protocol, the app can show this information in a decentralized and "pretty" way. The interaction with lbrycrd is abstracted away by this solution and everything appears seamless, which is ideal. Since LBRY Credits are required for transactions, we probably want to allow an edit mode, where users can collect as many adjustments for a meta element together as possible and then hit a save button to submit the transaction. See the Costs section.

Generic Solution

So I hope I have shown how something like this can be easily extended by the daemon, enforced by the protocol, stored in lbrycrd and displayed in the app. A mechanism like this takes care of decentralization with community control without impacting the rights of the publisher owning the claim and its updates. It is also something that can be completed in increments, which means we can deliver part of the solution faster and independently of the rest of our stack: we can implement the meta field, adjust the RPC calls, and then handle the MetaIds that we would like the protocol to enforce one by one, deploying incrementally. This also allows all future updates to meta usage to be offchain.

Performance

How we implement this always matters. One contentious point would be the updates. If the elements are stored in an array, finding the element to update is a performance penalty for claims with lots of metadata, unless we use a map with the signature as the key. Consider a very controversial claim, similar to a very popular wiki page. As the volume of meta elements increases, this could also become a processing bottleneck for the daemon, depending on the protocol for individual MetaIds. I would think this is minimal since it is on a claim-by-claim basis, and I don't picture having so many meta elements that it surfaces. However, I could picture spamming. We could potentially implement a scalar that grows the cost of appending an element based on the current size, so the 100th element costs 10 times as much as the 10th element. This should keep spam down by making it costly. LBRYNet processing by the daemon should be minimal, since it is done user by user and not for thousands of users at once by the same instance. Lastly, instead of having an array in the meta element JSON document, we could also just have objects directly mapped with the MetaId as the key.
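The simplest scalar that satisfies "the 100th element costs 10 times as much as the 10th" is a cost that grows linearly with the element's position. The sketch below assumes that form; `append_cost` and `base_fee` are hypothetical names, and the fee unit is left abstract.

```python
def append_cost(position: int, base_fee: int = 1) -> int:
    """Cost of appending the element at 1-based `position` in the meta field.

    Linear growth: the cost is proportional to how many elements the field
    would hold after the append.
    """
    if position < 1:
        raise ValueError("position is 1-based")
    return base_fee * position


def total_cost(count: int, base_fee: int = 1) -> int:
    """Total paid by everyone to fill a meta field with `count` elements."""
    return base_fee * count * (count + 1) // 2
```

With linear per-element cost, the total cost of flooding a claim grows quadratically, which is what makes spam expensive. A steeper curve would work the same way if linear growth turns out to be too cheap.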

Costs

I am a fan of the support mechanism with LbryCrd. The cost of each transaction is minimal and prevents spam, however, maybe the community doesn't want to spend money. I am not a particular fan of this because it may hinder user adoption of the features described above, but maybe a meta element costs X to post, and stays relevant unless those credits are spent similar to how a claim can be spent. This also improves the guarding against spam and only keeps information in the set while the user deems the meta element as contributing to the content.

lyoshenka commented 6 years ago

I added playlists (e.g. "my workout playlist") to the list of example use cases. It's different because it's a collection that I (as a user) should be able to retain final authority over (or should I?). That may be outside the scope of this proposal because it's not connected to questions of mixed ownership/authority, but I think it's related.

BrannonKing commented 5 years ago

In my mind tags and comments are not very similar: tags are intended to be upfront and stable. Comments are intended to accumulate over time. Tags come from the publisher; comments come from the consumer. Publishers do not want to have their content polluted by consumers. Tags, comments, ratings, and playlists should each have their own proposal.

lyoshenka commented 5 years ago

maybe a Token-Curated Registry pattern could work? you can think of each tag as being a registry of content described by the tag. A TCR lets the community vote on what content belongs on the list. If you apply a tag to your content and it gets voted out, you lose LBC. If you vote for tags and the majority agrees with you, you earn LBC. So you're incentivized to curate and to curate accurately (where accurately = with the majority). Not a complete solution, but the start of an idea.

ref: http://tokenengineering.net/tcr

tiger5226 commented 5 years ago

This is really cool @lyoshenka! I like this a lot for a community judging mechanism.

kauffj commented 5 years ago

That's a good link @lyoshenka, thanks for sharing.

My initial reaction is that the TCR pattern is promising if we decide that firm categorization is a requisite. That is, a tag either does or does not apply to a piece of content for all users.

If firm categorization is not a requirement, then I would wonder if we lose information by requiring firmness. Content that narrowly passes a challenge is less likely to be as strong a candidate for that tag as a piece of content that passes its challenge indisputably. The same goes for content that narrowly fails.