lbryio / lbry-sdk

The LBRY SDK for building decentralized, censorship resistant, monetized, digital content apps.
https://lbry.com
MIT License

`claim_search` Plugin API #2855

Open eukreign opened 4 years ago

eukreign commented 4 years ago

In order to improve content discovery and search on the LBRY network, the SDK needs a way to use third-party search services.

Version 1 of the proposed API will use JSON-RPC to aid development and debugging. Once the design has proven itself in practice and the exchanged data structures have solidified, we may consider switching to a binary protocol such as protobuf.

To be compatible with the SDK, a search service must expose the two endpoints described below:

search_features

This endpoint is called only once, at SDK startup, to discover the features offered by the search service. In the expected response below, bracketed string values are placeholders to be filled in:

Request: plain GET with no arguments.

Response:

{
  "id": "[short name: 'lighthouse']",
  "version": "[version: '1.0']",
  "name": "[friendly label: 'LBRY Lighthouse Search']",
  "configuration": {  # any configs affecting search results that users should know about
    "[config field]": "[config value]"
  },
  "filter": {  # the filter arguments which can be passed to the search endpoint
    "[field name]": {
      "type": "[field data type: string, integer, date, etc]",
      "constraints": ["comparison", "fts", "range", "etc"]
    }
  },
  "order_by": {  # the order_by arguments which can be passed to the search endpoint
    "[field name]": {
      "type": "[field data type: string, integer, etc]"
    }
  },
  "metadata": {  # extra metadata returned in search results for every single claim
    "[field name]": {
      "type": "[field data type: string, integer, etc]"
    }
  }
}
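For illustration, a filled-in response from a hypothetical lighthouse deployment might look like the following (the specific field names, such as channel_id and relevance, are examples rather than part of the spec):

```json
{
  "id": "lighthouse",
  "version": "1.0",
  "name": "LBRY Lighthouse Search",
  "configuration": {
    "language": "en"
  },
  "filter": {
    "channel_id": {"type": "string", "constraints": ["comparison"]},
    "release_time": {"type": "date", "constraints": ["comparison", "range"]}
  },
  "order_by": {
    "release_time": {"type": "date"}
  },
  "metadata": {
    "relevance": {"type": "float"}
  }
}
```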

search

This endpoint performs the actual search. It accepts a request with filter, order_by, and pagination parameters (limit and offset) and responds with claim_ids plus any extra metadata.

Request:

{
  "filter": {
    "[field name]": "[value]"
  },
  "order_by": [["[field1]", "desc"], ["[field2]", "asc"]],
  "offset": 0,
  "limit": 20
}

Response:

[  # claim_id is the only required value to be in the result
  {"claim_id": "[claim_id]", "[metadata field 1]": "[metadata value 1]", ...}
]
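Before forwarding a request like the one above, the SDK is expected to check the requested filter and order_by fields against the features the service advertised (workflow step 3 below). A minimal sketch of that check, using hypothetical field names and assuming the search_features response shape described earlier:

```python
# Hedged sketch (not SDK code): validate an incoming claim_search request
# against a cached search_features response. FEATURES below is a
# hypothetical example of what a search service might advertise.

FEATURES = {
    "filter": {
        "channel_id": {"type": "string", "constraints": ["comparison"]},
        "release_time": {"type": "date", "constraints": ["comparison", "range"]},
    },
    "order_by": {
        "release_time": {"type": "date"},
    },
}

def validate_request(request, features):
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    for field in request.get("filter", {}):
        if field not in features["filter"]:
            problems.append(f"unsupported filter field: {field}")
    for field, direction in request.get("order_by", []):
        if field not in features["order_by"]:
            problems.append(f"unsupported order_by field: {field}")
        if direction not in ("asc", "desc"):
            problems.append(f"bad sort direction: {direction}")
    if request.get("limit", 0) <= 0:
        problems.append("limit must be a positive integer")
    return problems
```

Rejecting unsupported fields up front lets the SDK return a clear error to the client instead of forwarding a request the search service cannot satisfy.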

Workflow

  1. On startup, the SDK will call search_features and cache the result for the duration of the running process.
  2. As clients connect to the SDK, it will report the available search services and their features and configs as reported by search_features.
  3. As the SDK receives claim_search requests from clients, it will validate that the filter and order_by fields are accepted by the search service.
  4. The SDK forwards a client's search request to the search service, passing the appropriate limit/offset.
  5. The SDK will check each claim returned by the search service against the block/filter lists. If any claims are blocked, it will increase the offset and send another search request, continuing to query the search service and verify results against the block/filter lists until either the search service returns no more claims or the page_size requested by the client has been filled.
  6. While verifying results against the block/filter lists and preparing the response, the SDK will look up each claim_id in its own local database to get the latest txid:nout for the claim, along with all other metadata needed to return a consistent response regardless of which search service was used.
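The refill loop in step 5 can be sketched as follows. This is a hedged illustration, not SDK code: search_fn and is_blocked are hypothetical stand-ins for the search-service call and the block/filter-list check.

```python
# Sketch of workflow step 5: keep querying the search service, skipping
# blocked claims, until the client's page is full or the service runs out
# of results. `search_fn` and `is_blocked` are hypothetical stand-ins.

def fill_page(search_fn, is_blocked, filters, order_by, page_size):
    results, offset = [], 0
    while len(results) < page_size:
        batch = search_fn(filter=filters, order_by=order_by,
                          offset=offset, limit=page_size)
        if not batch:
            break  # search service returned no more claims
        offset += len(batch)  # advance past everything already seen
        for claim in batch:
            if not is_blocked(claim["claim_id"]):
                results.append(claim)
                if len(results) == page_size:
                    break
    return results
```

Advancing the offset by the full batch size (rather than by the number of claims kept) ensures the next request starts after everything already examined, so blocked claims are never re-fetched.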
kauffj commented 4 years ago

Looks roughly good to me. Some comments/thoughts:

  1. Is metadata intended only for extra data? For example, a search score or something like that? I think it is meant this way, but confirming. It may make sense to label it xxx_metadata rather than just metadata, since unlabeled metadata is frequently assumed to be claim metadata.
  2. Can methods be called just claim_search and claim_search_features rather than new top-level naming? And with some sort of design that supports somewhat seamlessly using the plugin when possible, otherwise falling back? Ideally, if lbry-desktop is updated to use claim_search always, it should work regardless of whether connected to a wallet server with a search plugin or not. It's okay if search works less well, but we'd like to find a design where this doesn't break.
eukreign commented 4 years ago
  1. I updated the write-up with an explanation of the feature fields. But here is further explanation: metadata is specific to the search service. For example, if we move trending out of the SDK and into a search service, then the search service may want to provide extra information about the trending of a particular claim; or if full-text search was used, there may be a relevancy decimal for each result row, etc. The SDK will simply pass this down to the client, so the search service can put whatever it wants in there. This field is entirely optional, and a search service may opt to return no extra metadata (in which case its results will just be claim_ids).
  2. The APIs and even the protocols are completely different. Currently the SDK wallet servers expose claim_search as a binary protobuf protocol which returns txid:nouts (there aren't even claim_ids in that response). I think it's already confusing that the SDK has two completely different, incompatible claim_search APIs (one local, used by the app, and one on the wallet server, used by the client SDK); I'm not sure a third API endpoint with the same name will help things. That said, I don't feel too strongly about the name; if there is broader consensus to name the RPC functions in lighthouse claim_search and claim_search_features, I'd be happy to update this issue.