ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

Add more examples to Selector specs #327

Open warpfork opened 3 years ago

warpfork commented 3 years ago

As @chafey has pointed out lately, the selectors specs could use more examples to make them easier to understand, and also give implementer more fixtures to work with.

We have some examples already: https://github.com/ipld/specs/blob/109d2f82f62787fef3754675764290a1b027b467/selectors/example-selectors.md

... however, they're very narrative (fine, but means it was high cost for us to write) and there's relatively few of them, and they cover only quite basic cases. We need more!

@austinabell has already written some very excellent fixtures, in other repo. https://github.com/ChainSafe/ipld-traversal-vectors/blob/master/selector_walk.json has been very excellent reference material.

We should consider integrating that fixture data into our specs repo, or, at least linking to it.

warpfork commented 3 years ago

A couple of misc things I'd like to see added along with those fixtures:

warpfork commented 3 years ago

Quick draft on explainer of selector_walk.json's format :arrow_right:

This fixture file contains a list of fixture objects.

Each fixture object matches the IPLD Schema:

type Fixture struct {
    # "description" is free text describing the fixture.
    description String

    # "ipld" is a hunk of IPLD data that we'll apply a selector to.
    # This data can really be anything (single string; ragged recursive structure; etc).
    # (Since these fixtures are stored in json, it's really just more json
    # structure continuing under this map key in the fixture.)
    ipld Any

    # "selector" is the encoded selector which we'll parse and apply to the data.
    # (This matches the Selector schema, but here, let's skip details and call it "Any".)
    selector Any

    # "expect_visit" is a list of visit information, in order,
    # which we'll expect the selector to yield when applied to the data.
    expect_visit [VisitPoint]
} representation map

type VisitPoint struct {
    # "path" is the path from root of the data up to this VisitPoint.
    path Path

    # "node" is a node (or at least a shorthand of it) reached at this point in a visit.
    node VisitPointNode

    # "matched" is true if the selector had a matcher clause hit this node;
    # "matched" is false if the selector visited this node but didn't explicitly match on it.
    matched Bool
} representation map

type Path string # a slash-delimited path between the root of the data (which has a path of empty string) and a VisitPoint.

type VisitPointNode any # we'll come back to this in a moment.

(Note: the schema is used here only for documentation! You don't need to implement IPLD Schemas in order to implement Selectors; in fact, these fixtures were developed without schemas, and I'm applying them retroactively as an explainer tool.)

To make this concrete, let's look at some JSON that matches this:

{
    "description": "Walk matching simple node",
    "ipld": "basic test",
    "selector": {
        ".": {}
    },
    "expect_visit": [
        {
            "path": "",
            "node": {
                "string": "basic test"
            },
            "matched": true
        }
    ]
}

Now let's look at VisitPointNode a little more deeply.

VisitPointNode's job is to describe what node in the original data has been reached during the selector's walk. To do this, it says what kind the node is; and then -- only if it's a scalar -- it states the node's data. (If it's a recurive -- like a map or a list -- a null is just used instead, because we don't want to repeat that much data.)

In schema form:

type VisitPointNode union {
    | Bool        "bool"
    | Int         "int"
    | Float       "float"
    | String      "string"
    | Bytes       "bytes"
    | MapDummy    "map"
    | ListDummy   "list"
    | Link        "link"
} representation keyed

type MapDummy unit representation null
type ListDummy unit representation null

The expect_visit list is ordered, and exact. A selector should yield exactly this visit points, in exactly this order. (In other words: Selector visitation orders are deterministic, and you should test for this.)

austinabell commented 3 years ago
  • selector_walk.json needs just a little bit of explainer about which fields are what. (I grok this already; can draft this in a sec.)

ipld -> data in Json format being selected on selector -> selector to apply on the data expect_visit -> in order nodes visited applying the selector on the ipld

  • selector_explore.json... @austinabell may actually have to explain to me, I'm not sure I get that one :)

This is just testing the selector progression as it applies on the ipld. This is just testing the explore functionality when I was comparing against the go implementation (ie: https://github.com/ipld/go-ipld-prime/blob/08171d6bef3662ed7839e2c541f2c727d04e1c36/traversal/selector/exploreIndex.go#L23 but for each selector)

  • I'm gonna need a little help understanding the "cbor_ipld_storage" field in selector_walk_links.json as well.

These tests were more tailored to what I wanted to ensure worked for our client. Each item in that array is cbor encoded and put in a blockstore (and those are the links that are populated when traversing). Made most sense to me because this is the use case for most

Yeah sorry this isn't well documented, was on a time crunch, but hopefully the tests setup to be run by other implementations helps :) I just wanted to be able to ensure interoperability through setting this up

Explore runner is simple but the walking runner is here: https://github.com/ChainSafe/forest/blob/main/ipld/tests/walk_tests.rs