ucan-wg / delegation

selector syntax #5

Open Gozala opened 6 months ago

Gozala commented 6 months ago

Here is my best attempt to describe jq inspired selector syntax and deliberate incompatibilities with jq.

Supported Forms

| Name | Selector | Input | Output | jq Output |
| --- | --- | --- | --- | --- |
| Identity | . | {x:1} | {x:1} | {x:1} |
| Iterator | .[] | [1, 2] | 1, 2 | (1, 2) |
| Null Iterator | .[] | null | ⛔️ | ⛔️ |
| Optional Null Iterator | .[]? | null | () | () |
| Nested Iterator | .[][] | [[1], 2, [3]] | ⛔️ | (1, ⛔️) |
| Optional Iterator | .[][]? | [[1], 2, [3]] | (1, 3) | (1, 3) |
| Object Key | .x | { x: 1 } | 1 | 1 |
| Missing Key | .x | {} | ⛔️ | null |
| Optional Missing Key | .x? | {} | null | null |
| Null Key | .x | null | ⛔️ | null |
| Optional Null Key | .x? | null | null | null |
| Array Key | .x | [] | ⛔️ | ⛔️ |
| Optional Array Key | .x | [] | null | null |
| Quoted Key | .["x"] | {x: 1} | 1 | 1 |
| Optional Quoted Key | .["x"]? | {} | null | null |
| .length | .length | [1, 2] | ⛔️ | ⛔️ |
| .length? | .length? | [1, 2] | null | null |
| Index | .[0] | [1, 2] | 1 | 1 |
| Out of bound Index | .[4] | [0, 1] | ⛔️ | null |
| Optional Index | .[4]? | [0, 1] | null | null |
| Negative Index | .[-1] | [1, 2] | 2 | 2 |
| String Index | .[0] | "Hi" | "H" | ⛔️ |
| Bytes Index | .[0] | Bytes([0, 1]) | 0 | n/a |
| Array Slice | .[0:2] | [0, 1, 2] | [0, 1] | [0, 1] |
| Array Slice | .[1:] | [0, 1, 2] | [1, 2] | [1, 2] |
| Array Slice | .[:2] | [0, 1, 2] | [0, 1] | [0, 1] |
| String Slice | .[0:2] | "hello" | "he" | "he" |
| Bytes Index | .[1:] | Bytes([0, 1, 2]) | Bytes([1, 2]) | n/a |

We treat bytes as a u8 array. From the selector's point of view, links are also treated as a u8 array.
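To make the index rows concrete, here is a minimal TypeScript sketch of how `.[n]` and `.[n]?` could resolve against arrays, strings and bytes under the proposed (non-jq) semantics. The names `selectIndex` and `IndexResult` are hypothetical and not taken from any implementation. Returning null for `?` on a non-indexable input is an assumption extrapolated from the key rows of the table, and strings are indexed by UTF-16 code unit here, which is also an assumption.

```typescript
// Illustrative sketch only: `selectIndex` and `IndexResult` are hypothetical names.
type IndexResult = { one: unknown } | { error: string }

// Resolve `.[n]` (or `.[n]?` when `optional` is true) against arrays, strings and bytes.
function selectIndex(data: unknown, index: number, optional: boolean = false): IndexResult {
  if (!(Array.isArray(data) || typeof data === "string" || data instanceof Uint8Array)) {
    // Assumption: `?` turns a type error into null, as it does for keys in the table.
    return optional ? { one: null } : { error: "cannot index this value with a number" }
  }
  // Negative indices count from the end, as in jq.
  const position = index < 0 ? data.length + index : index
  if (position < 0 || position >= data.length) {
    // Proposed behaviour: out of bounds is an error unless `?` is used.
    return optional ? { one: null } : { error: "index " + index + " is out of bounds" }
  }
  // Bytes are treated as a u8 array, so indexing yields a number.
  return { one: data[position] }
}
```

For instance, `selectIndex("Hi", 0)` yields `{ one: "H" }` and `selectIndex([0, 1], 4, true)` yields `{ one: null }`, matching the "String Index" and "Optional Index" rows.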

expede commented 6 months ago

Asking around, having this sugar...

[">", ".foo", 0]

...was universally loved ❤️

expede commented 6 months ago

Lifting some conversation from DMs here:

expede commented 6 months ago

I'd expect .foo[3:9] to produce a stream of values, not a smaller array

UPDATE: that's not what jq does.

{foo: [0,1,2,3,4,5]}
.foo[1:3] // ..., [1,2], ...
.foo[1:3][] // ...1, 2, ...

The refrain in feedback from basically every community I've asked is "just do what jq does unless you have a good reason not to", so let's respect jq behaviour here 👍
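The distinction between a slice and an iteration can be sketched in TypeScript as follows. This is only an illustration: `Selection`, `slice` and `iterate` are hypothetical names, and the `one` / `many` shape is borrowed loosely from the result type discussed later in this thread. As in jq, the end index of a slice is exclusive.

```typescript
// Illustrative sketch only: all names here are hypothetical.
type Selection<T> = { one: T } | { many: T[] }

// `.foo[1:3]`: slicing keeps the array shape, so the result is a single value.
function slice<T>(items: T[], start: number, end: number): Selection<T[]> {
  // The end index is exclusive, as in jq.
  return { one: items.slice(start, end) }
}

// `.foo[1:3][]`: iterating turns that single array into a stream of values.
function iterate<T>(items: T[]): Selection<T> {
  return { many: items.slice() }
}
```

With `foo = [0, 1, 2, 3, 4, 5]`, `slice(foo, 1, 3)` gives `{ one: [1, 2] }`, and iterating that array gives `{ many: [1, 2] }`.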

Gozala commented 6 months ago

I have implemented selectors and put together this test vector so we could utilize it for compatibility https://github.com/web3-storage/ucanto/blob/7bd8f8ec1aad147a70635ec3c6c97f297a0e281d/packages/core/test/policy/selector.vector.js

A few things I'd like to call out:

  1. Object iterator MUST sort keys alphabetically to ensure a deterministic order across languages and to be immune to insertion order in JS.
  2. I ended up with a return type like
    type SelectionResult = 
      | { one: Data }
      | { many: Data }
      | { error: ParseError|TypeError }

    You get many whenever the selector contains [] and you get one otherwise. You get a ParseError if the selector is invalid and a TypeError in cases where jq produces errors.

  3. I think jq is very inconsistent in its behavior, e.g. pathing into null like .foo produces null, but the same on an array or string produces an error. I wish it just always returned null if the path was not valid, or always returned an error.
    • Iteration also has some strange edge cases that are inconsistent; I'll paste some of those below.
  4. .length on arrays produces an error. That is what jq does, and it probably makes sense in non-JS contexts. On the other hand, if we allowed .length on arrays we could avoid having to introduce an operator in its place.
  5. jq does not allow indexed access on strings. I found that surprising; maybe we can allow character access.
  6. Out-of-bounds array access produces null in jq.
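Point 1 could look roughly like the TypeScript sketch below. It is not taken from the linked implementation: `Data`, `IterationResult` and `iterateValues` are hypothetical names, and the default `Array.prototype.sort` (UTF-16 code unit order) is assumed to be an acceptable stand-in for "alphabetically". The null cases follow the "Null Iterator" and "Optional Null Iterator" rows of the table.

```typescript
// Illustrative sketch only: all names here are hypothetical.
type Data = null | boolean | number | string | Data[] | { [key: string]: Data }

type IterationResult = { many: Data[] } | { error: string }

// Resolve `.[]` (or `.[]?` when `optional` is true).
function iterateValues(data: Data, optional: boolean = false): IterationResult {
  if (Array.isArray(data)) {
    return { many: data }
  }
  if (data !== null && typeof data === "object") {
    const record: { [key: string]: Data } = data
    // Sort keys so the order does not depend on insertion order.
    // Assumption: default sort order is good enough for "alphabetical".
    const keys = Object.keys(record).sort()
    return { many: keys.map(key => record[key]) }
  }
  // Non-iterables: an error, or an empty stream when `?` is used.
  return optional ? { many: [] } : { error: "cannot iterate over this value" }
}
```

For example, `iterateValues({ b: 2, a: 1 })` yields `{ many: [1, 2] }` regardless of the order in which the keys were inserted.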

I think it would be a good idea not to follow jq blindly and to make things a bit more consistent, e.g. always return null if a path does not lead to a value, and [] when iterating over non-iterables. Empty iterables then turn into never / bottom for some and every, which seems like what you'd expect. In the case of null it is not ideal, but not completely unreasonable. If we don't want to return null, perhaps we could just return errors in cases like (null).foo instead.

Gozala commented 6 months ago

Some more WTF from jq land

echo '{}' | jq '.q[0]'
> null

echo '{"q": {"0":1} }' | jq '.q[0]'
jq: error (at <stdin>:1): Cannot index object with number

echo '{}' | jq '.[][][][]'

echo '{"q":{}}' | jq '.[][][][]'

echo '{"q":{"x":1}}' | jq '.[][][][]'
jq: error (at <stdin>:1): Cannot iterate over number (1)

expede commented 6 months ago

Some more WTF from jq land

LOL yeah I don't love their heavy use of null. I guess they're using null as undefined?

echo '{"q": {"0":1} }' | jq '.q[0]'

If you use quotes it works:

$ echo '{"q": {"0":1} }' | jq '.q["0"]'
1

Which is maybe a fine enough way to distinguish between array indices and map keys? I'm not against that behaviour being explicit, though I would have expected it to return the first element of the map in alphabetical order 🤷‍♀️

This one seems reasonable to me, and IIRC matches how JS behaves

expede commented 6 months ago

echo '{}' | jq '.[][][][]'

😆 wut

expede commented 6 months ago

E.g. always result in null if path does not lead to a value

Do you mean like this?

// {a: 1}
".b" == null

I guess that makes sense. We want to be able to say that something is unset. Do we need to distinguish between these cases in validators?

query = ".a"
{a: null}
// null

query = ".a"
{}
// null

expede commented 6 months ago

Object iterator MUST sort keys alphabetically to ensure deterministic order both across languages and to be immune to insertion order in JS.

Agreed 👍

I ended up with a return type like

Seems reasonable. From the chat in Discord, it probably needs to be extended with some/every, but it's basically right yeah 👍

.length on arrays produces an error

I think I see why: it looks identical to expecting a map key. {length: 0} is definitely a case that I'd like to support, and I would expect it to fail if run on a non-object

Gozala commented 6 months ago

Do you mean like this?

// {a: 1}
".b" == null

I don't like this but it is jq behavior

echo '{"a":1}' | jq '.b'  
null

I guess that makes sense. We want to be able to say that something is unset. Do we need to distinguish between these cases in validators?

query = ".a"
{a: null}
// null

query = ".a"
{}
// null

This is how these behave in jq. I don't like it, but that is what it does:

echo '{"a":null}' | jq '.a'
null
echo '{}' | jq '.a'        
null

Gozala commented 6 months ago

LOL yeah I don't love their heavy use of null. I guess they're using null as undefined?

JSON does not have undefined, so it makes sense that they use null, although there are a bunch of cases where I'd expect an error instead. Or they could also return null in the cases where they produce errors today. Specifically, it makes no sense to me that .a on null is null while on [] it is an error.

Gozala commented 6 months ago

This one seems reasonable to me, and IIRC matches how JS behaves

In JS it will cast 0 into "0" and return the corresponding property, which may not be ideal, but I still find it surprising that jq errors instead. It is also odd, though less so, that .foo on {} is null but .[0] on {} is an error. I would expect both to be errors or both to produce null 🤷‍♂️

expede commented 6 months ago

How do you feel about this behaviour:

// {}
["not", ".foo"] // true

// {foo: null}
["not", ".foo"] // true

// {foo: false}
["not", ".foo"] // true

Which is a bit more like JS
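The three cases above can be sketched in TypeScript as below. This is only an illustration of the proposed behaviour: `not` and `Data` are hypothetical names, the selector is simplified to a single key, and treating every other value (including 0 and "") as not matching is an assumption, since the cases above only cover a missing key, null and false.

```typescript
// Illustrative sketch only: all names here are hypothetical.
type Data = null | boolean | number | string | Data[] | { [key: string]: Data }

// Evaluate ["not", ".key"] against an object.
function not(data: { [key: string]: Data }, key: string): boolean {
  // A missing key is resolved to null, so omission and null look the same.
  const value = Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null
  // Assumption: only null and false count as "falsy" here.
  return value === null || value === false
}
```

Note that under this sketch `{}` and `{foo: null}` are indistinguishable, which is the trade-off discussed in the following comments.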

expede commented 6 months ago

(specifically trying to find a way to predicate omission)

expede commented 6 months ago

still odd that .foo on {} is null but .[0] on {} is an error.

Yeah they fail on some (what I think they're treating as) type errors, but not others

Gozala commented 6 months ago

I would personally leave not to the logic operators and leave selectors out of it. You could instead use [== .foo? null]

expede commented 6 months ago

leave selectors out of it

Which I guess is an advantage of missing keys matching null, but I can't distinguish between "the field was null" and "the key was missing", right?

Gozala commented 6 months ago

I had a typo; I meant [== .foo? null], which, if I understand ? correctly, will turn a missing key into null.

Gozala commented 6 months ago

Which I guess is an advantage of missing keys matching null, but I can't distinguish between "the field was null" and "the key was missing", right?

I don't like that jq returns null when you don't have a corresponding element; I would personally prefer an error there. That way, adding ? at the end gives you a way to say "no error, just give me null", which also aligns with JS optional chaining syntax.
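A minimal TypeScript sketch of that preference, under the assumption that `.key` errors on a missing key and `.key?` converts the error into null. The names `selectKey` and `KeyResult` are hypothetical. The point is that without `?`, "the field was null" and "the key was missing" stay distinguishable.

```typescript
// Illustrative sketch only: all names here are hypothetical.
type Data = null | boolean | number | string | Data[] | { [key: string]: Data }

type KeyResult = { one: Data } | { error: string }

// Resolve `.key` (or `.key?` when `optional` is true).
function selectKey(data: Data, key: string, optional: boolean = false): KeyResult {
  const isObject = data !== null && typeof data === "object" && !Array.isArray(data)
  if (isObject && Object.prototype.hasOwnProperty.call(data, key)) {
    // The key exists; its value may itself be null.
    return { one: (data as { [key: string]: Data })[key] }
  }
  // Missing key, or a non-object input: an error unless `?` was used.
  return optional ? { one: null } : { error: "cannot select key " + key }
}
```

Here `selectKey({ a: null }, "a")` yields `{ one: null }`, `selectKey({}, "a")` yields an error, and `selectKey({}, "a", true)` yields `{ one: null }`.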

expede commented 6 months ago

I had a typo; I meant [== .foo? null], which, if I understand ? correctly, will turn a missing key into null.

Yes that's the behaviour in jq 👍 That works for my use case above :)

I don't like that jq returns null when you don't have a corresponding element; I would personally prefer an error there. That way, adding ? at the end gives you a way to say "no error, just give me null", which also aligns with JS optional chaining syntax.

Yeah that makes a lot of sense. I do think that sticking fairly closely to jq is still a good idea, and having a CLI tool to test things against has been helpful even in this thread. IMO we can diverge from their behaviour where there's a good reason. Let's maybe draft a bullet point list on how we'd like to diverge from them?

Gozala commented 6 months ago

Turns out jq uses []? in a good way

➜ echo '[[1], 2]' | jq '.[].[]'
1
jq: error (at <stdin>:1): Cannot iterate over number (2)

➜ echo '[[1], 2]' | jq '.[].[]?'
1

smoyer64 commented 4 weeks ago

The "Optional Array Key" row in the "Supported Forms" table above is missing a ? operator.

smoyer64 commented 4 weeks ago

The "Supported Forms" table used as testcases in go-ucanto: https://github.com/storacha-network/go-ucanto/pull/15

fabiobozzo commented 1 week ago

Hi. I prepared a go-ucan PR with fixes to all broken "Pass" selector tests: https://github.com/ucan-wg/go-ucan/pull/18
The PR description includes a summary of my changes to the selector's logic, as well.
Feel free to check it out and add feedback.