ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

Add unit kind to schemas #292

Closed warpfork closed 2 years ago

warpfork commented 3 years ago

We need a "unit" kind in IPLD Schemas.

why (by theory): This falls fairly directly out of a cardinality-counting approach to design. We already have a kind for cardinality=2 types -- boolean -- and it would be very strange indeed not to have a cardinality=1 type. That cardinality=1 type is typically called "Unit" in other languages; we should use that name too. The Wikipedia page for this topic is also very good: https://en.wikipedia.org/wiki/Unit_type

why (by example): There are several example scenarios where unit types would've been valuable in specifying systems we've already written up using IPLD Schemas, and we've taken shortcuts and made odd hacks for the lack of them. For one example, the schema-schema currently contains at least one map which is functionally a set, and the value has no meaning; at present, these values are all "null" and the type in the schema claims to be "Null", but really, this should be called "Unit" (and the "Null" type currently alluded to is actually a type error! This needs fixing). In another example, in earlier drafts of Selectors, the Matcher type would've been a unit type, because some value was necessary in the Selector schema because a union was used in recursion, and a terminating type was necessary, but the terminator contained no details other than its type. (This later changed, and we added more details to the Matcher type in Selectors, so we would no longer refactor the Selector schema to use unit types -- but earlier drafts were affected by this, and for some time contained weird hacks due to the lack of unit types.) It's likely we'll continue finding more examples like this in the future until we add a way to properly express unit types in IPLD Schemas.

proposal and representations: Introducing a unit kind to schemas would allow us to write better schemas in the cases where cardinality=1 is the desired semantic... and it also means we get a way to specify various representations, which will increase our ability to describe existing protocols.

I can imagine several representations that might go with a unit type: null, emptymap, true, and false all seem like choices that might be useful because we might encounter them when describing existing protocols in the wild.

An example of a schema using this might be:

type Foobar union {
    | Foo "foo"
    | Bar "bar"
} representation keyed
type Foo unit representation null
type Bar [Int]

... which would match this document: {"foo": null} (this contains the Bar type), or match {"bar":[1,2]} (which contains the Foo type).

Or, for a slightly more provocative example, unit types would also compose fine with kinded unions in some interesting ways:

type Wilder union {
    | Foo null
    | Bar list
    | Baz map
} representation kinded
type Foo unit representation null
type Bar [Int]
type Baz unit representation emptymap

... which would match null (this is the Foo type!) or [1,2] (this is Bar, obviously) or {} (this is Baz -- also a unit type!).

As you can see from this second example, it's possible to have more than one distinct unit type appear in a schema, and indeed, even appear together as alternative possible inhabitants of a union.

Unit types could also appear anywhere else any other type can appear: as the type for fields in a struct (although this is... fairly useless, as far as I can imagine? might as well be consistent about it); as the value type in lists (also not very useful, but again, why not); and as the value type in maps (this, at least, is something that certainly shows up in protocols once and a while). I've only focused on the examples using unions here because those are probably the most interesting ones (and the examples I've encountered and wanted a unit type in the most often, in personal anecdote).

a Unit type: The unit kind should probably be accompanied a "Unit" type in the prelude -- type Unit unit -- just like type String string is implicitly in the prelude of all schemas, making "String" directly available as a named type. This might not be strictly necessary -- I'd also be happy having a unit kind, and leaving it up to users to create a named one when they need it -- but including this in a standard prelude also seems fairly reasonable, I think.

workarounds: It is currently possible to work around the lack of a unit kind by declaring a type of type Unit struct{}. This fits the semantic definition of "unit" because only one value matches this type (an empty map: {}). However, this is less pleasing than having a unit kind, because it's less clear and less self-documenting. It's also less powerful than having a unit kind because this hack only works with the empty map representation, whereas we can support several common representation strategies by making "unit" into a kind that's known to IPLD Schemas.

what needs to change: The schema-schema needs to be updated to include "unit". The schema DSL parsers will need to be updated to also know about the "unit" keyword. The schema kinds document should be extended (and it should also probably retract "null" as a schema kind, come to think of it). Implementations need to add "unit" their enum for schema-level kinds. Implementations will need to support unit types and their representations. Implementations of kinded unions will need extending to handle unit types and their various representations correctly.

rvagg commented 3 years ago

discussed this a bit in today's meeting, if anyone else wants to participate in this it might be helpful https://www.youtube.com/watch?v=54qvDubWpac

no resolution, a bit of scepticism from me about whether we couldn't just make Null a concrete thing -- I'm still unclear why type Null null wouldn't be do-able and why we need an additional thing to cover that basic case. I understand the utility in having a generic thing we could allow various representations to present as - type Foo unit representation true (?) - where users need the flexibility to define a thing that they use as a basic terminal that's not what we officially bless (currently implicitly blessing null).

warpfork commented 2 years ago

(This isn't really the place to track this anymore, but for dense cross-linking: this eventually ended up handled in https://github.com/ipld/ipld/pull/136 .)