ipld / specs

Content-addressed, authenticated, immutable data structures

Could IPLD Schemas include complex qualifiers and constraints, like regexps? #275

Open warpfork opened 4 years ago

warpfork commented 4 years ago

Could IPLD Schemas include complex qualifiers and constraints, like regexps?

Maybe!

(Meta: This may spawn an exploration report or other docs, but I'm starting with an issue for now, as it's an early thought.)

Previously, I've been opposed to this. A primary goal of IPLD Schemas is that it must be fast, with predictable time costs, to compute whether or not they "match" some data. More powerful forms of pattern matching make this harder and harder, as does matching that gets more and more granular. It is for this reason that Schemas are built on a known set of basic structural patterns, chosen to be simple and predictable to match against, favoring patterns that can also be matched streamingly (e.g. without backtracking and the time/memory costs that non-streaming operation would imply). Schemas do all matching using purely structural elements that are easy to examine in terms of IPLD Data Model Kinds alone. In the rare cases where values are considered at all (such as keyed unions, or structure field names, etc), matching operates by direct equality check, never by pattern matching inside the value data itself. Introducing complex qualifiers like regexps would seem to run against all of this: they inspect values deeply; their time costs are harder to predict; they generally make things much more complex; and so on.
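To make the "structural only" point concrete, here is a minimal sketch of what matching against a struct-like schema could look like. It is not taken from any IPLD library: `kind_of`, `matches_struct`, and the dict-based schema shape are all hypothetical, and the kind names are a simplified stand-in for the Data Model Kinds.

```python
# Hypothetical sketch: none of these names come from an IPLD library.

def kind_of(value):
    """Map a Python value to a simplified Data Model kind name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    if value is None:
        return "null"
    raise TypeError(f"no Data Model kind for {type(value).__name__}")


def matches_struct(schema, data):
    """Return True if `data` is a map with exactly the fields in `schema`,
    each holding a value of the expected kind.

    `schema` maps field name -> expected kind name (an assumed format).
    """
    if kind_of(data) != "map":
        return False
    if len(data) != len(schema):
        return False
    for name, value in data.items():
        # Field names are compared by direct equality (a dict lookup);
        # the values themselves are never inspected beyond their kind.
        expected = schema.get(name)
        if expected is None or kind_of(value) != expected:
            return False  # flunk fast on the first mismatch
    return True


person = {"name": "string", "age": "int"}
print(matches_struct(person, {"name": "Alice", "age": 30}))    # True
print(matches_struct(person, {"name": "Alice", "age": "30"}))  # False
print(matches_struct(person, {"name": "", "age": -5}))         # True
```

The last call matches even though the values look implausible. That is the property being protected: the cost of a match depends on the shape of the data, not on what the values contain.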

But...

What if we introduced complex qualifiers (such as regexps for example) only as validators, and not as constraints used for schema matching?

The distinction may seem subtle, but the user story is this: you couldn't use regexps to describe protocol migration/evolution conditions, because libraries won't help you. Any "TrySchemaStack(data, [schemaList]) (typedData | error)" helper will error out and return immediately on a validator failure, rather than proceed to probe for a match against additional schemas. But you could freely use regexps to describe and document rules about the data that are applied when the schema does match.
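The two phases could be sketched roughly as follows. Everything here is an assumption made for illustration: the dict-based schema format, the `validators` key, and the exception types are invented, and the helper raises exceptions where the signature above returns an error value.

```python
import re


class ValidationError(Exception):
    """The data matched a schema structurally but failed one of its validators."""


class NoMatchError(Exception):
    """No schema in the stack matched the data structurally."""


def matches(schema, data):
    """Phase 1: structural match on field names and value types only."""
    fields = schema["fields"]
    return (
        isinstance(data, dict)
        and data.keys() == fields.keys()
        and all(type(data[name]) is typ for name, typ in fields.items())
    )


def try_schema_stack(data, schema_list):
    """Return (schema name, data) for the first schema that matches.

    Validators run only after a structural match. A validator failure raises
    immediately; it never causes a fall-through to the next schema.
    """
    for schema in schema_list:
        if not matches(schema, data):
            continue  # structural mismatch: probe the next schema
        # Phase 2: regexp validators (hypothetical "validators" key).
        for field, pattern in schema.get("validators", {}).items():
            if re.fullmatch(pattern, data[field]) is None:
                raise ValidationError(f"{schema['name']}.{field} fails /{pattern}/")
        return schema["name"], data
    raise NoMatchError("no schema matched")


v2 = {
    "name": "v2",
    "fields": {"id": str, "size": int},
    "validators": {"id": r"[a-z]+-[0-9]+"},
}
v2_loose = {"name": "v2-loose", "fields": {"id": str, "size": int}}
v1 = {"name": "v1", "fields": {"id": str}}

print(try_schema_stack({"id": "blob-7", "size": 3}, [v2, v1])[0])  # v2
print(try_schema_stack({"id": "anything"}, [v2, v1])[0])           # v1

try:
    # Structurally this is a v2 document, so the stack commits to v2 and
    # reports the validator failure instead of trying v2-loose.
    try_schema_stack({"id": "BAD", "size": 3}, [v2, v2_loose])
except ValidationError as err:
    print("validation failed:", err)
```

Since a validator can never change which schema is selected, an implementation that skipped phase 2 entirely would still pick the same schema for every input.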

This could give us more power to fulfill our goals of providing a consistent and language-agnostic place to author data structure design documents, while not compromising our goals regarding protocol evolution and the critical role of flunk-fast unification to enable that.

Additionally, this delineation keeps good incrementality for schema library and tooling authors. Because it's clear that complex qualifiers like regexps aren't needed for critical core functionality like determining if a schema matches some data, it's then pretty easy to say that such complex qualifiers can be implemented "later" or "not in the mvp/v1".

I've used regexps as the example throughout this discussion so far, but we could equally well be talking about integer range constraints. Both are things we'd occasionally like to have.
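One way to see that regexps and integer ranges are the same kind of thing here: both can be expressed as plain predicates that run after a schema has already matched. The sketch below is hypothetical (the function names and the field-to-predicate mapping are invented for illustration, not proposed syntax).

```python
import re


def regex_validator(pattern):
    """Build a predicate that accepts strings fully matching `pattern`."""
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None


def int_range_validator(low, high):
    """Build a predicate that accepts integers in the inclusive range [low, high]."""
    return lambda value: (
        isinstance(value, int) and not isinstance(value, bool) and low <= value <= high
    )


def validate(data, validators):
    """Return the names of fields whose validator rejects the value."""
    return [field for field, check in validators.items() if not check(data[field])]


validators = {
    "port": int_range_validator(1, 65535),
    "host": regex_validator(r"[a-z0-9.-]+"),
}

print(validate({"host": "example.org", "port": 8080}, validators))   # []
print(validate({"host": "example.org", "port": 70000}, validators))  # ['port']
```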

Notably not considering:

Caveats remain:

Questions remain:

This is an early thought, but I wanted to get it out there. The idea of creating separate phases for matching versus (additional) validating seems to open up a potential avenue for solutions I previously would've rejected, and that's probably worth some further thought.