metosin / malli

High-performance data-driven data specification library for Clojure/Script.
Eclipse Public License 2.0

Polymorphic schemas #1053

Open frenchy64 opened 2 months ago

frenchy64 commented 2 months ago

I went through several iterations on the internal representation. It felt right once the serialized form of a polymorphic schema was human-readable and ergonomic: it evaluates correctly without quoting, which means no symbols are allowed. The tradeoff is that we need to implement capture-avoiding substitution, which is very error-prone. I tested a few cases.

The identity schema's form is [:all [:X] [:=> [:cat :X] :X]]. It's introduced with the m/all macro by (m/all [X] [:=> [:cat X] X]).

To instantiate it, you run clojure.walk/postwalk-replace over the body with {:X this-to-instantiate}.

(m/inst (m/all [X] [:=> [:cat X] X])
        [:int])
;=> [:=> [:cat [:schema :int]] [:schema :int]]

The extra :schema wrappers make sure regex (sequence) schemas don't splice into the surrounding :cat. By default, variables stand for single schemas.
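The substitution step described above can be sketched with plain clojure.walk, without malli on the classpath. This is only an illustration of the idea, not the PR's implementation: naive-inst is a hypothetical helper name, and it performs no renaming, so it is only safe when the variable names don't clash with anything in the body.

```clojure
(require '[clojure.walk :as walk])

;; Serialized form of the identity schema, as shown above.
(def identity-form [:all [:X] [:=> [:cat :X] :X]])

;; Hypothetical helper (not part of malli): destructure the :all form,
;; wrap each argument in [:schema ...] so regex schemas can't splice,
;; then replace every occurrence of the variables in the body.
(defn naive-inst [[_ binder body] args]
  (let [smap (zipmap binder (map (fn [s] [:schema s]) args))]
    (walk/postwalk-replace smap body)))

(naive-inst identity-form [:int])
;=> [:=> [:cat [:schema :int]] [:schema :int]]
```

postwalk-replace does not descend into the values it substitutes, so the inserted [:schema :int] is left alone.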

This representation is delicate, but works and is serializable if a few rules are followed:

An :all schema is a leaf schema, and it's essential that users do not manipulate its body before instantiating it.

For example, variables are renamed if they clash with any keywords in the body:

(m/all [cat] [:=> [:cat cat] cat])
;=> [:all [:cat0] [:=> [:cat :cat0] :cat0]]
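To see why the rename is needed, and roughly how it could be done, here is a small sketch using only clojure.walk. The helper names keywords-in, fresh, and all-form are hypothetical and only mimic the behaviour shown above; the actual m/all macro may work differently. The sketch also ignores clashes between two fresh names, which a real implementation would have to handle.

```clojure
(require '[clojure.walk :as walk])

;; Without renaming, the variable :cat would collide with the :cat schema,
;; and substitution would clobber the operator position as well:
(walk/postwalk-replace {:cat [:schema :int]} [:=> [:cat :cat] :cat])
;=> [:=> [[:schema :int] [:schema :int]] [:schema :int]]

;; Hypothetical helper: every keyword that occurs anywhere in a form.
(defn keywords-in [form]
  (set (filter keyword? (tree-seq coll? seq form))))

;; Hypothetical helper: kw itself if unused, else kw0, kw1, ...
(defn fresh [used kw]
  (if (contains? used kw)
    (first (remove used (map #(keyword (str (name kw) %)) (range))))
    kw))

;; Hypothetical stand-in for the macro: vars and body use symbols for
;; variables; the result uses keywords that don't clash with the body.
(defn all-form [vars body]
  (let [used    (keywords-in body)
        renames (into {} (map (fn [v] [v (fresh used (keyword v))])) vars)]
    [:all (mapv renames vars) (walk/postwalk-replace renames body)]))

(all-form '[cat] '[:=> [:cat cat] cat])
;=> [:all [:cat0] [:=> [:cat :cat0] :cat0]]
```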

m/inst also does renaming to avoid capture. A variable can be "captured" by any schema, not just another variable, since variables are keywords and so are schemas.

In the following, the remaining binder is renamed to :y0 because, if it stayed :y, a later instantiation would destroy the body via (postwalk-replace {:y ...}).

(m/inst (m/all [x] (m/all [y] x))
        [(m/all [y] y)])
;=> [:all [:y0] [:schema [:all [:y] :y]]]
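The failure that this rename prevents can be reproduced with plain clojure.walk. The sketch below is an assumption-laden illustration, not malli code: naive-inst is a hypothetical helper that substitutes without any capture avoidance, and unrenamed is what the result above would hypothetically look like if the binder had stayed :y.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: substitution with no capture avoidance.
(defn naive-inst [[_ binder body] args]
  (walk/postwalk-replace (zipmap binder (map (fn [s] [:schema s]) args)) body))

;; Hypothetical form if the binder had NOT been renamed.
(def unrenamed [:all [:y] [:schema [:all [:y] :y]]])
;; The form actually produced above.
(def renamed   [:all [:y0] [:schema [:all [:y] :y]]])

;; The outer :y captures the inner schema's own :y, corrupting its binder.
(naive-inst unrenamed [:int])
;=> [:schema [:all [[:schema :int]] [:schema :int]]]

;; With :y0 nothing in the body matches, so the inner schema survives.
(naive-inst renamed [:int])
;=> [:schema [:all [:y] :y]]
```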

I don't know how to generate polymorphic functions, but there are some ideas in my deterministic function PR: https://github.com/metosin/malli/pull/1042

The mg/check implementation is pretty simple and can be improved. It just instantiates variables to very small schemas of a single value.