metosin / malli

High-performance data-driven data specification library for Clojure/Script.
Eclipse Public License 2.0
1.48k stars 209 forks source link

Add support for constraint schemas: This, that, or both keys in map #474

Open brettrowberry opened 3 years ago

brettrowberry commented 3 years ago

I'm just getting started with Malli, and I really like it!

There is something I can't figure out.

Consider this schema:

(def Schema
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z {:optional true} string?]])

(m/validate Schema {:x true}) => true

Now imagine I want to require :y, :z, or both such that:

(m/validate Schema {:x true})              => false
(m/validate Schema {:x true :y 1})         => true
(m/validate Schema {:x true :z "hi"})      => true
(m/validate Schema {:x true :y 1 :z "hi"}) => true

Attempt 1

(def Schema
  [:map
   [:x boolean?]
   [:or
    [:y {:optional true} int?]
    [:z {:optional true} string?]]])

(m/validate Schema {:x true})

; Execution error (ExceptionInfo) at malli.impl.util/-fail! (util.cljc:17).
; :malli.core/invalid-schema {:schema :z}

Attempt 2

(def Schema
  [:or
   [:map
    [:x boolean?
     :y int?]]
   [:map
    [:x boolean?
     :z string?]]
   [:map
    [:x boolean?
     :y int?
     :z string?]]])

(m/validate Schema {:x true}) => true

That's not good!

Attempt 3

(def Schema
  [:and
   [:map [:x boolean?]]
   [:or
    [:map [:y int?]]
    [:map [:z string?]]
    [:map [:y int? :z string?]]]])

(m/validate Schema {:x true})              => false
(m/validate Schema {:x true :y 1})         => true
(m/validate Schema {:x true :z "hi"})      => true
(m/validate Schema {:x true :y 1 :z "hi"}) => true

This seems to work, but there's a bit of repetition.

The errors are pretty good, but perhaps they could be better if I did this the right way:

(me/humanize (m/explain Schema {:x true}))

=>
{:y
 ["missing required key"
  "missing required key"],
 :z ["missing required key"]}

(me/humanize (m/explain Schema {:x true :y "hi"}))

=> 
{:y
 ["should be an int"
  "should be an int"]
 :z ["missing required key"]}

What's the best way to do this?

ikitommi commented 3 years ago

Current solution:

(def Schema
  [:and
   [:map
    [:x boolean?]
    [:y {:optional true} int?]
    [:z {:optional true} string?]]
   [:fn {:error/message "y & z are mutually exclusive"
         :error/path [:y]}
    (fn [{:keys [z y]}]
      (not (and z y)))]])

(-> Schema
    (m/explain {:x true :y 1 :z "hi"})
    (me/humanize))
; => {:y ["y & z are mutually exclusive"]}

Future: a declarative way to do this, something like https://github.com/bsless/malli-keys-relations

ikitommi commented 3 years ago

actually, this is duplicate of #438. As this has more info, will close the other one.

bsless commented 3 years ago

What do you think about a datalog-like syntax for describing the constraints? Something like a simplified :where clause It's serializable and simple to compile

ikitommi commented 3 years ago

I was thinking of using meander. I think both could be tested "on paper" for few sample cases to get more insight on how they would/not work. Code syntax samples into this thread?

bsless commented 3 years ago

Sure, I'll whip up a few examples

bsless commented 3 years ago

I considered two notations, one as property and one as a standalone schema:

[:map
 {:where
  '[[?e :x ?x]
    [?e :y ?y]
    [(< ?x ?y)]]}
 [:x int?]
 [:y int?]]

[:and
 [:map
  [:x int?]
  [:y int?]]
 [:where
  '[[?e :x ?x]
    [?e :y ?y]
    [(< ?x ?y)]]]]

I'll with the property notation for brevity. I'm also missing a syntax for binding the original value being queried, let's go with :in for now querying keys in a map:

[:map
 {:in '?e
  :where
  '[[?e :x ?x]
    [?e :y ?y]
    [(< ?x ?y)]]}
 [:x int?]
 [:y int?]]

Binding intermediate values:

[:map
 {:in '?e
  :where
  '[[?e :x ?x]
    [?e :y ?y]
    [(count ?x) ?n]
    [(< ?n ?y)]]}
 [:x [:vector int?]]
 [:y int?]]

Nested maps:

[:map
 {:in 'e?
  :where
  '[[?e :x ?x]
    [?x :z ?z]
    [?e :y ?y]
    [(== ?z ?y)]]}
 [:x [:map [:z int?]]]
 [:y int?]]

"every" syntax:

[:map
 {:in '?e
  :where
  '[[?e :x [?x :as ?xs]]
    [?x :z ?z]
    [?e :y ?y]
    [(count ?xs) ?n]
    [(< ?x ?n)]
    [(== ?z ?y)]]}
 [:x [:vector [:map [:z int?]]]]
 [:y int?]]

Implicit unification:

[:map
 {:in '?e
  :where
  '[[?e :x [?x :as ?xs]]
    [?x :z ?y]
    [?e :y ?y]
    [(count ?xs) ?n]
    [(< ?x ?n)]]}
 [:x [:vector [:map [:z int?]]]]
 [:y int?]]

Mutual exclusion (can't have a situation where both 2 and 3 succeed), this is also a "contains" syntax.

[:map
 {:in '?e
  :where
  '[[?e :x ?x] ;1
    (not
     [?e :y] ;2
     [?e :z])]} ;3
 [:x int?]
 [:y int?]
 [:z int?]]

I tried to introduce as few innovations as possible with this syntax. What do you think?

ikitommi commented 3 years ago

Thanks for your thoughts @bsless! I think the separate :where type looks better of the two. About using datalog, that would be powerful as the nested sample shows. As it's generic, it could be used for sequences too, a generic matching on the data.

But, not ready on committing to an official malli relations-mapping solution yet, so if you would like to implement that, it could start as an add-on lib. Will try to resolve the global registry so that it's easy to plug-in custom schema types from 3rd party libraries.

cheers.

bsless commented 3 years ago

Sounds good. One issue I'd appreciate directions on is how to compile it to be performant. A naive approach would be building up a map of all the bindings, but where's the fun in that? Can it be done with functions? Do I need some CPS magic?

ikitommi commented 3 years ago

not an datalog perf expert, there is #datalog chan in clojurians, maybe people there can share insights. I believe Asami has a fast imp (https://github.com/threatgrid/asami)

bsless commented 3 years ago

I figured I can use malli to parse the query :)

bsless commented 3 years ago

First not so successful attempt at translating the datalog spec directly to malli Not so successful because ::clause can't be built into a schema. Any tips @ikitommi ? https://gist.github.com/bsless/632b4040a2b2ad7469369f52cd610c06

bsless commented 3 years ago

Update: the parser works. Data patterns are a bit too general so I'm considering adding specialized EA and EAV patterns but they're slightly ugly. Regarding execution model: I'll start with some naive linear interpreter but eventually I'll want to add some sort of planner and compiler. Any suggestions regarding modeling are welcome

ebunders commented 8 months ago

I also ran into this problem, and we discussed it on slack.

I tried something like

[:union
   [:or
    [:map [:data map?]]
    [:map [:dependencies {:optional true} [:set [:ref "dependency"]]]]]
   [:map [:type {:optional true} string?]]]

but that didn't work out. I also tried a lot of stuff with merge but it didn't work as expected and I just didn't really understand the logic of something like this:

[:and
   [:map [:x boolean?]]
   [:or
    [:map [:y int?]]
    [:map [:z string?]]
    [:map [:y int? :z string?]]]]

This seems to say: a map with property :x, and a map with property :y, a map with property :z or a map with property :y and property :z. What I would naively expect is more like the following (here I create a naive address schema that can either have a zip code and house number or a city, street name and house number):

[:map
  [:house-number number?]
  [:or 
    [:zip-code string?]
    [:and
      [:city string?]
      [:street-name string?]]]]

The top level map suggests that everything here happens in the context of a single map. So:

This kind of syntax would honor the principle of least astonishment (dear to my heart) for me, but Tommi pointed out that te problem with this is that Malli can in this case not distinguish :or & :and from properties in the map. I guess a combination of :or and :map is used for that. But as again Tommi pointed out this could be fixed with name-spacing all Malli keywords. That seems like a good idea all round.

so, for completeness:

[::m/map
  [:house-number number?]
  [::m/or 
    [:zip-code string?]
    [::m/and
      [:city string?]
      [:street-name string?]]]]

To me this kind of syntax feels most familiar, it is what I would expect to have to write.

logseq-cldwalker commented 3 months ago

I also hope something like @ebunders suggested works one day as the syntax is more explicit and easier to maintain than the current :fn workaround