noprompt / meander

Tools for transparent data transformation
MIT License

Can't find an elegant way to enrich a list with another by joining #137

Closed lucywang000 closed 4 years ago

lucywang000 commented 4 years ago

This is a simplified version of a real-world problem I'm trying to solve. Say I have a list of people and a list of bonuses:

(def data
  {:people [{:name :john
             :age  10}
            {:name :jen
             :age  11}
            {:name :jack
             :age  12}]
   :bonus  [{:name   :john
             :amount 100}
            {:name   :jack
             :amount 200}]})

And I want to join them so I can enrich each person record with this person's bonus (if any), i.e., this is what I want to get:

;; wanted
{:people [{:name :john
           :bonus-amount 100
           :age 10}
          {:name :jen
           :age  11}
          {:name :jack
           :bonus-amount 200
           :age 12}]
 :bonus  [{:name   :john
           :amount 100}
          {:name   :jack
           :amount 200}]}

A simple search works, but it yields a flattened list:

(me/search data
  {:people (me/scan {:name ?name :as ?person})
   :bonus (me/scan {:name ?name :amount ?amount})}

  {:people (assoc ?person :bonus-amount ?amount)})
;; => ({:people {:name :john, :age 10, :bonus-amount 100}}
;;     {:people {:name :jack, :age 12, :bonus-amount 200}})

So I have to add an extra step to get what I want:

(-> (me/search data
      {:people (me/scan {:name ?name :as ?person})
       :bonus (me/scan {:name ?name :amount ?amount})}

      {:people (assoc ?person :bonus-amount ?amount)})

    (me/match 
      ({:people !people} ...)

      {:people !people
       :bonus (:bonus data)}))
;; bingo!
;; => {:people
;;     [{:name :john, :age 10, :bonus-amount 100}
;;      {:name :jack, :age 12, :bonus-amount 200}],
;;     :bonus [{:name :john, :amount 100} {:name :jack, :amount 200}]}

I tried to do this in a single step with memory variables, but the result is wrong even though the shape looks right.

(me/rewrite data
  {:people [{:name !name :as !person} ...]
   :bonus [{:name !name :amount !amount} ...]}

  {:people [{:bonus-amount !amount & !person} ...]})

;; Wrong! "jen" should not have a bonus
;; => {:people
;;     [{:name :john, :age 10, :bonus-amount 100}
;;      {:name :jen, :age 11, :bonus-amount 200}]}

This is because memory variables only collect values; they can't express the constraint that the two occurrences of !name must match each other.
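
The wrong result above is consistent with people and amounts being paired purely by position. A minimal plain-Clojure sketch of that pairing follows; it is an analogy, not meander's implementation, and `zip-bonuses` is a hypothetical helper introduced only for illustration.

```clojure
;; Plain Clojure, no meander required. `zip-bonuses` is a hypothetical helper.
(def people  [{:name :john :age 10} {:name :jen :age 11} {:name :jack :age 12}])
(def amounts [100 200]) ; amounts collected in order, with no link back to a name

(defn zip-bonuses
  "Pairs each person with the amount at the same position; stops at the shorter seq."
  [people amounts]
  (mapv (fn [person amount] (assoc person :bonus-amount amount))
        people
        amounts))

(zip-bonuses people amounts)
;; => [{:name :john, :age 10, :bonus-amount 100}
;;     {:name :jen, :age 11, :bonus-amount 200}]
```

Nothing in the pairing looks at `:name`, so the second amount lands on "jen" simply because she is second in the list.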

Am I missing something that's available in meander, or is it unavoidable to use two steps for this type of enrichment?

lucywang000 commented 4 years ago

Okay, found an easy solution:

(assoc data :people
       (me/search data
         {:people (me/scan {:name ?name :as ?person})
          :bonus (me/scan {:name ?name :amount ?amount})}

         (assoc ?person :bonus-amount ?amount)))

Closing this.

lucywang000 commented 4 years ago

Still a problem: this is an inner join, but here I need a left join. Otherwise the record for "jen" is lost:

(assoc data :people
       (me/search data
         {:people (me/scan {:name ?name :as ?person})
          :bonus (me/scan {:name ?name :amount ?amount})}

         (assoc ?person :bonus-amount ?amount)))
;; the record for "jen" is lost in the :people list!
;; => {:people
;;     ({:name :john, :age 10, :bonus-amount 100}
;;      {:name :jack, :age 12, :bonus-amount 200}),
;;     :bonus [{:name :john, :amount 100} {:name :jack, :amount 200}]}

Looks like I have to collect a person->bonus map first (which is really handy with meander) and then do some post-processing:

(let [person->bonus (into
                      {}
                      (me/rewrites data
                        {:people (me/scan {:name ?name :as ?person})
                         :bonus (me/scan {:name ?name :amount ?amount})}

                        [?name ?amount]))]
  (update data :people (fn [people]
                         (map (fn [{:keys [name] :as person}]
                                (assoc person :bonus-amount (person->bonus (:name person))))
                              people))))
;;; bingo!
;; => {:people
;;     ({:name :john, :age 10, :bonus-amount 100}
;;      {:name :jen, :age 11, :bonus-amount nil}
;;      {:name :jack, :age 12, :bonus-amount 200}),
;;     :bonus [{:name :john, :amount 100} {:name :jack, :amount 200}]}

Or a slightly better version using specter:

(ns meander-demo
  (:require [meander.epsilon :as me]
            [com.rpl.specter :as sp]))

(let [person->bonus (into
                     {}
                     (me/rewrites data
                       {:people (me/scan {:name ?name :as ?person})
                        :bonus  (me/scan {:name ?name :amount ?amount})}

                       [?name ?amount]))]
  (sp/transform [:people sp/ALL]
                (fn [{:keys [name] :as person}]
                  (assoc person :bonus-amount (person->bonus name)))
                data))
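
Both versions leave `:bonus-amount nil` on people without a bonus, whereas the wanted output at the top omits the key entirely. A minimal plain-Clojure sketch of the post-processing step that skips the key when there is no bonus; `enrich` is a hypothetical helper, and `person->bonus` is written out by hand here instead of being collected with meander.

```clojure
;; Plain Clojure sketch; `enrich` is a hypothetical helper.
;; person->bonus is hard-coded here; in the thread it is built with me/rewrites.
(def person->bonus {:john 100 :jack 200})

(defn enrich
  "Adds :bonus-amount only when the person has an entry in person->bonus."
  [person->bonus {:keys [name] :as person}]
  (if-some [amount (person->bonus name)]
    (assoc person :bonus-amount amount)
    person))

(mapv (partial enrich person->bonus)
      [{:name :john :age 10} {:name :jen :age 11} {:name :jack :age 12}])
;; => [{:name :john, :age 10, :bonus-amount 100}
;;     {:name :jen, :age 11}
;;     {:name :jack, :age 12, :bonus-amount 200}]
```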

Does meander support this type of left join directly?

jimmyhmiller commented 4 years ago

So I've thought of a few different ways to approach this. First, we can simulate the join you are looking for using m/or.

(assoc data :people
       (m/rewrites data
         (m/or
          {:people (m/scan {:name ?name :as ?person})
           :bonus (m/scan {:name ?name :amount ?amount})}

          (m/let [?amount nil]
            {:people (m/scan {:name ?name :as ?person})
             :bonus (m/not (m/scan {:name ?name }))}))

         {:amount ?amount & ?person}))

Basically, we handle the case where there is a join match and the case where there is not, defaulting the amount to nil when we didn't find one.
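
The same two-case idea can be sketched in plain Clojure, without meander. This is only an illustration of the logic, not of how meander evaluates `m/or`; `left-join-amounts` is a hypothetical name.

```clojure
;; Plain Clojure sketch of "match case + no-match case".
;; `left-join-amounts` is a hypothetical helper, not part of meander.
(def people [{:name :john :age 10} {:name :jen :age 11} {:name :jack :age 12}])
(def bonus  [{:name :john :amount 100} {:name :jack :amount 200}])

(defn left-join-amounts
  "Case 1: a bonus with the same :name exists, so use its :amount.
   Case 2: no such bonus exists, so default :amount to nil."
  [people bonus]
  (let [bonus-names (set (map :name bonus))]
    (concat
     ;; case 1: the join matches
     (for [person people
           b      bonus
           :when  (= (:name person) (:name b))]
       (assoc person :amount (:amount b)))
     ;; case 2: people with no matching bonus
     (for [person people
           :when  (not (contains? bonus-names (:name person)))]
       (assoc person :amount nil)))))

(= (set (left-join-amounts people bonus))
   #{{:name :john :age 10 :amount 100}
     {:name :jen  :age 11 :amount nil}
     {:name :jack :age 12 :amount 200}})
;; => true
```

In this sketch the matched people come first and the unmatched ones last, so the original order of `people` is not preserved; the comparison above uses sets for that reason.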

We could also transform the :bonus into a map first and look things up.

(m/rewrite data 

  {:people [{:name !name :as !person} ...]
   :bonus-map (m/some ?bonus-map)
   :bonus ?bonus}

  {:people [{:amount (m/app get ?bonus-map !name) & !person} ...]
   :bonus ?bonus}

  {:people ?people
   :bonus (m/and ?bonus
                 [{:name !name :amount !amount} ...])}

  (m/cata {:people ?people
           :bonus-map (m/map-of !name !amount)
           :bonus ?bonus}))

Or you could have some index-by function and apply that first to the data.

(defn index-by
  "Like group by but assumes attr picks out a unique identity."
  [attr coll]
  (into {} (map (juxt attr identity) coll)))

(m/rewrite (assoc data :bonus-index (index-by :name (:bonus data)))
  {:people [{:name !name :as !person} ...]
   :bonus ?bonus
   :bonus-index ?bonus-index}

  {:people [{:amount (m/app get-in ?bonus-index [!name :amount]) & !person} ...]
   :bonus ?bonus})
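
For reference, here is what `index-by` returns for the bonus list from the issue, and what the `get-in` lookups used above yield for a name with and without a bonus. This is plain Clojure and needs no meander; the name `bonus-index` is chosen here only for the example.

```clojure
(defn index-by
  "Like group by but assumes attr picks out a unique identity."
  [attr coll]
  (into {} (map (juxt attr identity) coll)))

(def bonus-index
  (index-by :name [{:name :john :amount 100}
                   {:name :jack :amount 200}]))

bonus-index
;; => {:john {:name :john, :amount 100}, :jack {:name :jack, :amount 200}}

(get-in bonus-index [:john :amount]) ;; => 100
(get-in bonus-index [:jen :amount])  ;; => nil, "jen" has no bonus entry
```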

And finally, we could use the index-by and do a join on it.

(assoc data :people
       (m/rewrites (update data :bonus #(index-by :name %)) 
         {:people (m/scan {:name ?name :as ?person}) 
          :bonus (m/or {?name {:amount ?amount}}
                       (m/let [?amount nil]
                         (m/not {?name _}) ))}

         {:amount ?amount & ?person}))

I am probably missing some other combination of these techniques. But hopefully that helps :)

lucywang000 commented 4 years ago

Thanks @jimmyhmiller for the exhaustive answers!

I like a slightly modified version that uses m/or with m/let.

(assoc data :people
       (me/rewrites data
         {:people (me/scan {:name ?name :as ?person})
          :bonus (me/or
                   (me/scan {:name ?name :amount ?amount})
                   (me/let [?amount nil]
                     (me/not (me/scan {:name ?name}))))}

         {:amount ?amount & ?person}))

I prefer it because it doesn't repeat the almost-identical pattern twice as your example does. It also avoids constructing an intermediate map, unlike the versions without m/or.

Going one step further, this could be simplified with a custom maybe-scan syntax:

(me/defsyntax maybe-scan [required optional]
  (if (me/match-syntax? &env)
    `(me/or
       (me/scan ~(merge required optional))
       (me/let ~(-> optional
                    vals
                    (interleave (repeat nil))
                    vec)
         (me/not (me/scan ~required))))
    &form))
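
The two pieces of data the syntax builds, the merged scan pattern and the binding vector handed to `me/let`, come from ordinary sequence functions, so they can be checked at a REPL without meander. A small sketch, with the patterns quoted as plain data; the var names `required` and `optional` simply mirror the argument names above.

```clojure
;; Quoted so that ?name and ?amount are plain symbols, not meander variables.
(def required '{:name ?name})
(def optional '{:amount ?amount})

;; Pattern used by the "match" branch of me/or.
(merge required optional)
;; => {:name ?name, :amount ?amount}

;; Binding vector used by the "no match" branch: every optional variable gets nil.
(-> optional
    vals
    (interleave (repeat nil))
    vec)
;; => [?amount nil]
```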

Then making use of this syntax:


(assoc data :people
       (vec
        (me/rewrites data
          {:people (me/scan {:name ?name :as ?person})
           :bonus (maybe-scan {:name ?name} {:amount ?amount})}

          {:amount ?amount & ?person})))
;; bingo!
;; => {:people
;;     [{:name :john, :age 10, :amount 100}
;;      {:name :jen, :age 11, :amount nil}
;;      {:name :jack, :age 12, :amount 200}],
;;     :bonus [{:name :john, :amount 100} {:name :jack, :amount 200}]}