Make Ruminate Functional

ninjudd / jiraph

A graph database with pluggable backends, written in Clojure.

Eclipse Public License 1.0

137 stars 15 forks source link

Make Ruminate Functional #16

Open ninjudd opened 12 years ago

ninjudd commented 12 years ago

As @amalloy and I were talking today, we realized that an alternative to implementing two-phase commits in flatland/retro#2 would be to make ruminate (#14) functional. So instead of the ruminant-function being a function with side-effects that updates the output layers, it will return an IOValue with actions to perform on all the output layers. This can then be composed with the action map for the source layer and returned.

As part of this change, we also discussed the possibility of adding a jiraph.graph/Layer protocol for all of the functional write methods. This would give us a more functional place to wrap layers, instead of wrapping the mutable jiraph.layer/Layer. This wouldn't make any of the existing wrappers more difficult, and it would make functional ruminate way easier. As a consequence of this change, the bulk of the functionality in jiraph.graph would move from the mutable functions to the functional functions.

amalloy commented 12 years ago

I think the only thing jiraph.graph/Layer needs to have is update-in-node, right? Since dissoc and assoc and all them go through update-in-node! at the moment. And it doesn't have to return a function the way layer/update-fn does: since it's not supposed to do anything mutably, it can just return an IOValue immediately.

ninjudd commented 12 years ago

I think we need wrappers for all the readers too.

amalloy commented 12 years ago

I think actually we don't need a new protocol at all, we just need to make jiraph.layer/Layer functional. Of course the layers need some way to mutably do changes, since the datastore is mutable, but those don't have to be exposed in any kind of protocol. For example, it could look something like this:

(defn- do-assoc! [layer id value]
  (...mutable stuff..))

(defrecord MasaiLayer [...]
  Layer
  (assoc-node [this id value]
    (retro/with-actions {:old (get-node this "id" nil) :new value}
      {this [#(do-assoc! % id value)]})))

Then the client only ever uses this immutable assoc-node function, and applies the changes with retro/unsafe-txn if they want it done immediately, or retro/txn if they're building up a bunch of stuff to do. I think this all works out at a technical level, but I see a few downsides:

Knowledge of IOValues is scattered across the code, including in new layer implementations, rather than living only in retro and jiraph.graph
It's awkward to do anything mutably, because you must know about transactions to get anything done at all; you can't just (do-this! layer-1 (do-that! layer-2 ...)).
I'm not yet sure how to migrate the extensive logic in jiraph.graph/update-in-node! to this new model

amalloy commented 12 years ago

Further example: masai-sorted/specialized-writer[adjoin] could look like:

(defmethod specialized-writer adjoin [layer layout keyseq _]
  (when (every? (fn [[path format]] ;; TODO support any reduce-fn
                  (= adjoin (:reduce-fn format)))
                layout)
    (let [db (:db layer)
          [id & keys] keyseq
          writer (partial db/append! db)]
      (fn [arg]
        (retro/with-actions {:old nil, :new arg}
          {this [#(write-paths! % writer layout id
                                (assoc-in {} keyseq arg)
                                false :codec)]})))))

amalloy commented 12 years ago

Regarding the above concerns:

Is not a huge deal and we're just going to accept the slight complication
This will be fine if we add wrappers such as assoc-node! in jiraph.graph that implicitly do retro/unsafe-txn around the returned IOValue
I think it is fairly mechanical to translate layers, as in the above example.

We may want to add a MutableLayer protocol for doing stuff the "old" way, and a wrapping deftype that "ports" this to the new functional protocol; jiraph.graph will only know about the functional protocol.

amalloy commented 12 years ago

We had some more discussions about how this will work. We no longer need update-in-node (or update-fn) to return an old/new/keyseq "burp". We get rid of update-fn, assoc-node, and dissoc-node at the layer level, and replace them all with a single update-in-node function, which returns an IOValue. We don't need the extra layer of indirection provided by update-fn, because IOValues are already functions.

Now, a ruminating layer intercepts update-in calls directly, and computes "transformed" update-in calls to pass to each of its output layers (in rare cases it may even transform the call to its input layer). It then composes all of those IOValues and returns that as its own result from update-in.

Removing assoc and dissoc means that more logic needs to go into the actions of our IOValues, instead of in jiraph.graph/update-in-node: the latter can simply delegate directly to jiraph.layer/update-in-node, which returns an IOValue. It's not yet clear whether we will:

Move the "fallback" logic from jiraph.graph/update-in-node into some helper that layers can use
Eliminate that fallback logic entirely, because none of the layers actually make much use of it, and it wouldn't be that much trouble to just always use their update-fn code.

ninjudd commented 12 years ago

One other tradeoff we're making by taking this approach is that we can no longer pass the old value to the output layers. However, this was really just an optimization anyway, and we can always just fetch the old value from the input layer in the transform if needed. We can use caching to optimize repeated reads of the same value if necessary.

amalloy commented 12 years ago

More thoughts from today's discussions!

We needed some way to be able to read "mid-transaction" values, even if the transaction has already been applied. E.g., layer A has been committed, but layer B, which ruminates on A, has not. To re-compute B's actions, we need to know what A looked like in the middle of the transaction. So, we need IOValue objects that are more transparent (or, to look at it another way, more heavy-weight) than retro's "seq of functions per layer". Specifically, we decided to build a new jiraph io-value type, which carries forward a function read, which provides an in-transaction "snapshot" view of any part of the graph. This type will be returned by update-in-node, and we will create a function for converting these into a real retro IOValue. The read function will be passed to all of the write actions, so that if they depend on current node values they can have it.

We tried two approaches to this: Justin's (provided first below) is easier to understand and takes less code; mine (provided second below) is more complicated for no particular reason. I am including it only for posterity, in case it turns out there is an unanticipated problem with Justin's approach.

Both approaches assume the existence of a function path-parts, which takes two paths and returns some information about how they overlap, if they do. An implementation of this function is at https://gist.github.com/bd78c0d7500e8cf7ceef. And also, in both proposed implementations of update-in-node, the function update-in-node! is assumed to be some layer-specific way of performing an update. It is not referring to the jiraph.graph function of the same name.

With that preamble aside, here is Justin's implementation of compose, update-in-node, and ->retro-ioval:


(def jiraph-compose concat)

(defn update-in-node [layer keyseq f args]
  [{:write (fn [layer read]
             (apply update-in-node! layer read keyseq f args))
    :layer layer :keyseq keyseq :f f :args args}])

(defn actualize-ioval [ioval]
  (reduce (fn [{:keys [actions read]} {:keys [write layer keyseq f args]}]
            {:actions (update-in actions [layer] (fnil conj [])
                                #(write % read))
             :read (fn [layer' read-keyseq]
                     (if-let [[read-path update-path get-path]
                              (and (same? layer layer')
                                   (path-parts read-keyseq keyseq))]
                       (-> (read layer' read-path)
                           (apply update-in* update-path f args)
                           (get-in get-path))
                       (read layer' read-keyseq)))})
          {:actions {} :read jiraph.graph/get-in-node}
          ioval))

(defn ->retro-ioval [ioval]
  (:actions (actualize-ioval ioval)))

And my own, inferior versions:


(defn combine-actions [action1 action2 read-wrapper]
  (fn [layer read]
    (when action1
      (action1 layer read))
    (action2 layer (wrap-read1 read))))

(defn jiraph-compose [{writes1 :writes wrap-read1 :wrap-read}
                      {writes2 :writes wrap-read2 :wrap-read}]
  {:writes (reduce (fn [writes [layer action]]
                     (update-in writes [layer] combine-actions action wrap-read1))
                   writes1, writes2)
   :wrap-read (fn [read]
                (wrap-read2 (wrap-read1 read)))})

(defn ->retro-ioval [ioval]
  (into {}
        (for [[layer action] (:writes ioval)]
          [layer [#(action % jiraph.graph/get-in-node)]])))

(defn update-in-node [layer write-keyseq f args]
  {:writes {layer (fn [layer' read]
                    (apply update-in-node! layer' read write-keyseq f args))}
   :wrap-read (fn [read]
                (fn [layer' read-keyseq]
                  (if-let [[read-path update-path get-path]
                           (and (same? layer layer')
                                (path-parts read-keyseq write-keyseq))]
                    (-> (read layer' read-path)
                        (apply update-in* update-path f args)
                        (get-in get-path))
                    (read layer' read-keyseq))))})

ninjudd commented 12 years ago

Note that in my implementation, an io-value is a seq of tuples containing a function to mutably do the writing along with the arguments passed to update-in-node.

In Alan's implementation, an io-value is a single function to read any value on any layer along with a map from layer to a function that mutably writes the changes for that layer.

ninjudd commented 12 years ago

@amalloy, how are we going to get the read function into transform?

amalloy commented 12 years ago

I don't think we need to. Transform (on a ruminate layer, I assume you mean) just returns a new jiraph-iovalue, which will be composed in with all the others according to the wrapping rules.

On 09/06/2012 07:32 PM, Justin Balthrop wrote:

@amalloy, how are we going to get the read function into transform?

On Sep 6, 2012, at 6:52 PM, Alan Malloy notifications@github.com wrote:

More thoughts from today's discussions!

We needed some way to be able to read "mid-transaction" values, even if the transaction has already been applied. E.g., layer A has been committed, but layer B, which ruminates on A, has not. To re-compute B's actions, we need to know what A looked like in the middle of the transaction. So, we need IOValue objects that are more transparent (or, to look at it another way, more heavy-weight) than retro's "seq of functions per layer". Specifically, we decided to build a new jiraph io-value type, which carries forward a function read, which provides an in-transaction "snapshot" view of any part of the graph. This type will be returned by update-in-node, and we will create a function for converting these into a real retro IOValue. The read function will be passed to all of the write actions, so that if they depend on current node values they can have it.

We tried two approaches to this: Justin's (provided first below) is easier to understand and takes less code; mine (provided second below) is more complicated for no particular reason. I am including it only for posterity, in case it turns out there is an unanticipated problem with Justin's approach.

Both approaches assume the existence of a function path-parts, which takes two paths and returns some information about how they overlap, if they do. An implementation of this function is at https://gist.github.com/bd78c0d7500e8cf7ceef. And also, in both proposed implementations of update-in-node, the function update-in-node! is assumed to be some layer-specific way of performing an update. It is not referring to the jiraph.graph function of the same name.

With that preamble aside, here is Justin's implementation of compose, update-in-node, and ->retro-ioval:

(def jiraph-compose concat)

(defn update-in-node [layer keyseq f args] [{:write (fn [layer read](apply update-in-node! layer read keyseq f args)) :layer layer :keyseq keyseq :f f :args args}])

(defn actualize-ioval [ioval](reduce %28fn [{:keys [actions read]} {:keys [write layer keyseq f args]}] {:actions %28update-in actions [layer] %28fnil conj []%29

%28write % read%29%29

:read %28fn [layer' read-keyseq] %28if-let [[read-path update-path get-path] %28and %28same? layer layer'%29 %28path-parts read-keyseq keyseq%29%29] %28-> %28read layer' read-path%29 %28apply update-in* update-path f args%29 %28get-in get-path%29%29 %28read layer' read-keyseq%29%29%29}%29 {:actions {} :read jiraph.graph/get-in-node} ioval))

(defn ->retro-ioval [ioval](:actions %28actualize-ioval ioval%29)) And my own, inferior versions:

(defn combine-actions [action1 action2 read-wrapper](fn [layer read] %28when action1 %28action1 layer read%29%29 %28action2 layer %28wrap-read1 read%29%29))

(defn jiraph-compose [{writes1 :writes wrap-read1 :wrap-read} {writes2 :writes wrap-read2 :wrap-read}] {:writes (reduce (fn [writes [layer action]](update-in writes [layer] combine-actions action wrap-read1)) writes1, writes2) :wrap-read (fn [read](wrap-read2 %28wrap-read1 read%29))})

(defn ->retro-ioval [ioval](into {} %28for [[layer action] %28:writes ioval%29] [layer [#%28action % jiraph.graph/get-in-node%29]]%29))

(defn update-in-node [layer write-keyseq f args] {:writes {layer (fn [layer' read](apply update-in-node! layer' read write-keyseq f args))} :wrap-read (fn [read](fn [layer' read-keyseq] %28if-let [[read-path update-path get-path] %28and %28same? layer layer'%29 %28path-parts read-keyseq write-keyseq%29%29] %28-> %28read layer' read-path%29 %28apply update-in* update-path f args%29 %28get-in get-path%29%29 %28read layer' read-keyseq%29%29))}) — Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/flatland/jiraph/issues/16#issuecomment-8354240.

ninjudd commented 12 years ago

Oh. I thought transform was going to return new parameters that could be passed to update-in-node like this: (map #(apply update-in-node layer %) (transform [keyseq f args])).

But even so, how do you vary the io-value returned based on calling read?

amalloy commented 12 years ago

Right, I realized on the way home that it's crazy, we do need to make read available to the transform one way or another. Not sure how yet.

On 09/06/2012 08:54 PM, Justin Balthrop wrote:

Oh. I thought transform was going to return new parameters that could be passed to |update-in-node| like this: |(map #(apply update-in-node layer %) (transform [keyseq f args]))|.

But even so, how do you vary the io-value returned based on calling |read|?

— Reply to this email directly or view it on GitHub https://github.com/flatland/jiraph/issues/16#issuecomment-8355364.

amalloy commented 12 years ago

I think the format the transform takes has to be a little more complicated, like: given a keyseq on the source layer, it returns something like:

A list of all the keyseqs on every layer that it needs the current values for
A function that takes the current values of all of those things and returns an iovalue (or a list of calls to update-in, maybe?)

I haven't thought it through, and I'm sure the details aren't quite right, but does the approach make sense?

ninjudd commented 12 years ago

I don't think that will be enough. What if the transform has to do a walk to calculate the updates? In that case, the list of keys to read depends on the values read at the previous walk step.

amalloy commented 12 years ago

We did a bunch of pair programming today designing stuff. Code attached for safekeeping; explanation to follow in the next day or so.


(defn advance-reader [read actions]
  (reduce (fn [read wrapper]
            (wrapper read))
          read, (map :wrap-read actions)))

(defn jiraph-compose [& fs]
  (fn [read]
    (first (reduce (fn [[actions read] f]
                     (let [more-actions (f read)]
                       [(into actions more-actions)
                        (advance-reader read more-actions)]))
                   [[] read]
                   fs))))

(defn read-wrapper [layer write-keyseq f args]
  (fn [read]
    (fn [layer' read-keyseq]
      (if-let [[read-path update-path get-path]
               (and (same? layer layer')
                    (path-parts read-keyseq write-keyseq))]
        (-> (read layer' read-path)
            (apply update-in* update-path f args)
            (get-in get-path))
        (read layer' read-keyseq)))))

(defn update-in-node [layer keyseq f args]
  (fn [read]
    [{:write (fn [layer]
               (apply update-in-node! layer read keyseq f args))
      :wrap-read (read-wrapper layer keyseq f args)
      :layer layer :keyseq keyseq :f f :args args}]))

(defn ->retro-ioval [ioval]
  (let [actions (ioval jiraph.graph/get-in-node)]
    (with-actions nil
      (reduce (fn [retro-ioval {:keys [layer write]}]
                (update-in retro-ioval [layer] (fnil conj []) write))
              {}, actions))))

(fn [read]
  (let [people-update ((update-in-node people ["alan" :chair] inc) read)
        [old-idx new-idx] ((juxt read (advance-reader read people-update))
                           people ["alan" :chair])]
    (reduce into people-update
            (when-not (= old-idx new-idx)
              [((update-in-node layer [old-idx] disj "alan") read)
               ((update-in-node layer [new-idx] conj "alan") read)]))))

(defn top-level-indexer [field]
  (fn [[source index] keyseq f & args]
    (fn [read]
      (let [source-update ((apply update-in-node source keyseq f args) read)
            read' (advance-reader read source-update)]
        (reduce into source-update
                (when-let [id (first (if (seq keyseq)
                                       (when (or (not (next keyseq))
                                                 (= field (second keyseq)))
                                         keyseq)
                                       args))]
                  (let [[old-idx new-idx] ((juxt read read') source [id field])]
                    (when (not= old-idx new-idx)
                      [((update-in-node index [old-idx] disj id) read)
                       ((update-in-node index [new-idx] conj id) read)]))))))))j

amalloy commented 12 years ago

Motivation

So, to explain the code above. The problem we discovered, with both of the jiraph-iovalue structures @ninjudd and I discussed last week, was that there is no good way to provide the read function (for depending on in-transaction values) to iovalues in, eg, ruminate-update-in, because those iovalues must be computed without access to a previous iovalue.

The solution we decided on is to make jiraph's IOValue be a function taking as input a read function, and returning an "actualized IOValue", which we define as a vector (not seq) of maps defining actions (structure of this map discussed later). These actualized IOValues cannot be composed or otherwise fiddled with: the composition operations must happen while they are in the "latent" stage, as a function of read.

The items in this vector of maps may not be executed exactly in sequence (after conversion to Retro IOValues, they will be "batched", one transaction per layer), but the read function will return a view of the graph as if they were executed in sequence.

Actualized IOValue

As mentioned previously, we introduced a new "type", a jiraph-specific analogue to the Retro IOValue. It is more transparent, so that more useful operations can be performed on it before actually committing changes to the database mutably. Specifically, each entry in the vector is a map with the following keys:

:write: A function that receives a Layer object (set at the correct revision for writing), and should write changes to the backing store.
:wrap-read: A function accepting a read function and returning a new read function that incorporates the changes written by this action.
:layer: The layer on which this action will cause changes.
(optional) :keyseq, :f, :args: The action that this update is intended to perform, as arguments to update-in. These keys are not used, but may be useful for debugging or inspection.

Implementation notes

advance-reader takes a read function and an actualized iovalue (recall, a vector of maps), and returns a new read function with all of the actions applied. It is primarily used in the implementation of compose, but is also useful for constructing more complicated actions, such as the top-level-indexer example, which uses advance-reader to read both before and after the "primary" update and see if a change is needed in the index layer. This is an improvement over simply composing three separate actions and composing them normally, because we can pretend they are being done in a different order than they actually are, and we can also avoid having to create a Retro IOValue that will eventually become a no-op.

read-wrapper depends on the same? function to determine whether the layer being read is the one that has been written to. It will need to be a new protocol function, probably in Layer. We can't simply compare layers by equality, because they'll be at different revisions, and may have other changes as well.

Caution must be used when creating the :wrap-read key to not use the read function they already have in scope, instead returning a functon that accepts a new, possibly-unrelated read function. This should be easy enough, since read-wrapper handles all of the difficult work, but someone implementing :wrap-read by hand might accidentally use the lexically-available read function.