scicloj / metamorph

Context pipelines
A Clojure library designed to provide pipelining operations.

It lets you express any data transformation and machine learning pipeline as a simple sequence of pure functions:

;; each step returns a function of the context; `pipeline` chains them in order
(def pipe
  (pipeline
   (select-columns [:Text :Score])
   (count-vectorize :Text :bow nlp/default-text->bow {})
   (bow->sparse-array :bow :bow-sparse #(nlp/->vocabulary-top-n % 1000))
   (set-inference-target :Score)
   (ds/select-columns [:bow-sparse :Score])
   (model {:p 1000
           :model-type :maxent-multinomial
           :sparse-column :bow-sparse})))

Several code examples for metamorph are available in the metamorph-examples repository.

Pipeline operation

A pipeline operation is a function which accepts the context as a map and returns a possibly modified context map.


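The context is just a map where pipeline information is stored. There are three reserved keys which help organize a pipeline: :metamorph/data (the main data object), :metamorph/id (an identifier for the current operation) and :metamorph/mode (the mode the pipeline is running in).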

Functions which only manipulate the data should behave the same in any mode, that is, they should ignore :metamorph/mode.

Compliant operations

All the steps of a metamorph pipeline are functions which need to follow a common set of conventions in order to work well together.

A typical skeleton of a compliant function looks like this:

(defn my-data-transform-function [any number of options]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    ;; do something with data, and possibly with id and mode,
    ;; and write the result back somewhere in the ctx: often to the key `:metamorph/data`, but it could be any key
    ;; the assoc also makes sure that other data in ctx is left unchanged
    (assoc ctx :metamorph/data ......)))
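
As a concrete illustration of the skeleton, here is a minimal sketch in plain Clojure with no metamorph dependency. The name `scale-by` is hypothetical, and the main data object is assumed to be a vector of numbers.

```clojure
;; Hypothetical compliant function: `scale-by` is an illustrative name,
;; and the data is assumed to be a vector of numbers.
(defn scale-by [factor]
  (fn [{:metamorph/keys [data] :as ctx}]
    ;; only :metamorph/data is replaced; every other key in ctx is kept
    (assoc ctx :metamorph/data (mapv #(* factor %) data))))

(def ctx-in {:metamorph/data [1 2 3] :some/other-key "untouched"})

;; calling (scale-by 10) returns the context function, which is then applied to ctx-in
(def ctx-out ((scale-by 10) ctx-in))
;; ctx-out => {:metamorph/data [10 20 30], :some/other-key "untouched"}
```

The outer function only captures options; all the work happens in the returned function, which is what a pipeline calls with the context.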

Metamorph compliant libraries

The following libraries provide metamorph compliant functions in a recent version:

library purpose link
tablecloth dataset manipulation dataset manipulation machine learning
sklearn-clj sklearn estimators as metamorph functions

Other libraries which do "data transformations" can decide to make their functions metamorph compliant. This does not require any dependency on metamorph, just the usage of the standard keys.

Functions can easily be lifted to become metamorph compliant. For this we have the function `metamorph/lift`.
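
To illustrate what lifting means, here is a minimal sketch in plain Clojure. The helper `lift-sketch` is a hypothetical stand-in written for this illustration and is not the library's implementation; it assumes the function being lifted takes the data object as its first argument.

```clojure
;; Illustrative only: `lift-sketch` is a hypothetical stand-in,
;; not the library's implementation of lift.
(defn lift-sketch [f & args]
  (fn [ctx]
    ;; apply f to the main data object, passing the extra arguments after it
    (update ctx :metamorph/data #(apply f % args))))

;; a regular function of (coll, n) turned into a context->context function
(def take-first-two
  (lift-sketch (fn [coll n] (vec (take n coll))) 2))

(def lifted-result
  (take-first-two {:metamorph/data [:a :b :c] :metamorph/mode :fit}))
;; lifted-result => {:metamorph/data [:a :b], :metamorph/mode :fit}
```

The lifted function touches only the value under :metamorph/data, so it behaves the same in every mode.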

A sister project makes it possible to evaluate machine learning pipelines based on metamorph.

A machine learning solution based on metamorph pipelines, including various classification and regression models.

Similar concept in sklearn pipelines

The metamorph concept is similar to the pipeline concept of sklearn, which also allows running a given pipeline in fit and transform. However, metamorph allows combining models with arbitrary transform functions, which don't need to be models.

Two types of functions in pipeline

We foresee that mainly two types of functions will be added to a pipeline:

  1. Mode independent functions: They only manipulate the main data object and ignore all other information in the context. They use neither :metamorph/mode nor :metamorph/id from the context map.
  2. Mode dependent functions: These functions behave differently depending on :metamorph/mode and will likely store data in the context map, which can be used by the same function in another mode or by other functions later in the pipeline.
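
The second type can be sketched as follows, in plain Clojure with no metamorph dependency. The step `center` is hypothetical, the data is assumed to be a vector of numbers, and :metamorph/id is set by hand here purely for illustration.

```clojure
;; Hypothetical mode-dependent step: learns the mean in :fit,
;; reuses the stored mean in :transform.
(defn center []
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      :fit (let [mean (/ (reduce + data) (count data))]
             (-> ctx
                 (assoc id {:mean mean})   ; keep the learned state under this step's id
                 (assoc :metamorph/data (mapv #(- % mean) data))))
      :transform (let [mean (get-in ctx [id :mean])]
                   (assoc ctx :metamorph/data (mapv #(- % mean) data))))))

(def step (center))

;; :fit computes the mean of [1 2 3] and stores it in the context
(def fitted
  (step {:metamorph/id :center-1 :metamorph/mode :fit :metamorph/data [1 2 3]}))

;; :transform reuses the stored mean on new data
(def transformed
  (step (assoc fitted :metamorph/mode :transform :metamorph/data [10 20])))
;; (:metamorph/data fitted)      => [-1 0 1]
;; (:metamorph/data transformed) => [8 18]
```

Passing the fitted context on to the transform run is what lets the same function pick up the state it stored earlier.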

Pipelines can be constructed from functions or as pure data

Metamorph pipelines can be constructed either from a sequence of function calls via the function scicloj.metamorph.core/pipeline or declaratively as a sequence of maps.

Both rely on the same functions.

See here for examples:

This should enable advanced use cases, such as the programmatic generation of pipelines, which gives great flexibility for hyperparameter tuning in machine learning.
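
The idea of generating pipelines can be sketched in plain Clojure. Everything below is hypothetical and written only for illustration: `run-steps` is a simplified stand-in for a pipeline constructor (it does not handle ids or modes), and `keep-first-n` and `total` are toy steps on a vector of numbers.

```clojure
;; Hypothetical helpers, plain Clojure only.
(defn run-steps
  "Builds one context->context function by threading the context through the steps in order."
  [& steps]
  (fn [ctx] (reduce (fn [c step] (step c)) ctx steps)))

(defn keep-first-n [n]
  (fn [ctx] (update ctx :metamorph/data #(vec (take n %)))))

(defn total []
  (fn [ctx] (update ctx :metamorph/data #(reduce + %))))

;; one pipeline per candidate value of n, generated programmatically
(def candidate-pipelines
  (into {} (for [n [1 2 3]]
             [n (run-steps (keep-first-n n) (total))])))

;; run every candidate on the same input and collect the results by parameter
(def results
  (into {} (for [[n pipe] candidate-pipelines]
             [n (:metamorph/data (pipe {:metamorph/data [5 4 3 2 1]}))])))
;; results => {1 5, 2 9, 3 12}
```

Because each pipeline is an ordinary value, a parameter search reduces to building a collection of pipelines and comparing their outputs.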

Advantages of the metamorph concept

Creating a pipeline

To create a pipeline function you can use two functions:


Compliant pipeline operations can either be created by "lifting" functions which work on the data object itself, or by using them from compliant libraries.

Most functions in tablecloth take a dataset as input in first position and return a dataset. This means they can be converted (lifted) into metamorph compliant functions with the function "scicloj.metamorph.core/lift". (Tablecloth provides lifted versions of its functions in the namespace tablecloth.pipeline.)

In this short example, the main data object in the context is a simple string.

(require '[scicloj.metamorph.core :as morph]
         '[clojure.string])

;; a regular function which takes and returns a main object
(defn regular-function-to-be-lifted
  [main-object par1 par2]
  (str "Hey, " (clojure.string/upper-case main-object) " , I'm regular function! (pars: " par1 ", " par2 ")"))

;; we make a pipeline-fn using `lift` and the regular function
(def lifted-pipeline
  (morph/pipeline
   (morph/lift regular-function-to-be-lifted 1 2)))

;; lifted-pipeline is a regular Clojure function, taking the context as its first argument
(lifted-pipeline {:metamorph/data "main data project"})
;; => #:metamorph{:data "Hey, MAIN DATA PROJECT , I'm regular function! (pars: 1, 2)"}


