metasoarous / oz

Data visualizations in Clojure and ClojureScript using Vega and Vega-lite
Eclipse Public License 1.0

oz

Great and powerful scientific documents & data visualizations


Please use 1.6.0-alpha36 for the most recent stable version of Oz.

For the latest notebook and async data & document processing capabilities, please try out 2.0.0-alpha5, but note that it may still have some bugs.

Overview

Oz is a data visualization and scientific document processing library for Clojure built around Vega-Lite & Vega.

Vega-Lite & Vega are declarative grammars for describing interactive data visualizations. Of note, they are based on the Grammar of Graphics, which served as the guiding light for the popular R ggplot2 viz library. With Vega & Vega-Lite, we define visualizations by declaratively specifying how attributes of our data map to aesthetic properties of a visualization. Vega-Lite in particular focuses on maximal productivity and leverage for day to day usage (and is the place to start), while Vega (to which Vega-Lite compiles) is ideal for more nuanced control.

About oz specifically...

Oz itself provides:

Learning Vega, Vega-Lite & Oz

To take full advantage of the data visualization capabilities of Oz, it pays to understand the core of Vega & Vega-Lite. If you're new to the scene, it's worth taking a few minutes to orient yourself with this mindblowing talk/demo from the creators at the Interactive Data Lab (IDL) at University of Washington.

Vega & Vega-Lite talk from IDL

Watched the IDL talk and hungry for more content? Here's another which focuses on the philosophical ideas behind Vega & Vega-Lite, how they relate to Clojure, and how you can use the tools from Clojure using Oz.

Seajure Clojure + Vega/Vega-Lite talk

This Readme is the canonical entry point for learning about Oz. You may also want to check out the cljdoc page (if you're not there already) for API & other docs, and look at the examples directory of this project (referenced occasionally below).

Ecosystem

Some other things in the Vega/Vega-Lite ecosystem you may want to look at for getting started or learning more

REPL Usage

If you clone this repository and open up the dev/user.clj file, you can follow along by executing the commented out code block at the end of the file.

Assuming you're starting from scratch, first add oz to your Leiningen project dependencies (the Clojars project page lists the current coordinates).
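As a sketch of what that dependency entry might look like, the fragments below use the metasoarous/oz coordinates and the 1.6.0-alpha36 version mentioned elsewhere in this Readme; substitute whichever version you actually want. The deps.edn form is included on the assumption that you may be using the Clojure CLI rather than Leiningen.

```clojure
;; Leiningen: add this vector to :dependencies in project.clj
[metasoarous/oz "1.6.0-alpha36"]

;; Clojure CLI: add this entry to the :deps map in deps.edn
{:deps {metasoarous/oz {:mvn/version "1.6.0-alpha36"}}}
```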

Next, require oz and start the plot server as follows:

(require '[oz.core :as oz])

(oz/start-server!)

This will fire up a browser window with a websocket connection for funneling view data back and forth. If you forget to call this function, it will be called for you when you create your first plot, but be aware that it will delay the first display, and it's possible you'll have to resend the plot on a slower computer.

Next we'll define a function for generating some dummy data

(defn play-data [& names]
  (for [n names
        i (range 20)]
    {:time i :item n :quantity (+ (Math/pow (* i (count n)) 0.8) (rand-int (count n)))}))
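
To get a feel for the shape of this data before plotting it, here is a small sketch that repeats the definition above and inspects the result. The name rows is just an illustration; the :quantity values include a random component, so they will differ from run to run.

```clojure
;; Same definition as above, repeated so this snippet stands alone
(defn play-data [& names]
  (for [n names
        i (range 20)]
    {:time i :item n :quantity (+ (Math/pow (* i (count n)) 0.8) (rand-int (count n)))}))

(def rows (play-data "monkey" "slipper"))

(count rows)         ;; 40: twenty rows for each name
(:item (first rows)) ;; "monkey": rows are grouped by name, in argument order
(keys (first rows))  ;; (:time :item :quantity)
```

Each row is a plain map, which is the form Vega-Lite expects under :data :values.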

oz/view!

The main function for displaying vega or vega-lite is oz/view!.

For example, a simple line plot:

(def line-plot
  {:data {:values (play-data "monkey" "slipper" "broom")}
   :encoding {:x {:field "time" :type "quantitative"}
              :y {:field "quantity" :type "quantitative"}
              :color {:field "item" :type "nominal"}}
   :mark "line"})

;; Render the plot
(oz/view! line-plot)

Should render something like:

[Image: lines plot]

Another example:

(def stacked-bar
  {:data {:values (play-data "munchkin" "witch" "dog" "lion" "tiger" "bear")}
   :mark "bar"
   :encoding {:x {:field "time"
                  :type "ordinal"}
              :y {:aggregate "sum"
                  :field "quantity"
                  :type "quantitative"}
              :color {:field "item"
                      :type "nominal"}}})

(oz/view! stacked-bar)

This should render something like:

[Image: bars plot]

vega support

For vega instead of vega-lite, you can also specify :mode :vega to oz/view!:

;; Load some example Vega (this may only work from within a checkout of oz; haven't checked).
;; oz/load reads the JSON file itself, so no separate JSON library require is needed here.

(def contour-plot (oz/load "examples/contour-lines.vega.json"))
(oz/view! contour-plot :mode :vega)

This should render like:

[Image: contours plot]

Hiccup

We can also embed Vega-Lite & Vega visualizations within hiccup documents:

(def viz
  [:div
    [:h1 "Look ye and behold"]
    [:p "A couple of small charts"]
    [:div {:style {:display "flex" :flex-direction "row"}}
      [:vega-lite line-plot]
      [:vega-lite stacked-bar]]
    [:p "A wider, more expansive chart"]
    [:vega contour-plot]
    [:h2 "If ever, oh ever a viz there was, the vizard of oz is one because, because, because..."]
    [:p "Because of the wonderful things it does"]])

(oz/view! viz)

Note that the Vega-Lite & Vega specs are embedded in the hiccup as vectors whose first element is the :vega-lite or :vega keyword.

You should now see something like this:

[Image: composite view]

Note that vega/vega-lite already have very powerful and impressive plot concatenation features which allow for coupling of interactivity between plots in a viz. However, combining things through hiccup like this is nice for expedience, and gives one the ability to combine such visualizations in the context of HTML documents.

Also note that while not illustrated above, you can specify multiple maps in these vectors, and they will be merged into one. So for example, you can do [:vega-lite stacked-bar {:width 100}] to override the width.
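
To illustrate what such an override amounts to, here is a sketch in plain Clojure. It assumes the maps are combined along the lines of clojure.core/merge, which is an assumption on our part; this Readme doesn't spell out Oz's exact merge semantics (shallow vs. deep). The spec here is a trimmed stand-in for the stacked-bar example, with a :width added for the sake of the demonstration.

```clojure
;; Stand-in spec (hypothetical :width value, not from the example above)
(def bar-spec
  {:mark "bar"
   :width 400
   :encoding {:x {:field "time" :type "ordinal"}
              :y {:aggregate "sum" :field "quantity" :type "quantitative"}}})

;; Later maps win on conflicting top-level keys; everything else is kept
(def narrow-bar-spec (merge bar-spec {:width 100}))

(:width narrow-bar-spec) ;; 100
(:mark narrow-bar-spec)  ;; "bar"
```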

As client side reagent components

If you like, you may also use the Reagent components found at oz.core to render vega and/or vega-lite you construct client side.

[:div
 [oz.core/vega { ... }]
 [oz.core/vega-lite { ... }]]

At present, these components do not take a second argument. The merging of spec maps described above applies prior to application of this reagent component.

Eventually we'll be adding options for hooking into the signal dataflow graphs within these visualizations so that interactions in a Vega/Vega-Lite visualization can be used to inform other Reagent components in your app.

Please note that when using oz.core client side, the :data entry in your vega spec map should not be nil (for example, when you're loading data into a Reagent atom which has not been populated yet). Instead prefer an empty sequence () to avoid hard-to-diagnose errors in the browser.
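
One way to guard against this is a small helper that swaps a nil :data entry for an empty sequence before the spec reaches the component. This is only a sketch; non-nil-data is a hypothetical name, not part of Oz.

```clojure
(defn non-nil-data
  "Returns spec with a nil (or missing) :data entry replaced by ()."
  [spec]
  ;; fnil substitutes () when the current :data value is nil
  (update spec :data (fnil identity ())))

(non-nil-data {:mark "line" :data nil})
;; :data is now () rather than nil

(non-nil-data {:mark "line" :data {:values [{:x 1}]}})
;; existing data is left untouched
```

In a Reagent app you might apply it at the render site, e.g. [oz.core/vega-lite (non-nil-data @spec-atom)], where spec-atom is whatever atom holds your spec.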

Loading specs

Oz now features a load function which accepts the following formats:

As an example of the markdown syntax:

# An example markdown file

```edn vega-lite
{:data {:url "data/cars.json"}
 :mark "point"
 :encoding {
   :x {:field "Horsepower", :type "quantitative"}
   :y {:field "Miles_per_Gallon", :type "quantitative"}
   :color {:field "Origin", :type "nominal"}}}
```

The real magic here is in the code class specification edn vega-lite. It's possible to replace edn with json or yaml, and vega-lite with vega as appropriate. Additionally, these classes can be hyphenated for compatibility with editors/parsers that have problems with multiple class specifications (e.g. edn-vega-lite).

Note that embedding all of your data into a vega/vega-lite spec directly as :values may be untenable for larger data sets. In these cases, the recommended solution is to post your data to a GitHub gist, or elsewhere online where you can refer to it using the :url syntax (e.g. {:data {:url "https://your.data.url/path"} ...}).

One final note: in lieu of vega or vega-lite you can specify hiccup in order to embed oz-style hiccup forms which may or may not contain [:vega ...] or [:vega-lite ...] blocks. This allows you to embed nontrivial html in your markdown files as hiccup, when basic markdown just doesn't cut it, without having to resort to manually writing html.

Export

We can also export static HTML files which use Vega-Embed to render interactive Vega/Vega-Lite visualizations using the oz/export! function.

(oz/export! spec "test.html")

Notebook support

Oz now also features Jupyter support for both the Clojupyter and IClojure kernels. See the view! method in the namespaces oz.notebook.clojupyter and oz.notebook.iclojure for usage.

[Image: example notebook]

Requiring in Clojupyter

Take a look at the example clojupyter notebook.

If you have docker installed you can run the following to build and run a jupyter container with clojupyter installed.

docker run --rm -p 8888:8888 kxxoling/jupyter-clojure-docker

Note that if you get a permission related error, you may need to run this command like sudo docker run ....

Once you have a notebook up and running you can either import the example clojupyter notebook or manually add something like:

(require '[clojupyter.misc.helper :as helper])
(helper/add-dependencies '[metasoarous/oz "x.x.x"])
(require '[oz.notebook.clojupyter :as oz])

;; Create spec

;; then...
(oz/view! spec)

Based on my own tinkering and the reports of other users, the functionality of this integration is somewhat sensitive to version/environment details, so running from the docker image is the recommended way of getting things running for the moment.

Requiring in IClojure

If you have docker installed you can get an IClojure environment up and running using:

docker run -p 8888:8888 cgrand/iclojure

As with Clojupyter, note that if you get a permission related error, you may need to run this command like sudo docker run ....

Once you have that running, you can:

/cp {:deps {metasoarous/oz {:mvn/version "x.x.x"}}}
(require '[oz.notebook.iclojure :as oz])

;; Create spec

;; then...
(oz/view! spec)

Live code reloading

Oz now features Figwheel-like hot code reloading for Clojure-based data science workflows. To start this functionality, you specify from the REPL a file you would like to watch for changes, like so:

(oz/live-reload! "live-reload-test.clj")

As soon as you run this, the code in the file will be executed in its entirety. Thereafter, if you save changes to the file, all forms starting from the first form with material changes will be re-evaluated. Additionally, whitespace changes are ignored, and namespace changes only trigger a recompile if there were other code changes in flight, or if there was an error during the last execution. We also try to do a good job of logging notifications as things are running so that you know what is running and how long long-running forms are taking to execute.

Collectively all of these features give you the same magic of Figwheel's hot-code reloading experience, but geared towards the specific demands of a data scientist, or really anyone who needs to quickly hack together potentially long running jobs.
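
As a rough illustration of the re-evaluation behavior described above, here is a sketch of what a watched file might contain. The file contents, names, and the sleep standing in for slow work are all hypothetical.

```clojure
;; live-reload-test.clj (hypothetical contents)
(ns live-reload-test)

;; Form A: a slow step. Per the description above, it is re-run only when
;; this form (or one before it) has material changes.
(def dataset
  (do (Thread/sleep 200) ;; stand-in for a long-running computation
      (vec (for [i (range 20)]
             {:time i :quantity (* i i)}))))

;; Form B: cheap to tweak. Editing only this form should leave Form A alone
;; and re-evaluate from here down.
(def plot
  {:data {:values dataset}
   :mark "line"
   :encoding {:x {:field "time" :type "quantitative"}
              :y {:field "quantity" :type "quantitative"}}})

;; With Oz on the classpath you would typically add [oz.core :as oz] to the
;; ns :require and finish the file with (oz/view! plot).
```

Ordering the file so that expensive forms come first and frequently edited forms come last makes the most of this behavior.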

Here's a quick video of this in action: https://www.youtube.com/watch?v=yUTxm29fjT4

Of import: Because the code evaluated with live-reload! is evaluated in a separate thread, you can't include any code which might try to set root bindings of a dynamic var. Fortunately, setting root var bindings isn't something I've ever needed to do in my data science workflow (nor should you), but of course, it's possible there are libraries out there that do this. Just be aware that it might come up. This seems to be a pretty fundamental Clojure limitation, but I'd be interested to hear from the oracles whether there's any chance of this being supported in a future version of Clojure.

There's also a related function, oz/live-view!, which will similarly watch a file for changes, oz/load it, then oz/view! it.

Sharing features

Looking to share your cool plots or hiccup documents with someone? We've got you covered via the publish! utility function.

This will post the plot content to a GitHub Gist, and use the gist uuid to create a vega-editor link which prints to the screen. When you visit the vega-editor link, it will load the gist in question and place the content in the editor. It renders the plot, and updates in real time as you tinker with the code, making it a wonderful yet simple tool for sharing and prototyping.

user=> (oz/publish! stacked-bar)
Gist url: https://gist.github.com/87a5621b0dbec648b2b54f68b3354c3a
Raw gist url: https://api.github.com/gists/87a5621b0dbec648b2b54f68b3354c3a
Vega editor url: https://vega.github.io/editor/#/gist/vega-lite/metasoarous/87a5621b0dbec648b2b54f68b3354c3a/e1d471b5a5619a1f6f94e38b2673feff15056146/vega-viz.json

Following the Vega editor url will take you here (click on image to follow):

[Image: vega-editor]

As mentioned above, we can also share our hiccup documents/dashboards. Since Vega Editor knows nothing about hiccup, we've created ozviz.io as a tool for loading these documents.

user=> (oz/publish! viz)
Gist url: https://gist.github.com/305fb42fa03e3be2a2c78597b240d30e
Raw gist url: https://api.github.com/gists/305fb42fa03e3be2a2c78597b240d30e
Ozviz url: http://ozviz.io/#/gist/305fb42fa03e3be2a2c78597b240d30e

Try it out: http://ozviz.io/#/gist/305fb42fa03e3be2a2c78597b240d30e

Authentication

In order to use the oz/publish! function, you must provide authentication.

The easiest way is to pass :auth "username:password" to the oz/publish! function. However, this can be problematic in that you don't want these credentials accidentally strewn throughout your code or ./.lein-repl-history.

To address this issue, oz/publish! will by default try to read authorization parameters from a file at ~/.oz/github-creds.edn. The contents should be a map of authorization arguments, as passed to the tentacles api. While you can use {:auth "username:password"} in this file, as above, it's far better from a security standpoint to use OAuth tokens.
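
As a sketch, a token-based credentials file might look like the following. The :oauth-token key is the option name tentacles uses for token authentication; that Oz passes this map through to tentacles unchanged is our reading of the paragraph above rather than something verified here, and the token value is a placeholder.

```clojure
;; ~/.oz/github-creds.edn (sketch; replace the placeholder with your own token)
{:oauth-token "<your-github-token>"}
```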

When you're finished, it's a good idea to run chmod 600 ~/.oz/github-creds.edn so that only your user can read the credential file.

And that's it! Your calls to (oz/publish! spec) should now be authenticated.

Sadly, GitHub used to allow the posting of anonymous gists, without the requirement of authentication, which saved us from all this hassle. However, they've since deprecated this. If you like, you can submit a comment asking that GitHub consider enabling auto-expiring anonymous gists, which would avoid this setup.

Static site generation

If you've ever thought "man, I wish there was a static site generation toolkit which had live code reloading of whatever page you're currently editing, and it would be great if it was in Clojure and let me embed data visualizations and math formulas via LaTeX in Markdown & Hiccup documents", boy, are you in for a treat!

Oz now features exactly such functionality in the form of oz/build!. A very simple site might be generated with:

(oz/build!
  [{:from "examples/static-site/src/"
    :to "examples/static-site/build/"}])

The input formats currently supported by oz/build! are

Oz should handle image and css files it comes across by simply copying them over. However, if you have any json or edn assets (datasets perhaps) which need to pass through unchanged, you can separate these into their own build specification, like so:

(defn site-template
  [spec]
  [:div {:style {:max-width 900 :margin-left "auto" :margin-right "auto"}}
   spec])

(oz/build!
  [{:from "examples/static-site/src/site/"
    :to "examples/static-site/build/"
    :template-fn site-template}
   ;; If you have static assets, like datasets or images, which need to be simply copied over
   {:from "examples/static-site/src/assets/"
    :to "examples/static-site/build/"
    :as-assets? true}])

This can be a good way to separate document code from other static assets.

Specifying multiple builds like this can be used to do other things as well. For example, if you wanted to render a particular set of pages using a different template function (for example, so that your blog posts are styled differently than the main pages), you can do that easily:

;; blog-template uses string/join, so clojure.string must be required
(require '[clojure.string :as string])

(defn blog-template
  [spec]
  (site-template
    (let [{:as spec-meta :keys [title published-at tags]} (meta spec)]
      [:div
       [:h1 {:style {:line-height 1.35}} title]
       [:p "Published on: " published-at]
       [:p "Tags: " (string/join ", " tags)]
       spec])))

(oz/build!
  [{:from "examples/static-site/src/site/"
    :to "examples/static-site/build/"
    :template-fn site-template}
   {:from "examples/static-site/src/blog/"
    :to "examples/static-site/build/blog/"
    :template-fn blog-template}
   ;; If you have static assets, like datasets or images, which need to be simply copied over
   {:from "examples/static-site/src/assets/"
    :to "examples/static-site/build/"
    :as-assets? true}])

Note that the blog-template above is using metadata about the spec to inform how it renders. This metadata can be written into Markdown files using a yaml markdown metadata header (see /examples/static-site/src/):

---
title: Oz static websites rock
tags: oz, dataviz
---

# Oz static websites!

Some markdown content...

The title in particular here will wind its way into the Title metadata tag of your output HTML document, and thus will be visible at the top of your browser window when you view the file. This is a pattern that Jekyll and some other blogging engines use, and markdown-clj now supports extracting this data.

Again, as you edit and save these files, the outputs just automatically update for you, both as compiled HTML files, and in the live-view window which lets you see your changes as you make them. If you need to change a template, or some other detail of the specs, you can simply rerun build! with the modified arguments, and the most recently edited page will update before your eyes. This provides for a lovely live-view editing experience from the comfort of your favorite editor.

When you're done, one of the easiest ways to deploy is with the excellent surge.sh toolkit, which makes static site deployment a breeze. You can also use GitHub Pages or S3 or really whatever if you prefer. The great thing about static sites is that they are easy and cheap to deploy and scale, so you have plenty of options at your disposal.

EDN translation caveats in expression strings

In general, it's pretty easy to translate specs between EDN (Clojure data) and JSON. However, there is one place where you can get a little tripped up if you don't know what to do, and that's in expressions (as used in calculate and filter transforms).

The expressions you see in the Vega docs typically look like {"calculate": "datum.attr * 2", "as": "attr2"} (as JSON). However, in Clojure, we often use kebab-cased keywords for data map keys (e.g. :cool-attr). For these attributes, you can't use datum.cool-attr, since this will be interpreted as datum.cool - attr (a subtraction), and either error out or not produce the desired result. Instead you'll need to use datum['cool-attr'] in your expressions when your keys are kebab cased.

This may be easy to miss, since most of the docs assume that you're working with camel or snake cased keys. It is mentioned somewhere in there if you look, but tends to bite us Clojurists more frequently than practitioners of other languages, and so isn't particularly front and center. Once you know the trick though, you should be on your way.
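
To make the trick concrete, here is a small sketch of a helper that builds the bracketed accessor from a keyword, plus a calculate transform using it. The names datum-ref and doubled-transform are hypothetical, not part of Oz, and the helper assumes key names that contain no single quotes.

```clojure
(defn datum-ref
  "Builds a bracket-style Vega expression accessor for a (possibly kebab-cased) key."
  [k]
  (str "datum['" (name k) "']"))

(datum-ref :cool-attr)
;; "datum['cool-attr']"

;; A calculate transform, as EDN, that doubles the kebab-cased attribute
(def doubled-transform
  {:calculate (str (datum-ref :cool-attr) " * 2")
   :as "cool-attr-doubled"})
;; {:calculate "datum['cool-attr'] * 2", :as "cool-attr-doubled"}
```

Such a map would go inside the :transform vector of a Vega-Lite spec.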

Local CLJS development

Oz is now compiled (on the cljs side) with Shadow-CLJS, together with the Clojure CLI tooling. A typical workflow involves running clj -M:shadow-cljs watch devcards app (note, older versions of clj use -A instead of -M; consider updating). This will watch your cljs files for changes, and immediately compile both the app.js and devcards.js targets (to resources/oz/public/js/).

In general, the best way to develop is to visit http://localhost:7125/devcards.html, which will pull up a live view of a set of example Reagent components defined at src/cljs/oz/core_devcards.cljs. This is the easiest way to tweak functionality and test new features, as editing src/cljs/oz/core.cljs will trigger updates to the devcards views.

If it's necessary or desirable to test the app (live-view, etc) functionality "in-situ", you can also use the normal Clj REPL utilities to feed plots to the app.js target using oz/view!, etc. Note that if you do this, you will need to use whatever port is passed to oz/view! (by default, 10666) and not the one printed out when you start clj -M:shadow-cljs.

See documentation for your specific editing environment if you'd like your editor to be able to connect to the Shadow-CLJS repl. For vim-fireplace, the initial Clj connection should establish itself automatically when you attempt to evaluate your first form. From there simply execute the vim command :CljEval (shadow/repl :app), and you should be able to evaluate code in the *.cljs files from vim. Code in *.clj files should also continue to evaluate as before as well.

IMPORTANT NOTE: If you end up deploying a version of Oz to Clojars or elsewhere, make sure you stop your clj -M:shadow-cljs watch process before running make release. If you don't, shadow will continue watching files and rebuild js compilation targets with dev time configuration (shadow, less minification, etc), that shouldn't be in the final release build. If however you are simply making changes and pushing up for me to release, please just leave any compiled changes to the js targets out of your commits.

Debugging & updating Vega/Vega-Lite versions

I'm frequently (and pleasantly) shocked at how often, when I find I'm unable to do something in Vega or Vega-Lite that I think I should be able to, updating the Vega or Vega-Lite version fixes the problem. As a side note, I think this speaks volumes of the stellar job (pun intended) the IDL has been doing of developing these tools. More to the point though, if you find yourself unable to do something you expect to be able to do, it's not a bad idea to try the following:

  1. Make sure your Oz version is up to date, in case a more recent Vega/Vega-Lite version bundled there fixes the problem.
  2. Check npm to see if there's a more recent version of Vega/Vega-Lite (or Vega-Embed or Vega-Tooltip, as appropriate).
  3. Clone Oz, update the package.json file, and attempt to rebuild Oz as described above.
  4. If this still doesn't solve your problem, file an issue on the appropriate Vega GitHub project. I've found the developers super responsive to issues.

License

Copyright © 2018 Christopher Small

Forked from Vizard (with thanks) - Copyright © 2017 Yieldbot, Inc.

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.