hoplon / javelin

Spreadsheet-like dataflow programming in ClojureScript.

Clojure support? #25

Open bsima opened 8 years ago

bsima commented 8 years ago

Are there any plans or interest in using Javelin from Clojure? I was thinking earlier today that it would be nice to have cells on the backend too.

Yeah, core.async works for most of the same problems as Javelin, but the cell abstraction is often easier and simpler to use than channels.

alandipert commented 8 years ago

There is a Clojure port kicking around: https://github.com/hoplon/javelin/blob/master/src/javelin/core_clj.clj

It's prototype-quality and we've never used it for anything, but if you have an idea about using it and want to contribute, we're happy to accept PRs.

bsima commented 8 years ago

I saw that and had a feeling that's what that was, but wasn't 100% sure. I'll take a look at it

onetom commented 8 years ago

When this question comes up, @micha and @alandipert usually conclude the discussion by saying they haven't seen a really compelling case yet for using cells on the backend.

I would be very interested to see a really characteristic use case, so @bsima if you have one, please share!

bsima commented 8 years ago

@onetom, well I'm currently working on a streaming data analysis service, and I thought having the reactive cell graph would be useful, especially since I'll be working with a lot of graph data structures. A few discussions on SO about Haskell lenses seem to corroborate my assumption (1, 2). I have yet to get to the implementation of the service, still reading and designing the system, so we'll see where my research leads me.

jconti commented 8 years ago

Seems like anyone doing pub-sub to long-polling streams would like this kind of thing on the server. Basically anyone implementing event-push to web socket or long-polling web clients needs to distribute events and record who has received what. All that stateful bookkeeping is a bother, and easy to write bugs into. Mix in an async implementation (which is the only solution to having any kind of scale with long-polling) and the bugs tend to multiply.

bsima commented 8 years ago

just dumping some notes here for later...

It looks like this kind of cell abstraction, in some form or another, was used primarily for UI programming. E.g. Garnet, which predated even Tilton's Cells. However, the case of using cells on the backend is getting stronger the more I read.

bsima commented 8 years ago

More notes...

A big problem that Rich and Stuart bring up is transactional order. From the first link, a message from Rich:

if agent A notifies B and C, and B notifies C, then a change to A will send actions to B and C, and B will subsequently send to C, but A's message to C and B's message to C can arrive in any order. Once you are ok with that, you can do multithreaded reactive programming.

I guess core.async was created to get around this problem. I don't know if Javelin has this problem or not, but if it does, a cell abstraction built on top of core.async would be cool.

Alternatively, there is the FrTime language, which is mentioned in the second email thread.

  1. Embedding Dynamic Dataflow in a Call-by-Value Language
  2. FrTime: Functional Reactive Programming in PLT Scheme
    • ftp://ftp.cs.brown.edu/pub/techreports/03/cs03-20.pdf

A first scan of the papers seems to indicate that FrTime influenced Javelin, since paper 1 talks about a "lift" function and here we see a lift implemented in Javelin. I'll read FrTime more closely this weekend.

There is also an old dataflow api and source, and an SO thread.

Interesting "cells.clj" gist by Rich from 6 years ago.

alandipert commented 8 years ago

@bsima Excellent stuff! I'll throw this in:

This is the paper and FRP library that most directly influenced Javelin: http://cs.brown.edu/~sk/Publications/Papers/Published/mgbcgbk-flapjax/paper.pdf lift comes from Haskell and is the operation in several FRP frameworks that converts "plain" functions to functions over Behaviors/Signals. Behaviors and Signals in these frameworks correspond directly to cells. From the FRP perspective, Javelin is an FRP library with only Behaviors/Signals, and no event streams.
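To make lift concrete, here is a toy sketch in Clojure. It is not Javelin's or Flapjax's code: it assumes the textbook FRP model in which a Behavior is simply a function from time to a value, and the names lift, seconds, and doubled are made up for illustration.

```clojure
;; Toy model (assumption): a Behavior is a function from time t to a value.
(defn lift
  "Turns a plain function f into a function over Behaviors."
  [f]
  (fn [& behaviors]
    (fn [t]
      ;; Sample every Behavior at time t, then apply the plain function.
      (apply f (map (fn [b] (b t)) behaviors)))))

(def seconds (fn [t] t))                         ; a Behavior whose value is the time itself
(def doubled ((lift *) seconds (constantly 2)))  ; a new Behavior built from two others

(doubled 21)
;; => 42
```

Javelin's cells push changes rather than being sampled by time, but the shape is the same: a plain function is lifted once and then applied to reactive values.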

bsima commented 8 years ago

Yeah, the FlapJax paper cites the FrTime papers, and the FrTime author Gregory Cooper mentions it on his website and is an author on the FlapJax paper. So that's the basic historic path from Conal Elliott's invention circa 1997, all the way up to the current Javelin implementation.

But there are really two (or three?) historical lines; one for FRP (and/or "dataflow" which is a rather ambiguous term), and one for cells. They aren't really compatible, because cells update synchronously, whereas "dataflow" is typically concerned with managing concurrency, and FRP is concerned with managing asynchronous dataflow.

So we have 3 concepts that are being tangled together and confused. This really helps me understand why the old dataflow API was dropped and the cell discussions killed: they all mention the problems of concurrency/asynchronicity without unpacking (decomplecting) the problem. Looking at these implementations from 6 years ago, they're rather naive, basically just simplified ports of Tilton's cells. Trying to build cells on top of agents is interesting as it attempts to make synchronous cells work with concurrent processing, but ultimately fails because there is no time constraint, so values could arrive at cells in any order -- cells are meant for synchronicity after all. So, the two relevant solutions to this concurrency problem are:

  1. CSP/core.async, time constraints emerge out of the (rather imperative) "channel" abstraction.
  2. FrTime & FlapJax impose a time constraint in the evaluation strategy of "signals"

Of course CSP isn't actually reactive, it's just passing messages. If we want reactivity then we need FrTime/FlapJax. (Or it might be possible to implement FrTime-like semantics using CSP, but that's an experiment for another time.)
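A minimal sketch of the second option, using Rich's A/B/C example from earlier in the thread. This is not FrTime's or Javelin's implementation: the graph map, rank, and propagate are hypothetical names, and for simplicity every formula is recomputed rather than only the affected ones. The point is only that evaluating in dependency order means C always sees B's new value.

```clojure
;; Hypothetical graph: B depends on A; C depends on A and B.
(def graph
  {:b {:deps [:a]    :f inc}
   :c {:deps [:a :b] :f +}})

(defn rank
  "Inputs have rank 0; a formula ranks one above its highest dependency."
  [graph k]
  (if-let [{:keys [deps]} (get graph k)]
    (inc (apply max (map #(rank graph %) deps)))
    0))

(defn propagate
  "Sets one input, then recomputes each formula once, lowest rank first."
  [graph values input new-value]
  (reduce (fn [acc k]
            (let [{:keys [deps f]} (get graph k)]
              (assoc acc k (apply f (map acc deps)))))
          (assoc values input new-value)
          (sort-by #(rank graph %) (keys graph))))

(propagate graph {} :a 1)
;; => {:a 1, :b 2, :c 3}
```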

The brilliance of Javelin is that it takes the FrTime semantics and wraps it in the cell abstraction, which is like painting a simple interface over the academic lingo of FRP. I think micha mentioned in his 20-minute talk a few weeks ago that Javelin cells are kinda alternative to core.async, but that's not strictly true. Maybe it is in ClojureScript where you don't have true concurrency, but really cells and CSP are complementary and solving two slightly different problems.

Okay, after all this long-winded thinking-out-loud, I'm concluding that Javelin would absolutely be useful in Clojure. I'm gonna finish reading the FrTime papers closely and also read the Javelin code again more closely, and then I'll work on experimenting with Javelin on the JVM next week or so.

bsima commented 8 years ago

I've got an in-progress branch here: bsima/javelin/jvm

I haven't worked much with custom Clojure data structures so this took quite a bit of research, but so far it's going well. I can construct a cell, but set-formula! doesn't work and I'm not sure why yet. All I get is the extremely unhelpful java.lang.ClassCastException: clojure.lang.PersistentArrayMap cannot be cast to clojure.lang.IRef when calling add-watch on a Cell object.

The CLJS code is really dense and undocumented so it takes me a while to figure out exactly what it's doing. But a lot of the code is shared between CLJ and CLJS, so once I get the CLJ working we could consolidate using .cljc

My goal is to get this working by the end of the month. Thinking toward the future: I like the idea of specifying protocols for the cells to implement, because (as I said in the above analyses) the cell abstraction is useful outside of FRP, so perhaps an ICell (or similar) protocol could be used to implement cells with different backends - cells with core.async, or threaded cells, for example.

sjmackenzie commented 8 years ago

Really looking forward to this!

alandipert commented 8 years ago

@bsima @sjmackenzie would you be interested in a code walkthrough via Hangout or similar? Javelin/cells come from a handful of ideas that code mashes together and obfuscates. I'll have some time this weekend, can probably enlist @micha also.

cddr commented 8 years ago

Ah the cells manifesto. That was my introduction to reactive programming. BTW @bsima the name is Tilton not Triton.

Also, I'd be interested in this hangout too. If you decide to go ahead with it, will you put a link in here?

alandipert commented 8 years ago

@cddr will do, and will mention anyone else who expressed interest.

sjmackenzie commented 8 years ago

Yes indeed it would be quite cool to listen in to that talk.

bsima commented 8 years ago

@alandipert yes please! The Thanksgiving holiday and general work distracted me from working on this, but a code walkthrough would be super helpful and just generally interesting.

Saturday morning or early afternoon would be perfect, I'm on East Coast time.

@cddr oops! Not sure how I made that typo.. thanks :)

xificurC commented 8 years ago

I'd love to join as well if you choose a good time. Edit: Oh it's weekend you're thinking of, that won't work for me. Have fun!

jconti commented 8 years ago

I'll likely be on a plane, but would love to review a recording of the walk through if possible.


On Dec 1, 2015, at 12:43 PM, Peter Nagy notifications@github.com wrote:

I'd love to join as well if you choose a good time.


alandipert commented 8 years ago

@xificurC @sjmackenzie @cddr @bsima I made a doodle, please select the time that works best for you: http://doodle.com/poll/z55yyt2cx6s5nuia

I'll try to set up a Hangout on Air and drop the link here when it's ready so we can save the video.

sjmackenzie commented 8 years ago

I have no idea what timezone the doodle times are using. Once decided please put the time + tz here and I'll convert.

alandipert commented 8 years ago

@sjmackenzie timezone is EDT

alandipert commented 8 years ago

For those joining us this weekend for the code walkthrough, http://cs.brown.edu/~sk/Publications/Papers/Published/mgbcgbk-flapjax/paper.pdf Section 3.4 is a concise description of what we'll be looking at. The rest of the paper also has a lot of useful context. Worth a read, and don't worry, you won't be tested :-)

onetom commented 8 years ago

I remember it was also mentioned that cells and data-flow in general are most useful when they back a UI. Well, here is an example where it powers a JavaFX game UI written in Clojure: Game Development Development - Michael Nygard & Ragnar Svensson

alandipert commented 8 years ago

OK the time is set, Sat 12/5/15 3:00 PM EDT. Here is the Hangout on Air link: https://plus.google.com/events/ccnkeqnkgfhv8933s3ff1l2obd0

cddr commented 8 years ago

Hey I'm here. Watching. But there's nothing I can type in the google video

:-)

alandipert commented 8 years ago

@cddr change of plans, we're here now: https://unhangout.media.mit.edu/h/TurboComputingFortress

bsima commented 8 years ago

Just a heads up, I'm still working on this. Today reading about Elm's concurrent FRP ideas for possible use in training a neural network.

As for the Javelin code, I think it would be prudent to start with a few minor refactorings:

I could do all these refactorings over the next few weeks, and it would help me understand the codebase besides. Let me know what you think

actsasgeek commented 8 years ago

I'm very heartened to see this discussion. I've been wondering if something like this might not be very useful for data analysis. When doing ETL, it's a bit of a bother in the REPL to go back and fix mistakes or assumptions and then re-execute everything (and error prone at that). I actually thought of this the other day and couldn't remember if Javelin was ClojureScript only...and found this.

cddr commented 8 years ago

Hopefully you'll forgive the tangential nature of this comment but one thing I found really useful about Common Lisp when I was doing lots of ETL stuff was the debugger. It is beyond anything in any other language I've used. You could be near the end of an hour long ETL job, and some exception is thrown due to unexpected data. The debugger would allow you to navigate the stack trace to find the offending data, redefine the function that processes it to include a fix for the bug, then re-commence the job where it left off rather than having to start again.

bsima commented 8 years ago

Wow, this was a riveting read. Basically explains how to construct a "functional database" out of the same types of cells that Javelin uses

https://en.wikipedia.org/wiki/Functional_Database_Model

hierophantos commented 8 years ago

Has anyone here looked at potentially using propagators, as proposed in the very riveting talk by Gerry Sussman (of Scheme and SICP fame) titled "We Really Don't Know How to Compute!"?

There's also a Clojure library implementation called Propaganda.

bsima commented 7 years ago

Looks like this aspect of the pulsar library is basically Javelin http://docs.paralleluniverse.co/pulsar/#dataflow-reactive-programming

metasoarous commented 7 years ago

Has there been any more progress on this?

bsima commented 7 years ago

I haven't been working on it unfortunately :(

spoerri commented 7 years ago

Here's a good academic paper to hang our hat on: https://arxiv.org/abs/1406.2063v1

stathissideris commented 7 years ago

I'd like to use javelin for a JavaFX project, could anyone point me to the work in progress so that I can study it and possibly pick it up from where it was left? @bsima ?

bsima commented 7 years ago

@stathissideris I haven't been working on it, I'm using Kafka on the backend for dataflow-like programming. You might also want to look at pulsar as it has support for dataflow programming http://docs.paralleluniverse.co/pulsar/#dataflow-reactive-programming

metasoarous commented 7 years ago

@stathissideris You also might want to look at Onyx. You can use it for dataflow-style programming, it has a very data-driven API, and you can execute your workflows in the browser, on the backend, or even distributed over a ZooKeeper cluster. The fact that it's built primarily for the last of these cases means that the setup is a little more involved than simple cell computing in the ballpark of javelin, but I'd bet one could wrap the functionality up in some macros that make it more buttery for this purpose.

stathissideris commented 7 years ago

@bsima @metasoarous Thanks for the suggestions! I like javelin's small footprint, so I think as a first step I'll do a spike to see how easy it would be to port properly to clojure. If I fail, I will investigate pulsar and onyx (which I thought couldn't run without zookeeper!)

stathissideris commented 7 years ago

Wow, hitting a bit of a wall because of Cljs/Clj differences when it comes to deftype. It seems that Javelin relies on the fact that you can set! fields of existing instances of a deftype, but in Clojure all the fields are immutable by default (there is a way to override this):

(deftype foobar [field])
(def foo (foobar. 5))
(set! (.-field foo) 100)

Will throw:

IllegalAccessException Can not set final java.lang.Object field datacore.cells.foobar.field to java.lang.Long sun.reflect.UnsafeFieldAccessorImpl.throwFinalFieldIllegalAccessException (UnsafeFieldAccessorImpl.java:76)

There is a way to override this, but it's starting to show that I may need to take this in a slightly different direction for Clojure.
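For reference, the override looks roughly like the sketch below. Box and IBox are made-up names. A field marked ^:volatile-mutable (or ^:unsynchronized-mutable) becomes private, so set! on it only works from inside the type's own method bodies; the (set! (.-field foo) 100) form above still would not work from outside.

```clojure
(defprotocol IBox
  (set-value! [this v] "Replaces the stored value.")
  (get-value  [this]   "Returns the stored value."))

(deftype Box [^:volatile-mutable value]
  IBox
  ;; set! is allowed here because we are inside a method of the type itself.
  (set-value! [_ v] (set! value v))
  (get-value  [_]   value))

(def foo (Box. 5))
(set-value! foo 100)
(get-value foo)
;; => 100
```

Neither flag makes compound updates atomic: a read-modify-write across threads still needs its own coordination.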

bsima commented 7 years ago

@stathissideris When I was working on it, I had to set ^:volatile-mutable (or something similar) for each field. More problems came up when I was trying to get the tests to pass, some weird errors I could never figure out. My work if you're interested: https://github.com/bsima/javelin/tree/jvm

stathissideris commented 7 years ago

@bsima thanks for the link, I'll have a look. :volatile-mutable makes the fields of the type mutable but also removes any thread safety that you get with Clojure's "regular" data structures. It's very likely that the weird errors you were getting were due to race conditions or other similar problems.

alandipert commented 7 years ago

6 months ago I made a lot of progress on the cljc branch.

I can't remember exactly how far I got, but I think... kinda far? I used refs for the mutable fields.

I ended up pretty disgusted by the way it came out. I think it's much better to have separate Clojure and ClojureScript namespaces, or perhaps even different Clojars artifacts.

So, the cljc work wouldn't be helpful as-is, but if you can figure out a way to spit out .clj from the .cljc, you'd have a decent starting point that's more up-to-date than the existing javelin_clj.clj.

stathissideris commented 7 years ago

@alandipert thanks, I had a look at that branch and started uncommenting some of the tests to see how far you got (most of them did pass!). I have a question about dosync*: it relies on binding which modifies the bindings per-thread. This is not relevant to cljs (because it has one thread), but could it prove problematic for the JVM?

burn2delete commented 7 years ago

Self hosted javelin would be really awesome and also affects the cljc thing

alandipert commented 7 years ago

@stathissideris oh yeah: that's one of the things I handwaved.

Reflecting on it now, I'm not sure Javelin needs to have its own dosync in Clojure if refs are used.

Reasoning chain:

On ClojureScript, binding is the way to establish a global dynamic scope. In Clojure the binding is scoped to the thread, as you point out.

ClojureScript doesn't have ref or dosync. There, binding is a thin veneer over setting a global variable and then setting it back, which is what we needed to make sure dosync nested appropriately.

But in Clojure, dosync is thread-global: even refs in different threads participate in the same transaction. I think it makes sense to apply to cells the same semantics that already apply to refs. At least, I can't think of a good reason for them to behave differently.
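The thread scoping of binding is easy to see at a JVM REPL. A minimal sketch: *tx* and seen-on-new-thread are illustrative names, and a raw Thread is used deliberately, because future and bound-fn would convey the caller's bindings to the new thread.

```clojure
(def ^:dynamic *tx* nil)

(defn seen-on-new-thread
  "Reads *tx* from a freshly started thread and returns what it saw."
  []
  (let [p (promise)
        ^Runnable task (fn [] (deliver p *tx*))]
    (doto (Thread. task)
      (.start)
      (.join))
    @p))

(binding [*tx* :open]
  [*tx* (seen-on-new-thread)])
;; => [:open nil]
```

So a dosync* built on binding would only cover work done on the thread that opened it.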

stathissideris commented 7 years ago

So I've been doing some reading and decided to have a go at implementing cells in Clojure from scratch. Here is my effort so far (be warned, this is work in progress!):

https://github.com/stathissideris/datacore/blob/518abe53729661e7c9d6a217fa1dd64b2ce88466/src/datacore/cells.clj

And here are some tests to show how it behaves:

https://github.com/stathissideris/datacore/blob/518abe53729661e7c9d6a217fa1dd64b2ce88466/test/datacore/cells_test.clj

This implementation is closer to how I think Kenny Tilton's rube works (although I've yet to read his actual code). Tilton is the original developer of cells in CL.

The main differences of my implementation to Javelin are:

  1. No static code analysis. Links between cells are established the first time a formula cell is calculated, and the library "becomes aware" that in that context other cells are deref-ed. This is similar to rube.
  2. There is a central registry of cells, their values, and dependencies between them. This could be used to visualize state and the links.
  3. No lenses.
  4. Formula cells need to refer to other cells using deref. In my opinion this makes the code look more like normal Clojure code (which it is!). See tests for examples.
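The deref tracking described in point 1 can be sketched in a few lines. This is not the datacore code linked above, just a minimal illustration under assumed names (*tracking*, TrackedCell, input-cell, run-formula).

```clojure
(def ^:dynamic *tracking* nil) ; bound to an atom holding a set while a formula runs

(deftype TrackedCell [state]
  clojure.lang.IDeref
  (deref [this]
    (when *tracking*
      (swap! *tracking* conj this)) ; record that this cell was read
    @state))

(defn input-cell [v] (TrackedCell. (atom v)))

(defn run-formula
  "Evaluates thunk f and reports its value plus the cells it dereferenced."
  [f]
  (binding [*tracking* (atom #{})]
    (let [value (f)]
      {:value value :deps @*tracking*})))

(def a (input-cell 1))
(def b (input-cell 2))

(run-formula #(+ @a @b))            ; :value is 3, :deps contains a and b
(run-formula #(if (pos? @a) @a @b)) ; :deps contains only a, since b is never read
```

The second call shows the trade-off: only cells on the branch actually taken are discovered, so links can appear or change on later evaluations.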

Check the propagate function, it's very similar to Javelin's!

I'm planning to use this as part of my java-fx project and see if it's enough for my needs. For now, I'm glad that it's a simpler implementation than Javelin.

Any feedback welcome!

alandipert commented 7 years ago

@stathissideris cool!

Re: 1, when you say that Javelin does static code analysis, are you talking about the cell= macro? If so, it's worth noting that the cell= macro doesn't know anything about the relationships between cells. It just generates code that will build the cell graph at runtime. It does leverage a few heuristics to minimize the number of cells that are created when the code does run.

Basically, it transforms expressions like (cell= (+ a b)) into ((formula +) a b), where formula does all the work, like recognizing if a or b are cells, and linking them up if so.
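As a rough illustration of that transformation (not Javelin's actual source), the sketch below uses plain atoms as stand-in cells. The names cell?, formula, and my-cell= are hypothetical, and the macro only handles a single top-level call, whereas the real cell= also deals with nested expressions.

```clojure
(defn cell? [x] (instance? clojure.lang.Atom x))

(defn formula
  "Returns a function that builds an output cell from f and its arguments,
  watching every argument that is itself a cell."
  [f]
  (fn [& args]
    (let [out        (atom nil)
          recompute! (fn []
                       (reset! out (apply f (map #(if (cell? %) @% %) args))))]
      (doseq [c (filter cell? args)]
        ;; The output cell doubles as a unique watch key.
        (add-watch c out (fn [_ _ _ _] (recompute!))))
      (recompute!)
      out)))

(defmacro my-cell=
  "Toy rewrite: (my-cell= (f x y)) expands to ((formula f) x y)."
  [[f & args]]
  `((formula ~f) ~@args))

(def a (atom 1))
(def b (atom 2))
(def sum (my-cell= (+ a b)))

@sum            ; => 3
(reset! a 10)
@sum            ; => 12
```

All the linking happens in formula at runtime; the macro itself never needs to know which symbols refer to cells.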

If that differs from your perception about what cell= does it would be helpful for me to know, because you won't have been the first to be misled. It's on my short list to clean up the Javelin README and de-emphasize cell=. Actually, thank you in advance for any feedback you might have about the README. It's kind of... overparticular.

Re: 4, interesting choice! I can see the appeal of visually demarcating references to cells in formulas. Javelin gained a macro for doing something similar recently, formula-of

Anyway you're doing really cool work and I enjoy keeping up with it. Keep us posted :smile:

alandipert commented 7 years ago

@stathissideris forgot to mention, clojure has an IAtom interface now, which you could extend instead of having your own swap!
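A minimal sketch of what that could look like, assuming a hypothetical WrappedCell type that simply delegates to an ordinary atom. Implementing clojure.lang.IAtom is what lets the core swap! and reset! accept the type, and clojure.lang.IDeref provides @.

```clojure
(deftype WrappedCell [state] ; state is an ordinary atom holding the value
  clojure.lang.IDeref
  (deref [_] @state)

  clojure.lang.IAtom
  (swap [_ f]          (swap! state f))
  (swap [_ f a]        (swap! state f a))
  (swap [_ f a b]      (swap! state f a b))
  (swap [_ f a b more] (apply swap! state f a b more))
  (compareAndSet [_ oldv newv] (compare-and-set! state oldv newv))
  (reset [_ v]         (reset! state v)))

(def c (WrappedCell. (atom 1)))

(swap! c inc)        ; => 2
(swap! c + 1 2 3 4)  ; => 12
(reset! c 0)         ; => 0
@c                   ; => 0
```

A real cell would do its propagation inside these methods; delegating to an atom just keeps the sketch short.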

stathissideris commented 7 years ago

@alandipert

Maybe static analysis is too grandiose a term, I meant the fancy code walking necessary for the hoisting of the args for cell=. My implementation does not do that and as a consequence, the linking happens when a sink is first dereferenced, instead of when the cell is constructed (which is what happens in Javelin). I tried using my cells in practice and I was caught off guard a couple of times by this limitation, so it may be worth having some code walking after all.

About dosync: A Clojure version of Javelin could use Clojure's dosync but because transactions within dosync can be retried, that would potentially mean that cell= could no longer be used reliably for side effects, despite the propagate function being smart enough to only visit each formula cell once, because if the transaction is tried more than once you'd get the same side effect more than once. I was thinking of a way around this, and I'm leaning towards having a 3rd type of cell (maybe called effect=) which would be run after all the formula cell changes have been propagated, outside a transaction, and would guarantee at most one execution per swap!.
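A minimal sketch of the retry problem and of the deferred-effect idea, using plain refs rather than cells. Everything here (total, attempts, effect-log, add-with-effect!) is a made-up name, and the extra thread exists only to force one retry so the difference is visible.

```clojure
(def total      (ref 0))
(def attempts   (atom 0))   ; how many times the transaction body ran
(def effect-log (atom []))  ; stands in for a real side effect

(defn add-with-effect! [n]
  (let [new-total
        (dosync
          ;; Demo only: on the first attempt another thread commits a change
          ;; to `total`, which forces this transaction to retry.
          (when (= 1 (swap! attempts inc))
            (let [^Runnable bump (fn [] (dosync (alter total inc)))]
              (doto (Thread. bump) (.start) (.join))))
          (alter total + n))]
    ;; The effect runs after the commit, outside the transaction.
    (swap! effect-log conj new-total)
    new-total))

(add-with-effect! 10)
@attempts    ; greater than 1: the transaction body itself was re-run
@effect-log  ; a single entry: the deferred effect ran once
```

Clojure already does something similar for agents: a send made inside a transaction is held until the transaction commits, which is another way to get this kind of deferral.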

I'm aware of IAtom but I'm not sure yet whether I want my cells to look like atoms completely - still deciding on the interface! I implemented IRef to get the convenience of typing @.

Did a bit more work today for cycle detection. :)