bsima opened this issue 8 years ago.
There is a Clojure port kicking around: https://github.com/hoplon/javelin/blob/master/src/javelin/core_clj.clj
It's prototype-quality and we've never used it for anything, but if you have an idea about using it and want to contribute, we're happy to accept PRs.
I saw that and had a feeling that's what it was, but wasn't 100% sure. I'll take a look at it.
When this question comes up, @micha and @alandipert usually conclude the discussion by saying they haven't seen a really compelling case yet for using cells on the backend.
I would be very interested to see a really characteristic use case, so @bsima if you have one, please share!
@onetom, well I'm currently working on a streaming data analysis service, and I thought having the reactive cell graph would be useful, especially since I'll be working with a lot of graph data structures. A few discussions on SO about Haskell lenses seem to corroborate my assumption (1, 2). I have yet to get to the implementation of the service, still reading and designing the system, so we'll see where my research leads me.
Seems like anyone doing pub-sub to long-polling streams would like this kind of thing on the server. Basically anyone implementing event-push to web socket or long-polling web clients needs to distribute events and record who has received what. All that stateful bookkeeping is a bother, and easy to write bugs into. Mix in an async implementation (which is the only solution to having any kind of scale with long-polling) and the bugs tend to multiply.
just dumping some notes here for later...
Another one from Tilton, Cells secret transcript. Great quote:
I like to describe the role for Cells being in any situation where one has an unpredictable stream of data and one is keeping a model with a sufficiently large amount of internal state consistent with that stream of inputs.
This sounds just like what I want to use Javelin in Clojure for. Another good quote:
A fun note is that I have in the past applied Cells to a database, specifically the old AllegroStore persistent CLOS database. This works two ways. One is that a user can be looking at a screen and as the underlying data changes the screen changes. That may sound like old news but with Cells one does not have to write any code to make it happen. One just says "this view shows this user's overdue books" and when the date changes the overdue status on every book gets updated and a new book appears in the list on the screen if someone happens to be looking, simply by someone having written code to list overdue books on the screen as if it were an unchanging value.
So the idea is, you encapsulate a database read in a cell on the server, and whenever the database changes, the new value will propagate automatically to your cell. Now imagine hooking this up to a Castra endpoint, which connects on the client to another cell. If the database changes, wouldn't this value propagate all the way to the client? Unless I'm missing something, this sounds a lot like remote sync in other frameworks sans fancy caching, but with way less overhead and complication. Super cool.
It looks like this kind of cell abstraction, in some form or another, was used primarily for UI programming. E.g. Garnet, which predated even Tilton's Cells. However, the case of using cells on the backend is getting stronger the more I read.
More notes...
A big problem that Rich and Stuart bring up is transactional order. From the first link, a message from Rich:
if agent A notifies B and C, and B notifies C, then a change to A will send actions to B and C, and B will subsequently send to C, but A's message to C and B's message to C can arrive in any order. Once you are ok with that, you can do multithreaded reactive programming.
I guess core.async was created to get around this problem. I don't know if Javelin has this problem or not, but if it does, a cell abstraction built on top of core.async would be cool.
Alternatively, there is the FrTime language, which is mentioned in the second email thread.
ftp://ftp.cs.brown.edu/pub/techreports/03/cs03-20.pdf
A first scan of the papers seems to indicate that FrTime influenced Javelin, since paper 1 talks about a "lift" function and here we see a lift implemented in Javelin. I'll read FrTime more closely this weekend.
There is also an old dataflow api and source, and an SO thread.
@bsima Excellent stuff! I'll throw this in:
This is the paper and FRP library that most directly influenced Javelin: http://cs.brown.edu/~sk/Publications/Papers/Published/mgbcgbk-flapjax/paper.pdf
lift comes from Haskell and is the operation in several FRP frameworks that converts "plain" functions to functions over Behaviors/Signals. Behaviors and Signals in these frameworks correspond directly to cells. From the FRP perspective, Javelin is an FRP library with only Behaviors/Signals, and no event streams.
Yeah, the FlapJax paper cites the FrTime papers, and the FrTime author Gregory Cooper mentions it on his website and is an author on the FlapJax paper. So that's the basic historic path from Conal Elliott's invention circa 1997, all the way up to the current Javelin implementation.
But there are really two (or three?) historical lines; one for FRP (and/or "dataflow" which is a rather ambiguous term), and one for cells. They aren't really compatible, because cells update synchronously, whereas "dataflow" is typically concerned with managing concurrency, and FRP is concerned with managing asynchronous dataflow.
So we have 3 concepts that are being tangled together and confused. This really helps me understand why the old dataflow API was dropped and the cell discussions killed: they all mention the problems of concurrency/asynchronicity without unpacking (decomplecting) the problem. Looking at these implementations from 6 years ago, they're rather naive, basically just simplified ports of Tilton's cells. Trying to build cells on top of agents is interesting as it attempts to make synchronous cells work with concurrent processing, but it ultimately fails because there is no time constraint, so values could arrive at cells in any order; cells are meant for synchronicity after all. So, the two relevant solutions to this concurrency problem are:
Of course CSP isn't actually reactive, it's just passing messages. If we want reactivity then we need FrTime/FlapJax. (Or it might be possible to implement FrTime-like semantics using CSP, but that's an experiment for another time.)
The brilliance of Javelin is that it takes the FrTime semantics and wraps them in the cell abstraction, which is like painting a simple interface over the academic lingo of FRP. I think micha mentioned in his 20-minute talk a few weeks ago that Javelin cells are kind of an alternative to core.async, but that's not strictly true. Maybe it is in ClojureScript, where you don't have true concurrency, but really cells and CSP are complementary and solve two slightly different problems.
Okay, after all this long-winded thinking-out-loud, I'm concluding that Javelin would absolutely be useful in Clojure. I'm gonna finish reading the FrTime papers closely and also read the Javelin code again more closely, and then I'll work on experimenting with Javelin on the JVM next week or so.
I've got an in-progress branch here: bsima/javelin/jvm
I haven't worked much with custom Clojure data structures so this took quite a bit of research, but so far it's going well. I can construct a cell, but set-formula! doesn't work and I'm not sure why yet. All I get is the extremely unhelpful java.lang.ClassCastException: clojure.lang.PersistentArrayMap cannot be cast to clojure.lang.IRef when calling add-watch on a Cell object.
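For reference, add-watch is type-hinted to take a clojure.lang.IRef and calls .addWatch on its first argument, so the exception suggests the value that actually reached add-watch was a plain map. Below is a minimal sketch, assuming a hypothetical SketchCell type (not Javelin's Cell), of a deftype that add-watch and remove-watch accept because it implements IRef:

```clojure
(ns example.cell-sketch)

;; Hypothetical protocol for setting a value from outside the type.
(defprotocol ISketchCell
  (set-value! [c v] "Set the value, notify watches, and return v."))

;; Hypothetical minimal cell, not Javelin's Cell. Implementing
;; clojure.lang.IRef is what lets add-watch / remove-watch accept it.
(deftype SketchCell [^:volatile-mutable state
                     ^:volatile-mutable watches]
  clojure.lang.IDeref
  (deref [_] state)

  clojure.lang.IRef
  (setValidator [_ vf]
    (throw (UnsupportedOperationException. "validators are not supported")))
  (getValidator [_] nil)
  (getWatches [_] watches)
  (addWatch [this k f] (set! watches (assoc watches k f)) this)
  (removeWatch [this k] (set! watches (dissoc watches k)) this)

  ISketchCell
  (set-value! [this v]
    (let [old state]
      (set! state v)
      ;; Watch fns receive (key reference old-value new-value), as with atoms.
      (doseq [[k f] watches]
        (f k this old v))
      v)))

(defn sketch-cell
  "Create a SketchCell holding init, with no watches."
  [init]
  (SketchCell. init {}))
```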
The CLJS code is really dense and undocumented, so it takes me a while to figure out exactly what it's doing. But a lot of the code is shared between CLJ and CLJS, so once I get the CLJ working we could consolidate using .cljc.
My goal is to get this working by the end of the month. Thinking toward the future: I like the idea of specifying protocols for the cells to implement, because (as I said in the above analyses) the cell abstraction is useful outside of FRP, so perhaps an ICell (or similar) protocol could be used to implement cells with different backends - cells with core.async, or threaded cells, for example.
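A minimal sketch of that idea, assuming a hypothetical ICell protocol with two illustrative backends: an atom for thread-safe updates, and a volatile that mirrors ClojureScript's single-threaded setting. Neither is an existing Javelin API, and volatile! requires Clojure 1.7 or later:

```clojure
(ns example.icell)

;; Hypothetical protocol, not part of Javelin: one interface, several backends.
(defprotocol ICell
  (cell-value [c] "Return the cell's current value.")
  (set-cell! [c v] "Replace the cell's value and return it."))

(defn atom-cell
  "Thread-safe backend: the value lives in an atom."
  [init]
  (let [a (atom init)]
    (reify ICell
      (cell-value [_] @a)
      (set-cell! [_ v] (reset! a v)))))

(defn volatile-cell
  "Single-threaded backend, closer to ClojureScript's situation:
  the value lives in a volatile, with no atomicity guarantees."
  [init]
  (let [box (volatile! init)]
    (reify ICell
      (cell-value [_] @box)
      (set-cell! [_ v] (vreset! box v)))))
```

Code written against ICell does not care which backend it gets; a core.async or threaded backend would be another reify of the same protocol.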
Really looking forward to this!
@bsima @sjmackenzie would you be interested in a code walkthrough via Hangout or similar? Javelin/cells come from a handful of ideas that code mashes together and obfuscates. I'll have some time this weekend, can probably enlist @micha also.
Ah the cells manifesto. That was my introduction to reactive programming. BTW @bsima the name is Tilton not Triton.
Also, I'd be interested in this hangout too. If you decide to go ahead with it, will you put a link in here?
@cddr will do, and will mention anyone else who expressed interest.
Yes indeed it would be quite cool to listen in to that talk.
@alandipert yes please! The Thanksgiving holiday and general work distracted me from working on this, but a code walkthrough would be super helpful and just generally interesting.
Saturday morning or early afternoon would be perfect, I'm on East Coast time.
@cddr oops! Not sure how I made that typo.. thanks :)
I'd love to join as well if you choose a good time. Edit: Oh it's weekend you're thinking of, that won't work for me. Have fun!
I'll likely be on a plane, but would love to review a recording of the walkthrough if possible.
@xificurC @sjmackenzie @cddr @bsima I made a doodle, please select the time that works best for you: http://doodle.com/poll/z55yyt2cx6s5nuia
I'll try to set up a Hangout on Air and drop the link here when it's ready so we can save the video.
I have no idea what timezone the doodle times are using. Once decided please put the time + tz here and I'll convert.
@sjmackenzie timezone is EDT
For those joining us this weekend for the code walkthrough, http://cs.brown.edu/~sk/Publications/Papers/Published/mgbcgbk-flapjax/paper.pdf Section 3.4 is a concise description of what we'll be looking at. The rest of the paper also has a lot of useful context. Worth a read, and don't worry, you won't be tested :-)
I remember it was also mentioned that cells, and dataflow in general, are most useful when they back a UI. Well, here is an example where dataflow powers a JavaFX game UI written in Clojure: Game Development Development - Michael Nygard & Ragnar Svensson
OK the time is set, Sat 12/5/15 3:00 PM EDT. Here is the Hangout on Air link: https://plus.google.com/events/ccnkeqnkgfhv8933s3ff1l2obd0
Hey I'm here. Watching. But there's nothing I can type in the google video
:-)
@cddr change of plans, we're here now: https://unhangout.media.mit.edu/h/TurboComputingFortress
Just a heads up, I'm still working on this. Today reading about Elm's concurrent FRP ideas for possible use in training a neural network.
As for the Javelin code, I think it would be prudent to start with a few minor refactorings:
… (catch? is in that highlighted code block twice)
I could do all these refactorings over the next few weeks, and it would help me understand the codebase besides. Let me know what you think.
I'm very heartened to see this discussion. I've been wondering if something like this might be very useful for data analysis. When doing ETL, it's a bit of a bother in the REPL to go back and fix mistakes or assumptions and then re-execute everything (and error prone at that). I actually thought of this the other day and couldn't remember if Javelin was ClojureScript only... and found this.
Hopefully you'll forgive the tangential nature of this comment but one thing I found really useful about Common Lisp when I was doing lots of ETL stuff was the debugger. It is beyond anything in any other language I've used. You could be near the end of an hour long ETL job, and some exception is thrown due to unexpected data. The debugger would allow you to navigate the stack trace to find the offending data, redefine the function that processes it to include a fix for the bug, then re-commence the job where it left off rather than having to start again.
Wow, this was a riveting read. Basically explains how to construct a "functional database" out of the same types of cells that Javelin uses
Has anyone here looked at potentially using propagators as proposed in the very riveting talk by Gerry Sussman (of Scheme and SICP fame) titled "We Really Don't Know How to Compute!"?
There's also a Clojure library implementation called Propaganda.
Looks like this aspect of the pulsar library is basically Javelin http://docs.paralleluniverse.co/pulsar/#dataflow-reactive-programming
Has there been any more progress on this?
I haven't been working on it unfortunately :(
Here's a good academic paper to hang our hat on: https://arxiv.org/abs/1406.2063v1
I'd like to use javelin for a JavaFX project, could anyone point me to the work in progress so that I can study it and possibly pick it up from where it was left? @bsima ?
@stathissideris I haven't been working on it, I'm using Kafka on the backend for dataflow-like programming. You might also want to look at pulsar as it has support for dataflow programming http://docs.paralleluniverse.co/pulsar/#dataflow-reactive-programming
@stathissideris You also might want to look at Onyx. You can use it for dataflow-style programming, it has a very data-driven API, and you can execute your workflows in the browser, on the backend, or even distributed over a ZooKeeper cluster. Because it's built primarily for the last of these cases, the setup is a little more involved than simple cell computing in the ballpark of Javelin, but I'd bet one could wrap the functionality up in some macros that make it more buttery for this purpose.
@bsima @metasoarous Thanks for the suggestions! I like javelin's small footprint, so I think as a first step I'll do a spike to see how easy it would be to port properly to clojure. If I fail, I will investigate pulsar and onyx (which I thought couldn't run without zookeeper!)
Wow, hitting a bit of a wall because of Cljs/Clj differences when it comes to deftype. It seems that Javelin relies on the fact that you can set! fields of existing instances of a deftype, but in Clojure all the fields are immutable by default (there is a way to override this):
(deftype foobar [field])
(def foo (foobar. 5))
(set! (.-field foo) 100)
Will throw:
IllegalAccessException Can not set final java.lang.Object field datacore.cells.foobar.field to java.lang.Long sun.reflect.UnsafeFieldAccessorImpl.throwFinalFieldIllegalAccessException (UnsafeFieldAccessorImpl.java:76)
There is a way to override this, but it's starting to show that I may need to take this in a slightly different direction for Clojure.
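A sketch of that override, reusing the foobar name from the snippet above: marking the field ^:volatile-mutable makes it assignable, but set! is then only supported inside the type's own method bodies, so callers go through a method. The IFoobar protocol and its set-field! method are hypothetical names for illustration:

```clojure
(ns example.mutable-field)

;; Hypothetical protocol: the only way for outside code to change the field.
(defprotocol IFoobar
  (set-field! [this v] "Replace the field's value; returns v."))

;; ^:volatile-mutable makes `field` assignable with set!, but only from
;; inside this type's method bodies.
(deftype foobar [^:volatile-mutable field]
  clojure.lang.IDeref
  (deref [_] field)

  IFoobar
  (set-field! [_ v] (set! field v)))
```

^:unsynchronized-mutable is the other option. Neither makes a compound read-modify-write update atomic, which matters once several threads touch the same cell.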
@stathissideris When I was working on it, I had to mark each field ^:volatile-mutable. More problems came up when I was trying to get the tests to pass, some weird errors I could never figure out. My work if you're interested: https://github.com/bsima/javelin/tree/jvm
@bsima thanks for the link, I'll have a look. ^:volatile-mutable makes the fields of the type mutable but also removes any thread safety that you get with Clojure's "regular" data structures. It's very likely that the weird errors you were getting were due to race conditions or other similar problems.
6 months ago I made a lot of progress on the cljc branch.
I can't remember exactly how far I got, but I think... kinda far? I used refs for the mutable fields.
I ended up pretty disgusted by the way it came out. I think it's much better to have separate Clojure and ClojureScript namespaces, or perhaps even different Clojars artifacts.
So, the cljc work wouldn't be helpful as-is, but if you can figure out a way to spit out .clj from the .cljc, you'd have a decent starting point that's more up-to-date than the existing javelin_clj.clj.
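One way to read "refs for the mutable fields": keep the deftype's fields immutable, but let each field hold a ref, so every change goes through the STM. This is a sketch under that assumption, with hypothetical names (RefCell, ref-cell, set-cell!, add-sink!), not the code on the cljc branch:

```clojure
(ns example.ref-fields)

;; Hypothetical cell type: the fields themselves are final, but each one
;; holds a ref, so "mutation" happens through Clojure's STM.
(deftype RefCell [state sinks]   ; state: ref to the value, sinks: ref to a set
  clojure.lang.IDeref
  (deref [_] @state))

(defn ref-cell [init]
  (RefCell. (ref init) (ref #{})))

(defn set-cell!
  "Set the cell's value inside a transaction."
  [^RefCell c v]
  (dosync (ref-set (.-state c) v)))

(defn add-sink!
  "Record `sink` as a dependent of `c`."
  [^RefCell c sink]
  (dosync (alter (.-sinks c) conj sink)))
```

Because set-cell! uses dosync, calling it inside an outer dosync joins that outer transaction, so several cells can be updated together.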
@alandipert thanks, I had a look at that branch and started uncommenting some of the tests to see how far you got (most of them did pass!). I have a question about dosync*: it relies on binding, which modifies the bindings per-thread. This is not relevant to cljs (because it has one thread), but could it prove problematic for the JVM?
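A small experiment that bears on the question, assuming a hypothetical dynamic var *tx* standing in for the marker that dosync* binds: on the JVM a binding is visible on the thread that established it, futures receive a copy through binding conveyance, and a raw java.lang.Thread sees only the root value.

```clojure
(ns example.binding-threads)

;; Hypothetical stand-in for a "currently inside a transaction" marker.
(def ^:dynamic *tx* nil)

(defn seen-by-future []
  (binding [*tx* :in-tx]
    @(future *tx*)))            ; future conveys the caller's bindings

(defn seen-by-raw-thread []
  (let [p (promise)]
    (binding [*tx* :in-tx]
      (doto (Thread. (fn [] (deliver p *tx*)))
        (.start)))
    @p))                        ; a plain Thread only sees the root binding
```

So code running on threads created outside Clojure's conveying constructs would not see a binding-based transaction marker.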
Self-hosted Javelin would be really awesome and also affects the cljc thing.
@stathissideris oh yeah: that's one of the things I handwaved.
Reflecting on it now, I'm not sure Javelin needs to have its own dosync in Clojure if refs are used.
Reasoning chain:
On ClojureScript, binding is the way to establish a global dynamic scope. In Clojure the binding is scoped to the thread, as you point out.
ClojureScript doesn't have ref or dosync. There, binding is a thin veneer over setting a global variable and then setting it back, which is what we needed to make sure dosync nested appropriately.
But in Clojure, dosync is thread-global: even refs in different threads participate in the same transaction. I think it makes sense to apply to cells the same semantics that already apply to refs. At least, I can't think of a good reason for them to behave differently.
So I've been doing some reading and decided to have a go at implementing cells in Clojure from scratch. Here is my effort so far (be warned, this is work in progress!):
And here are some tests to show how it behaves:
This implementation is closer to how I think Kenny Tilton's rube works (although I've yet to read his actual code). Tilton is the original developer of cells in CL.
The main differences between my implementation and Javelin are:
Check the propagate function, it's very similar to Javelin's!
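For readers unfamiliar with the idea, here is a sketch of rank-ordered propagation, the kind of technique the Flapjax paper describes for avoiding the ordering problem in Rich's A/B/C example. The graph representation and function names are hypothetical; this is not Javelin's actual propagate:

```clojure
(ns example.propagate)

;; Hypothetical cell graph as plain data:
;;   id -> {:value v, :rank n, :f formula, :sources [ids], :sinks #{ids}}
;; Input cells have rank 0 and no :f; a formula cell's rank is higher than
;; the rank of every one of its sources.

(defn- enqueue
  "Add ids to the work queue, a sorted set ordered by [rank id]."
  [queue graph ids]
  (into queue (map (fn [id] [(get-in graph [id :rank]) id])) ids))

(defn propagate
  "Recompute everything downstream of changed-id. Dirty cells are visited
  lowest rank first, so each formula runs at most once, after all of its
  sources are up to date."
  [graph changed-id]
  (loop [graph graph
         queue (enqueue (sorted-set) graph (get-in graph [changed-id :sinks]))]
    (if-let [[_ id :as entry] (first queue)]
      (let [{:keys [f sources value sinks]} (graph id)
            new-value (apply f (map #(get-in graph [% :value]) sources))
            graph'    (assoc-in graph [id :value] new-value)
            queue'    (disj queue entry)]
        ;; Only wake up sinks when the value actually changed.
        (recur graph' (if (= new-value value)
                        queue'
                        (enqueue queue' graph' sinks))))
      graph)))

(defn set-input
  "Set an input cell's value and propagate the change."
  [graph id v]
  (propagate (assoc-in graph [id :value] v) id))

;; A notifies B and C, and B notifies C (the diamond from Rich's message).
(def diamond
  {:a {:value 1 :rank 0 :sinks #{:b :c}}
   :b {:value 2 :rank 1 :f inc :sources [:a] :sinks #{:c}}
   :c {:value 3 :rank 2 :f + :sources [:a :b] :sinks #{}}})
```

Setting :a to 10 recomputes :b to 11 first and only then :c to 21, once, so :c never sees the new :a paired with the stale :b.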
I'm planning to use this as part of my JavaFX project and see if it's enough for my needs. For now, I'm glad that it's a simpler implementation than Javelin.
Any feedback welcome!
@stathissideris cool!
Re: 1, when you say that Javelin does static code analysis, are you talking about the cell= macro? If so, it's worth noting that the cell= macro doesn't know anything about the relationships between cells. It just generates code that will build the cell graph at runtime. It does leverage a few heuristics to minimize the number of cells that are created when the code does run.
Basically, it transforms expressions like (cell= (+ a b)) into ((formula +) a b), where formula does all the work, like recognizing if a or b are cells, and linking them up if so.
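To make the cell=/formula split concrete, here is a deliberately simplified sketch of the runtime half, using plain atoms in place of Javelin cells. lift is a hypothetical name (borrowing the FRP term), not Javelin's formula, and unlike Javelin it does no rank-ordered propagation:

```clojure
(ns example.lift)

(defn- cell?
  "In this sketch, any atom counts as a cell."
  [x]
  (instance? clojure.lang.Atom x))

(defn lift
  "(lift f) returns a function that accepts any mix of atoms and plain
  values and returns an atom holding f applied to their current values,
  recomputed whenever one of the atom arguments changes."
  [f]
  (fn [& args]
    (let [current (fn [] (apply f (map #(if (cell? %) @% %) args)))
          result  (atom (current))]
      ;; Link: watch every argument that is a cell.
      (doseq [x args :when (cell? x)]
        (add-watch x result (fn [_ _ _ _] (reset! result (current)))))
      result)))
```

Here ((lift +) a b) plays the role of the expansion of (cell= (+ a b)): the macro side only has to rewrite the call, and lift decides at runtime which arguments are cells. This eager version recomputes immediately on every change, so it makes no glitch-freedom guarantee for diamond-shaped graphs.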
If that differs from your perception about what cell= does, it would be helpful for me to know, because you won't have been the first to be misled. It's on my short list to clean up the Javelin README and de-emphasize cell=. Actually, thank you in advance for any feedback you might have about the README. It's kind of... overparticular.
Re: 4, interesting choice! I can see the appeal of visually demarcating references to cells in formulas. Javelin gained a macro for doing something similar recently, formula-of.
Anyway you're doing really cool work and I enjoy keeping up with it. Keep us posted :smile:
@stathissideris forgot to mention, clojure has an IAtom interface now, which you could extend instead of having your own swap!
@alandipert
Maybe static analysis is too grandiose a term; I meant the fancy code walking necessary for hoisting the args for cell=. My implementation does not do that, and as a consequence the linking happens when a sink is first dereferenced, instead of when the cell is constructed (which is what happens in Javelin). I tried using my cells in practice and I was caught off guard a couple of times by this limitation, so it may be worth having some code walking after all.
About dosync: a Clojure version of Javelin could use Clojure's dosync, but because transactions within dosync can be retried, cell= could potentially no longer be used reliably for side effects. The propagate function is smart enough to visit each formula cell only once, but if the transaction is tried more than once you'd still get the same side effect more than once. I was thinking of a way around this, and I'm leaning towards having a third type of cell (maybe called effect=) which would run after all the formula cell changes have been propagated, outside a transaction, and would guarantee at most one execution per swap!.
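A sketch of the "run side effects once, after the transaction" idea, assuming a hypothetical run-with-effects helper rather than any existing Javelin API. Clojure's agents already follow a similar rule: sends made inside a transaction are held until it commits and discarded if it retries.

```clojure
(ns example.effects)

(defn run-with-effects
  "Hypothetical helper: run (body emit!) inside dosync, where emit!
  registers a side-effecting thunk. Thunks from a retried attempt are
  discarded; only those from the committed attempt run, once, after
  the transaction."
  [body]
  (let [effects (atom [])
        result  (dosync
                  (reset! effects [])   ; each (re)try starts from a clean slate
                  (body (fn emit! [thunk] (swap! effects conj thunk))))]
    (doseq [thunk @effects]
      (thunk))
    result))
```

One caveat of this sketch: if it is called inside an outer dosync, the inner dosync joins the outer transaction, so the thunks would run before the outer transaction commits and could repeat if it retries.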
I'm aware of IAtom, but I'm not sure yet whether I want my cells to look like atoms completely; still deciding on the interface! I implemented IRef to get the convenience of typing @.
Are there any plans or interest in using Javelin from Clojure? I was thinking earlier today that it would be nice to have cells on the backend too.
Yeah, core.async works for most of the same problems as Javelin, but the cell abstraction is often easier and simpler to use than channels.