rijs / fullstack

modern fullstack framework

Immutable Data #14

Closed pemrouz closed 9 years ago

pemrouz commented 9 years ago

TODO:

@mamapitufo @sammyt @mstade @gabrielmontagne @jkimbo - have you guys tried any of the immutable libs (mori, immutable-js, ancient oak, hamt, etc) or rolling your own? Do you have any thoughts/preferences/advice on which to use?

The only alternative to this I can think of is to store changes as diffs. The only issue with that is computing the state at a point in time is relatively more complicated, but we could cache the latest.

mstade commented 9 years ago

Right on. I have no real feedback on the question to be honest. The only ones I've tried are mori and immutable-js, and I didn't much like either. It was a while ago though, and both libs have likely changed quite a bit since, so don't take my word for anything. (As usual, really.)

Freeze/thaw on change/retrieval to let consumers keep using vanilla types.

Meh. :o)

pemrouz commented 9 years ago

Lol. I'm not particularly keen on them either. I thought about doing it myself based on [1][2] (mostly for my own learning), but this would be quite a tangent and there's a lot else to do still.

[1] http://www.brainshave.com/talks/immutable-data-trees [2] http://hypirion.com/musings/understanding-persistent-vector-pt-1 (thanks for this @mere!)

Meh. :o)

Well, I thought to add a task to expose the internal persistent structures too, as with the above plan we'd only benefit from reduced memory consumption, but not performance savings. But a user could really use any immutable collection as the actual body depending on their app, so surely it's best to be unopinionated here about what the body type can be?

That line should read rather:

Freeze/thaw on change/retrieval to let consumers keep using any type they like (won't be restricted to the internal implementation).

mamapitufo commented 9 years ago

Just a note on freeze: it only freezes at the top level. If one of your props is an object or array, you can still modify those. This makes it useless in my opinion.
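For readers following along, here is a minimal sketch of the shallow-freeze behaviour described above, plus one common workaround. `deepFreeze` is a hypothetical helper written for this illustration; it is not part of Ripple or of any library mentioned in this thread.

```javascript
// Object.freeze is shallow: nested objects and arrays stay mutable.
const shallow = Object.freeze({ id: 1, tags: ['a'] });
shallow.tags.push('b'); // succeeds, because the nested array is not frozen

// Hypothetical helper: freeze a value and everything reachable from it.
// The WeakSet guards against cycles in the object graph.
function deepFreeze(value, seen = new WeakSet()) {
  if (value === null || typeof value !== 'object' || seen.has(value)) {
    return value;
  }
  seen.add(value);
  for (const key of Object.getOwnPropertyNames(value)) {
    deepFreeze(value[key], seen);
  }
  return Object.freeze(value);
}

const deep = deepFreeze({ id: 1, tags: ['a'] });
// deep.tags.push('b') now throws a TypeError
```

Deep freezing walks the whole structure on every call, so it trades the shallow-freeze hole for a cost proportional to the size of the data.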

I've been looking at immutable-js these days. I find its (and mori's) API a bit weird, but I guess that's just how it is when you implement these data structures as "oo" objects.

gabrielmontagne commented 9 years ago

yeah. we're trying to use immutable-js in our office as well. the APIs, well, until ES6 Proxies I guess they have to look like they do now.

jkimbo commented 9 years ago

I've used Immutable-js quite a bit and love it! It's not as fully featured as mori (and also not as big) but it approaches things from a javascript perspective rather than a clojurescript one which makes it nicer to work with.

The author had a pretty good talk on it at React Conf: https://www.youtube.com/watch?v=I7IdS-PbEgI

mstade commented 9 years ago

Just a note on freeze: it only freezes at the top level. If one of your props is an object or array, you can still modify those. This makes it useless in my opinion.

Totes.

Anyway, last time I used immutable-js there were a bunch of gotchas I didn't quite get. It was probably me being stupid, but things like merging maps never quite worked as advertised and I ended up doing weird shit to work around things. Like creating a regular ol' mutable map to represent the merged version, then making that immutable. It just put me off and I haven't quite had the inclination or need to go back and try again.

I probably will implement some immutable datastructures in funkis at some point. Not necessarily to compete with anything or because I have grandiose ideas on syntax and interface, but rather to better understand the internals of these kinds of things. I think it might be a useful exercise.

pemrouz commented 9 years ago

Thanks for the feedback all. So, now I'm personally leaning quite away from all these HAMT implementations, due to @mamapitufo's point, @mstade's gotchas and especially after watching @jkimbo's video of @leebyron dogmatically hating on Object.observe. Reactivity and immutability can work hand in hand. There is no replacement for the fact that things need to change and react to change. IMHO, the "unidirectionality" emphasised in Flux is a one-up on traditional approaches, but doesn't go far enough. The real enlightenment (thanks again D3) is achieving "data-flow" (vs control-flow) which is a stricter subset of unidirectionality. This inversion of control is much simpler and reduces another layer of complexity (namely, manually dispatching events):

todos.push(todo)

You can see in the Flux example:

todos.push(todo)
SomeSpecialProprietaryFrameworkConcept.emitChange() // emitChange? business as usual..

But this is beside the main point. I don't at present see a way to benefit from the structural sharing of existing implementations in this paradigm, unless we (a) force users to use a particular immutable type or (b) translate and replay the vanilla changes onto a specific immutable type. But if I were to do the latter, I might as well just store the versioned history as minimal diffs git-style. From my limited understanding, this would achieve a better memory footprint than those HAMT implementations (since they could result in several nodes changing), but lookups of previous versions would be slower (adding up diffs). This is fine though, as memory is the main concern (we will always be storing history) and lookups of previous versions should be rare (time travel would only be used in debugging) - plus we can cache HEAD and HEAD~1, etc. Before I embark down this experimental path, does anyone think this would definitely be a bad idea and I should stick to immutable-js?

@gabrielmontagne, great point on ES6 Proxies. Since Ripple requires the --harmony flag, this is a viable option too.

mamapitufo commented 9 years ago

As far as I know the memory impact of persistent data structures shouldn't be a concern. If you are storing history (for undo, for example), the few examples I've seen are actually orders of magnitude better than the simple alternative (cloning the data structs).

The point about forcing your users to use a particular library for immutable data is a problem you won't be able to escape just yet in JS. In that case I'd rather force them to use one of the best known/popular/supported libraries.

I would think that implementing these diffs and re-applying them every time would quickly become very slow. After all, you only have a few ms every frame to push your changes through. Would you consolidate the changes every once in a while? I guess something like video encoding, where you have keyframes every n frames that have the full set of info at that point in time, and diffs for the frames between those. I still think this would be too slow, keeping in mind that you will be using these data structs in a real application that will be doing things at the same time.
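The keyframe idea above can be sketched roughly as follows. Everything here is an assumption made for illustration: `createHistory`, the `interval` parameter, and the restriction to flat key/value state are hypothetical and do not describe how Ripple stores history.

```javascript
// Hypothetical sketch: versions of a flat key/value state stored as diffs,
// with a full snapshot ("keyframe") every `interval` versions.
function createHistory(initial, interval = 4) {
  const entries = [{ keyframe: { ...initial } }]; // version 0 is a keyframe
  let head = { ...initial };                      // cached latest state

  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // Minimal shallow diff between two flat objects.
  function diff(prev, next) {
    const set = {};
    const removed = [];
    for (const key of Object.keys(next)) {
      if (!has(prev, key) || prev[key] !== next[key]) set[key] = next[key];
    }
    for (const key of Object.keys(prev)) {
      if (!has(next, key)) removed.push(key);
    }
    return { set, removed };
  }

  // Record a new version and return its version number.
  function commit(next) {
    const version = entries.length;
    entries.push(version % interval === 0
      ? { keyframe: { ...next } }
      : { diff: diff(head, next) });
    head = { ...next };
    return version;
  }

  // Rebuild a past version: walk back to the nearest keyframe,
  // then replay the diffs forward.
  function at(version) {
    let start = version;
    while (!entries[start].keyframe) start--;
    const state = { ...entries[start].keyframe };
    for (let v = start + 1; v <= version; v++) {
      const { set, removed } = entries[v].diff;
      Object.assign(state, set);
      for (const key of removed) delete state[key];
    }
    return state;
  }

  return { commit, at, latest: () => ({ ...head }) };
}

const history = createHistory({ a: 1 }, 3);
history.commit({ a: 1, b: 2 }); // version 1, stored as a diff
history.commit({ b: 3 });       // version 2, stored as a diff
history.commit({ b: 3, c: 4 }); // version 3, stored as a keyframe
history.commit({ c: 5 });       // version 4, stored as a diff
```

With this layout a lookup replays at most `interval - 1` diffs, so the interval is the knob that trades memory against lookup time. Whether that is fast enough inside a frame budget would need measuring.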

Anyway, I'm realising that I don't have the full picture on what you want to do, so I'll go and take a look and think about it :)

leebyron commented 9 years ago

So, now I'm personally leaning quite away from all these HAMT implementations, due to ... video of @leebyron dogmatically hating on Object.observe.

I'm happy you watched the video, but I'm sorry you perceived it as dogmatic or hating. Neither were my intent. For those who watched my talk and felt similar, I apologize.

I'm actually a fan of Object.observe, I just think there is often unbridled optimism around it as a panacea that I wanted to challenge. Tracking changes to your application state is a Hard Problem™ and ultimately a game of tradeoffs. There exists no perfect solution and finding the best solution for your application requires understanding all of the available tools at your disposal.

In that talk, I'm actually trying to dissolve some dogma around immutable data. There's a general belief that immutable data is academically interesting but not useful for real world interactive programs because of memory use or time concerns. I wanted to illustrate that this belief is often not correct, that using these tools can help us solve performance problems rather than create them.

I really hope that you don't dismiss HAMT and its cousin data structures just because you didn't like my talk. They're really powerful tools to have in your problem solving arsenal.


Some other notes from what I read here:

todos.push(todo)
SomeSpecialProprietaryFrameworkConcept.emitChange() // emitChange? business as usual..

This is just the EventEmitter pattern. It is in fact business as usual when you want a one to many communication. Object.observe takes just a single callback, but if you wanted to broadcast the changes to your object to multiple clients, you might want to use the EventEmitter pattern to do that.

unless we (a) force users to use a particular immutable type

I definitely agree with you and others in this thread that it's generally not a great idea to enforce using one particular API for data. If you enforce mori or immutable-js and people would rather use something else, then you've surely not done anyone a favor. IMHO, it's a better approach to let people use whatever source data works best for them and to position this library such that it can work well with a variety of data sources.

I might as well just store the versioned history as minimal diffs git-style. From my limited understanding, this would achieve a better memory footprint than those HAMT implementations.

You might be surprised by this. Either way, I encourage you to experiment and choose a path based on real results.

leebyron commented 9 years ago

Also, another well presented example of time travel with HAMT and its impact on memory usage by David Nolen here http://youtu.be/SiFwRtCnxv4?t=18m11s

Again, YMMV - so choose based on real data for your application, but definitely don't dismiss this tool as an option.

pemrouz commented 9 years ago

@leebyron, thanks for the clarification! I enjoyed the talk overall, and am with you on the benefits of immutability but mistakenly thought you were dismissing the use of Object.observe altogether. As you said, this is mostly about trade-offs and being open to using all available tools.

Regarding the EventEmitter pattern, I was contending that O.o is a rival solution there, and that is actually an orthogonal issue to whether the data is immutable/mutable. The native O.o function takes only one callback as you note, but that one callback can broadcast changes to others that need it (the approach taken in this library). This is the same as the EventEmitter pattern that invokes only one function, but also subsequently broadcasts it to others. The liberating difference, however, is that the former closes the gap between the "change" and "emit change", which should ideally be an epiphenomenon.
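A rough sketch of what closing that gap could look like. Object.observe was later withdrawn from standardisation, so this illustration uses an ES6 Proxy instead (the option raised earlier in the thread). `observable` and `subscribe` are hypothetical names, not Ripple's API.

```javascript
// Hypothetical helper: wrap a target so that every property write notifies
// all registered listeners. No manual emitChange() call is needed.
function observable(target) {
  const listeners = new Set();
  const proxy = new Proxy(target, {
    set(obj, key, value) {
      obj[key] = value;
      // One trap, many listeners: the broadcast happens as part of the write.
      listeners.forEach(fn => fn({ key, value }));
      return true;
    },
  });
  return { proxy, subscribe: fn => listeners.add(fn) };
}

const { proxy: todos, subscribe } = observable([]);
const seenByA = [];
const seenByB = [];
subscribe(change => seenByA.push(change.key));
subscribe(change => seenByB.push(change.key));

todos.push({ title: 'write docs' }); // the change itself triggers the broadcast
```

Note that a single `push` causes more than one write on the underlying array (the new index and `length`), so a real implementation would probably want to batch notifications.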

I definitely will give the existing libraries a shot (just need to work out best way to freeze/thaw), as well as experimenting with HAMT and other solutions (diffing), in each case sticking to the design goal of trying to keep the API generic. Instrumentation for measuring time performance is straightforward, but does anyone know an accurate (programmatic) way to capture memory usage?
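On the memory question, one rough option in Node.js is `process.memoryUsage()`, sketched below. The `heapUsed` and `measure` helpers are hypothetical, and the numbers are only approximate because the garbage collector runs on its own schedule. Starting node with `--expose-gc` makes `gc()` available to force a collection before each reading.

```javascript
// Rough heap reading in Node.js. If node was started with --expose-gc,
// force a collection first to reduce noise.
function heapUsed() {
  if (typeof globalThis.gc === 'function') globalThis.gc();
  return process.memoryUsage().heapUsed;
}

// Hypothetical helper: report the heap growth caused by building a structure.
function measure(build) {
  const before = heapUsed();
  const result = build(); // keep a reference so the result is not collected
  const after = heapUsed();
  return { bytes: after - before, result };
}

const { bytes, result } = measure(() => {
  const versions = [];
  for (let i = 0; i < 1000; i++) {
    versions.push({ id: i, body: new Array(100).fill(i) });
  }
  return versions;
});

console.log(`approx. ${bytes} bytes for ${result.length} versions`);
```

Without a forced collection the difference can even come out negative, so it is best treated as a comparative figure across several runs, not an exact one.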

Thanks for the Nolen video, will check it out this weekend! :)

pemrouz commented 9 years ago

@mamapitufo. This is what a resource currently looks like (name, body and some metadata)

{
  name: 'tweets'
, body: []
, headers: { .. }
}

By adding a new property (history?), we can store the diffs on change:

{
  name: 'tweets'
, body: []
, headers: { .. }
, history: []
}

Since body will always refer to the latest value, we won't need to compute anything (unless we want to go back in time). This is where a HAMT would definitely win though over the git approach, since previous versions are stored (efficiently via structural sharing) as absolutes and so require no computation.
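To make "structural sharing" concrete, here is a minimal sketch using plain path copying on vanilla objects and arrays. This is a simpler technique than a HAMT and is not how immutable-js or mori work internally. `setIn` is a hypothetical helper written for this example, and it assumes the path already exists in the data.

```javascript
// Hypothetical helper: return a new root with `value` set at `path`,
// copying only the objects/arrays along that path. Everything off the
// path is shared by reference between the old and new versions.
// Assumes every key in `path` already exists.
function setIn(root, path, value) {
  if (path.length === 0) return value;
  const [key, ...rest] = path;
  const copy = Array.isArray(root) ? root.slice() : { ...root };
  copy[key] = setIn(root[key], rest, value);
  return copy;
}

const v1 = {
  name: 'tweets',
  body: [{ id: 1, text: 'a' }, { id: 2, text: 'b' }],
};
const v2 = setIn(v1, ['body', 1, 'text'], 'c');

// Each entry is a complete, absolute version; no replay is needed to read one.
const history = [v1, v2];
```

Here `v2.body[0]` is the very same object as `v1.body[0]`; only the root, the `body` array and the changed tweet were copied.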

pemrouz commented 9 years ago

@immutable-js (and mori) dons: Could someone confirm whether there would be any structural sharing going on between c and d, or f and g?

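// Case 1: two lists built from separate array literals
a = {id: 1}
b = {id: 2}
c = Immutable.List([a])
d = Immutable.List([a,b])

// Case 2: two lists built from the same array, before and after a mutation
a = {id: 1}
b = {id: 2}
e = [a]
f = Immutable.List(e)
e.push(b)
g = Immutable.List(e)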

I'm pretty sure not (since even without the mutation, immutable(a) !== immutable(a)), so main question: is there any way to share internal structures from two different instantiations? I'm trying to explore if it's possible to marry the referential transparency of JSON.stringify with a memory efficient structure..

That is, if immutable = JSON.stringify, then the following would hold true immutable(a) == immutable(a)..

leebyron commented 9 years ago

Unfortunately there is not. Doing so would require some sort of global cache of objects which would have very detrimental performance effects.

However, you can and should use value equality to determine if two immutable structures are equal to one another. Immutable.is() is the function for doing that.

Value equality is also why JSON.stringify(a) == JSON.stringify(a) holds. The two calls return two different strings! That is, two different regions of memory. However, JavaScript knows to compare strings with == by value, not by reference.
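The difference between value and reference equality can be seen with vanilla JavaScript alone. The sketch below uses only built-ins; `jsonEqual` is a hypothetical helper for illustration and is not a substitute for `Immutable.is()`.

```javascript
const a = { id: 1 };

// Two separate calls produce two separate strings, yet they compare equal,
// because strings are compared by value.
const s1 = JSON.stringify(a);
const s2 = JSON.stringify(a);
const sameValue = s1 === s2;

// Objects are compared by reference, so structurally identical
// objects are still not equal.
const sameRef = { id: 1 } === { id: 1 };

// Hypothetical helper: structural equality via serialisation.
// Caveat: the result depends on key order.
function jsonEqual(x, y) {
  return JSON.stringify(x) === JSON.stringify(y);
}

const equalSameOrder = jsonEqual({ id: 1, n: 2 }, { id: 1, n: 2 });
const equalOtherOrder = jsonEqual({ id: 1, n: 2 }, { n: 2, id: 1 });
```

The key-order caveat is one reason a dedicated value-equality function is preferable to comparing serialised strings.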

pemrouz commented 9 years ago

Right, done! Turns out it was much easier than expected to integrate immutable-js with Ripple to internally get an efficient versioned history for each resource, and the entire application state - whilst letting users use vanilla data structures :smile:. I think we have the best of both worlds of mutability/immutability now. You can check out the obligatory TodoMVC example with an actual <timetravel-debugger> that allows you to see previous versions and roll back to them here as well as a few more details on the simple API.


Thanks for the comments all! :)