szymonrw / ancient-oak

Immutable data trees in JavaScript.
MIT License
223 stars 11 forks source link

Help: Implementing `tail` atop ancient-oak #9

Open jasonkuhrt opened 10 years ago

jasonkuhrt commented 10 years ago

Hey @brainshave, would like your thought regarding implementing the tail function atop ancient-oak. In vanilla JavaScript it would look like:

[1,2,3].slice(1)
// [2,3]

In ancient-oak:

Nope:
oak([1,2,3]).rm(0).dump()
// [ , 2, 3 ]
Cute, but there should be a more performant way:
oak([1,2,3]).reduce(function(acc, x, i){ return i === 0 ? acc : acc.push(x) }, oak([])).dump()
// [ 2, 3 ]
jasonkuhrt commented 10 years ago

@brainshave What is the reasoning for this result?

oak([1,2,3]).rm(0).dump()
// [ , 2, 3 ]
szymonrw commented 10 years ago

It's because rm simulates delete, not splice nor slice, so the size of array never changes with one exception when removing the last element.

I think that the exception clause should go away to keep it consistent.

Slice and splice require separate implementations and it's complicated (but very interesting from algorithms perspective) to make it efficient.

jasonkuhrt commented 10 years ago

@brainshave curious, I assume Clojure has said algorithms? Does it fall under the persistent-vector category or not at all? e.g. as discussed in http://hypirion.com/musings/understanding-persistent-vector-pt-1. I took a cursory read through all 3 articles but saw no mention of this scenario

szymonrw commented 10 years ago

The quick answer would be no, they don't have specific optimisations. General case is solved by take and drop functions which are generic and work with any sequential data.

The longer answer would be subvec which creates a new "view" of a vector it's called on. This is very much like subarray method of JavaScript's native arrays (only that these are mutable).

The "view" of the vector/array remembers the start, end and reference to the original array. It only allows access within the new range. It's a very quick, constant-time operation.

There are two problems with Clojure's subvec if we want to use it as a substitute for slice:

  1. Leaking memory because the subarray still holds reference to the whole original array but only allows access to part of it. Even if we no longer have a reference to the original array, elements that are out of the subarray's scope won't be garbage collected until the "view" is garbage collected.
  2. The start offset will accumulate if we derive more subarrays from other subarrays. Imagine we're slicing off 1000 elements on the left multiple times. Each time start offset increases by 1000 so we're also cutting off a span of 1000 addresses and there's no way to revert it, the start offset only accumulates.

This is why I was thinking about something more complicated: slice would create a new version of the original array with:

This way we get rid of all inaccessible references. The shifting part is there to limit accumulation of start offset. Because we only allow to have empty space in left-most leaf, the maximum start offset is only 31 (because leaf size is 32).

This should be still faster than iterating over every element and creating completely new array, especially in common cases like shift/tail (removing one element from left), pop (from right) and take (n first elements) when the node-shifting stage would be omitted. To have something O(1) we could also provide subvec equivalent, probably called view but with clear indication of its issues in docs.

I don't know where to even begin with splice ;-)

szymonrw commented 10 years ago

Version 0.3.0 has now a simple implementation of slice

https://github.com/brainshave/ancient-oak/blob/766a49d491f21fd7aef15e3293e3d3e231142663/lib/types/array.js#L53-63

jasonkuhrt commented 10 years ago

@brainshave awesome, I need to review all this soon. Looking forward to it. Under deadlines right now : )

szymonrw commented 10 years ago

@jasonkuhrt oh, btw. when I said "simple" I meant "naive" ;-) . There's no taste of the aforementioned algorithm in there yet.

jasonkuhrt commented 10 years ago

@brainshave FYI YMMV http://elm-lang.org/blog/announce/0.12.1.elm. They implemented persistent data structures using ideas presented in http://www.infoq.com/presentations/julia-vectors-maps-sets.

Consequently (?) they have a slice function in standard library array: http://library.elm-lang.org/catalog/elm-lang-Elm/0.12.3/Array which could be trivially used to tail.

I haven't actually dug in myself yet, but you might be able to quickly compare your work with what they're doing and if Zach (Julia lang) did actually advance upon Clojure's implementation etc.