Closed orestis closed 5 years ago
That's not a bad idea, though I don't think there's much benefit to limiting the functions to a vector. Instead, perhaps something like:
(defn insert-at [index element coll]
(lazy-seq
(if (zero? index)
(cons element coll)
(when (seq coll)
(cons (first coll) (insert-at (dec index) element (rest coll)))))))
We could also add a transducer version, so that we could write either:
(vec (insert-at 3 :foo (range 10))
Or:
(into [] (insert-at 3 :foo) (range 10))
FWIW, there are similar functions in tupelo.core: drop-at, replace-at, and insert-at. For maximum portability & simplicity, these just use take
and drop
, then return a vector result via into
.
drop-at
is an interesting alternative name for remove-at
, but I think I favour the latter. Both drop
and remove
remove elements from a sequence, but drop
will always remove from the beginning of the sequence, while remove
can remove from any part. I think remove-at
works better with insert-at
as well.
A replace-at
function would be interesting as well, though obviously not important for vectors, where assoc
can be used instead.
Medley is written to be performant, so I'll avoid using take
and drop
in this case.
Thanks for considering this!
If performance is a goal, isn’t subvec the best option when the collection is a vector?
Not in this case. If you just want a slice of a vector, then subvec
has O(1) performance. This is because it returns a SubVector
instance that references the original vector, providing a window onto it.
But once you have the subvector, then what? If you concat
, as in your earlier example, you're converting the subvector into a seq and iterating through it. It's an O(n) operation with additional steps. We can see this if we compare the performance of the vec-insert
function you posted, vs. insert-at
:
user=> (def v (vec (range 1000)))
#'user/v
user=> (quick-bench (vec-insert v 50 :foo))
Execution time mean : 105.214252 µs
user=> (quick-bench (vec (insert-at 50 :foo v)))
Execution time mean : 30.585013 µs
We could also try using subvecs and into
, avoiding the cost of turning it into a seq:
(defn vec-insert2 [v n e]
(-> (subvec v 0 n) (conj e) (into (subvec v n))))
But even this is slower, perhaps because subvectors can't be directly converted to transients:
user=> (quick-bench (vec-insert2 v 50 :foo))
Execution time mean : 95.581930 µs
If we introduce a transducer, this can cut down the time a little further, by eliminating the need for an intermediate seq:
(defn insert-at-t [index element]
(fn [rf]
(let [idx (volatile! (inc index))]
(fn
([] (rf))
([result]
(if (= @idx 1)
(rf (rf result element))
(rf result)))
([result x]
(if (zero? (vswap! idx dec))
(rf (rf result element) x)
(rf result x)))))))
This shaves off a further 20%:
user=> (quick-bench (into [] (insert-at-t 50 :foo) v))
Execution time mean : 23.405775 µs
Interestingly, Tupelo's implementation was initially more performant than I expected it to be, considering it uses take
and drop
to create two additional seqs:
user=> (quick-bench (t/insert-at v 50 :foo))
Execution time mean : 24.871629 µs
But further testing revealed Tupelo's implementation has performance dependent on the index, as the greater the index, the more work take
and drop
need to do:
user=> (quick-bench (t/insert-at v 0 :foo))
Execution time mean : 19.720011 µs
user=> (quick-bench (t/insert-at v 500 :foo))
Execution time mean : 69.123874 µs
user=> (quick-bench (t/insert-at v 1000 :foo))
Execution time mean : 118.711424 µs
Whereas a transducer's time is fixed:
user=> (quick-bench (into [] (insert-at-t 0 :foo) v))
Execution time mean : 23.263353 µs
user=> (quick-bench (into [] (insert-at-t 500 :foo) v))
Execution time mean : 22.994005 µs
user=> (quick-bench (into [] (insert-at-t 1000 :foo) v))
Execution time mean : 23.730177 µs
Thanks for the detailed explanation! I’ll see if I can make some comparisons for smaller vectors and also on browsers.
When working with UIs that need specific order, a vector makes sense.
But also the ability to remove or insert entries anywhere in the vector, which is also important for UI work. Arguably, an RRB vector might be more efficient, but for few entries (no more than 10) a vector should be just as fine, if not faster.
Something like (taken literally from the top stack overflow answers):