weavejester / medley

A lightweight library of useful Clojure functions
Eclipse Public License 1.0
865 stars 66 forks source link

Function request(s): Add/Remove element at index in vector #32

Closed orestis closed 5 years ago

orestis commented 5 years ago

When working with UIs that need specific order, a vector makes sense.

But also the ability to remove or insert entries anywhere in the vector, which is also important for UI work. Arguably, an RRB vector might be more efficient, but for few entries (no more than 10) a vector should be just as fine, if not faster.

Something like (taken literally from the top stack overflow answers):


(defn vec-insert
  [v n e] 
  (vec (concat (subvec v 0 n) [e] (subvec v n))))

(defn vec-remove
  [v n]
  (vec (concat (subvec v 0 n) (subvec v (inc n)))))
weavejester commented 5 years ago

That's not a bad idea, though I don't think there's much benefit to limiting the functions to a vector. Instead, perhaps something like:

(defn insert-at [index element coll]
  (lazy-seq
    (if (zero? index)
      (cons element coll)
      (when (seq coll)
        (cons (first coll) (insert-at (dec index) element (rest coll)))))))

We could also add a transducer version, so that we could write either:

(vec (insert-at 3 :foo (range 10))

Or:

(into [] (insert-at 3 :foo) (range 10))
cloojure commented 5 years ago

FWIW, there are similar functions in tupelo.core: drop-at, replace-at, and insert-at. For maximum portability & simplicity, these just use take and drop, then return a vector result via into.

weavejester commented 5 years ago

drop-at is an interesting alternative name for remove-at, but I think I favour the latter. Both drop and remove remove elements from a sequence, but drop will always remove from the beginning of the sequence, while remove can remove from any part. I think remove-at works better with insert-at as well.

A replace-at function would be interesting as well, though obviously not important for vectors, where assoc can be used instead.

Medley is written to be performant, so I'll avoid using take and drop in this case.

orestis commented 5 years ago

Thanks for considering this!

If performance is a goal, isn’t subvec the best option when the collection is a vector?

weavejester commented 5 years ago

Not in this case. If you just want a slice of a vector, then subvec has O(1) performance. This is because it returns a SubVector instance that references the original vector, providing a window onto it.

But once you have the subvector, then what? If you concat, as in your earlier example, you're converting the subvector into a seq and iterating through it. It's an O(n) operation with additional steps. We can see this if we compare the performance of the vec-insert function you posted, vs. insert-at:

user=> (def v (vec (range 1000)))
#'user/v

user=> (quick-bench (vec-insert v 50 :foo))
Execution time mean : 105.214252 µs 

user=> (quick-bench (vec (insert-at 50 :foo v)))
Execution time mean : 30.585013 µs

We could also try using subvecs and into, avoiding the cost of turning it into a seq:

(defn vec-insert2 [v n e] 
  (-> (subvec v 0 n) (conj e) (into (subvec v n))))

But even this is slower, perhaps because subvectors can't be directly converted to transients:

user=> (quick-bench (vec-insert2 v 50 :foo))
Execution time mean : 95.581930 µs

If we introduce a transducer, this can cut down the time a little further, by eliminating the need for an intermediate seq:

(defn insert-at-t [index element]
  (fn [rf]
    (let [idx (volatile! (inc index))]
      (fn
        ([] (rf))
        ([result]
         (if (= @idx 1)
           (rf (rf result element))
           (rf result)))
        ([result x]
         (if (zero? (vswap! idx dec))
           (rf (rf result element) x)
           (rf result x)))))))

This shaves off a further 20%:

user=> (quick-bench (into [] (insert-at-t 50 :foo) v))
Execution time mean : 23.405775 µs

Interestingly, Tupelo's implementation was initially more performant than I expected it to be, considering it uses take and drop to create two additional seqs:

user=> (quick-bench (t/insert-at v 50 :foo))
Execution time mean : 24.871629 µs

But further testing revealed Tupelo's implementation has performance dependent on the index, as the greater the index, the more work take and drop need to do:

user=> (quick-bench (t/insert-at v 0 :foo))
Execution time mean : 19.720011 µs
user=> (quick-bench (t/insert-at v 500 :foo))
Execution time mean : 69.123874 µs
user=> (quick-bench (t/insert-at v 1000 :foo))
Execution time mean : 118.711424 µs

Whereas a transducer's time is fixed:

user=> (quick-bench (into [] (insert-at-t 0 :foo) v))
Execution time mean : 23.263353 µs
user=> (quick-bench (into [] (insert-at-t 500 :foo) v))
Execution time mean : 22.994005 µs
user=> (quick-bench (into [] (insert-at-t 1000 :foo) v))
Execution time mean : 23.730177 µs
orestis commented 5 years ago

Thanks for the detailed explanation! I’ll see if I can make some comparisons for smaller vectors and also on browsers.