mikera / core.matrix

core.matrix : Multi-dimensional array programming API for Clojure

reshape performance very bad with large double arrays #299

Open cnuernber opened 7 years ago

cnuernber commented 7 years ago
;; Assumes clojure.core.matrix is aliased as m and a c-for macro is in scope.
(defn reshape-time-test []
  (let [n-rows 100
        n-cols 1000
        src-array (double-array (* n-rows n-cols))]
    (println "reshape time")
    (time (dotimes [idx 100]
            (m/reshape src-array [n-rows n-cols])))
    (println "c-for time")
    (time (dotimes [idx 100]
            (let [^"[[D" dest (make-array Double/TYPE n-rows n-cols)]
              (c-for [row 0 (< row n-rows) (inc row)]
                     (java.lang.System/arraycopy src-array (* row n-cols) (get dest row) 0 n-cols)))))))

reshape time
"Elapsed time: 174760.275438 msecs"
c-for time
"Elapsed time: 19.301593 msecs"
cnuernber commented 7 years ago

For sanity's sake, you may want to try with iteration counts of 10 instead of 100.

I looked into this a bit, and the slowdown likely comes from two things:

First, aset-double is doing reflection ... so that is in core.clj of Clojure itself. Second, I believe (mp/get-2d data i j) is doing nth on an array, which is apparently quite slow.
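
To illustrate the first point: in Clojure's core.clj, aset-double is defined in terms of java.lang.reflect.Array/setDouble, whereas aset on an array hinted as ^doubles compiles to a direct array store. The following is a minimal sketch of the two write paths, not the code core.matrix actually runs; fill-slow and fill-fast are hypothetical names chosen for this example.

```clojure
;; Hypothetical helpers, for illustration only.

(defn fill-slow
  "Writes 1.0 into the first n slots using aset-double,
   which goes through java.lang.reflect.Array/setDouble."
  [arr n]
  (dotimes [i n]
    (aset-double arr i 1.0))
  arr)

(defn fill-fast
  "Writes 1.0 into the first n slots using aset on a type-hinted
   double[], so the compiler can emit a direct array store."
  [^doubles arr n]
  (dotimes [i n]
    (aset arr i 1.0))
  arr)
```

Wrapping each call in time with a large array should make the difference visible, although the exact ratio will depend on the JVM and Clojure version.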

I am running into this importing vgg16 into cortex from keras.

mikera commented 7 years ago

So the problem here is fundamentally that we don't yet have a reshape operation for the :double-array implementation. Hence it is falling back to a default implementation, which certainly isn't optimised for the double-array case.

I'll take a look and see if I can optimise this at all.
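
As a rough idea of what a double-array-specific fast path could look like for the 2D case, here is a minimal sketch based on the arraycopy loop from the original report. The function name reshape-2d-doubles is hypothetical, it is not part of core.matrix, and it assumes the source holds at least rows * cols elements.

```clojure
;; Hypothetical helper, not part of core.matrix.
(defn reshape-2d-doubles
  "Copies a flat double[] into a fresh double[rows][cols], one row at a time.
   Assumes src contains at least (* rows cols) elements."
  [^doubles src ^long rows ^long cols]
  (let [^"[[D" dest (make-array Double/TYPE rows cols)]
    (dotimes [row rows]
      ;; Each destination row is itself a double[], so a bulk copy suffices.
      (System/arraycopy src (* row cols) (aget dest row) 0 cols))
    dest))
```

For example, (reshape-2d-doubles (double-array (range 6)) 2 3) yields a 2 x 3 array whose second row holds 3.0, 4.0 and 5.0.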

In the meantime, the obvious solution is just to use an implementation that plays nicely with Java double arrays:

;; Assumes clojure.core.matrix is aliased as m and vectorz-clj is on the classpath.
(defn reshape-time-test []
  (let [n-rows 100
        n-cols 1000
        src-array (double-array (* n-rows n-cols))]
    (println "reshape time")
    (time (dotimes [idx 100]
            (m/reshape src-array [n-rows n-cols])))
    (println "vectorz time")
    (time (dotimes [idx 100]
            ;; Convert to a vectorz array first, then reshape.
            (m/reshape (m/array :vectorz src-array) [n-rows n-cols])))))
=> (reshape-time-test)
reshape time
"Elapsed time: 294872.994923 msecs"
vectorz time
"Elapsed time: 49.254672 msecs"