uncomplicate / neanderthal

Fast Clojure Matrix Library
http://neanderthal.uncomplicate.org
Eclipse Public License 1.0
1.06k stars 56 forks

No benchmark against numpy #107

Closed zendevil closed 3 years ago

zendevil commented 3 years ago

On my Mac, numpy runs 1-2 orders of magnitude faster than Neanderthal with the native-double factory. Why is that? Isn't Neanderthal supposed to run faster than numpy?

zendevil commented 3 years ago

Here's the neanderthal code:

(require '[uncomplicate.neanderthal.core :refer [ge trans mm!]]
         '[uncomplicate.neanderthal.native :refer [native-double]])

;; fac is presumably the native-double factory mentioned above
(def fac native-double)

(time
 (let [N 1000
       p (ge fac N 1 (range N) {:layout :row})
       p-t (trans p)
       Q (ge fac N N (range (* N N)) {:layout :row})
       res0 (ge fac N 1 {:layout :row})
       res1 (ge fac 1 1 {:layout :row})]
   (mm! 1.0 Q p res0)
   (mm! 1.0 p-t res0 res1)))

and here's the numpy code:

import time
import numpy as np
t = time.time()
a = np.random.random_sample((1000,1000)) 
b = np.random.random_sample((1000, 1)) 
c = np.transpose(b) 
print(np.matmul(np.matmul(a, b) , c), (time.time() - t) * 1000, "ms")

While the two programs are not exactly identical (numpy uses random numbers for matrix generation whereas the Clojure code uses range), the performance difference is not only massive but also counterintuitive. How does one reconcile this with your mentioning that Deep Diamond (which uses Neanderthal) is faster than TensorFlow (which uses numpy)?
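For what it's worth, the range-based inputs from the Clojure snippet can be reproduced exactly in NumPy, which would remove the random-vs-range difference from the comparison. A minimal sketch (using np.arange; @ is shorthand for np.matmul):

```python
import numpy as np

N = 1000
# Same inputs as the Clojure snippet: p holds 0..N-1,
# Q holds 0..N*N-1 in row-major order.
p = np.arange(N, dtype=np.float64).reshape(N, 1)
Q = np.arange(N * N, dtype=np.float64).reshape(N, N)

res0 = Q @ p       # (N, N) x (N, 1) -> (N, 1), mirrors (mm! 1.0 Q p res0)
res1 = p.T @ res0  # (1, N) x (N, 1) -> (1, 1), mirrors (mm! 1.0 p-t res0 res1)
```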

blueberry commented 3 years ago

Please read the Neanderthal documentation. You're doing many mallocs in Java, which is, of course, 1) slow and 2) unnecessary.

You're working with 1x1 and 1000x1 matrices, which is 1) inefficient and 2) not optimal for the problem.

You're creating lazy Clojure sequences, transferring them to Neanderthal, and including all of that in your measurements. Of course that's slow.

Please read the documentation before you open issues.
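The measurement point applies to NumPy just as well: the numpy snippet above also keeps data generation outside the timed region, while the Clojure snippet times everything. A minimal sketch of the separation, assuming the same shapes as above (setup and a warm-up call untimed, only the multiplication timed):

```python
import time
import numpy as np

N = 1000
# Untimed setup: allocate and fill the operands first.
a = np.random.random_sample((N, N))
b = np.random.random_sample((N, 1))

# Warm-up call, so one-time costs don't land inside the timer.
a @ b

t0 = time.perf_counter()
c = a @ b  # only the multiplication is inside the timed region
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"matmul only: {elapsed_ms:.3f} ms")
```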

zendevil commented 3 years ago

I've timed just the mm! operation, which doesn't involve Java mallocs, and likewise timed a single matmul operation in numpy.

(let-release [N-inc 5
              tau 2
              c (range N-inc)
              p (ge fac N-inc 1 (range N-inc) {:layout :row})
              p-t (trans p)
              Q-mat (Q (dec N-inc) tau c) ; Q is a function defined elsewhere, not shown
              res0 (ge fac N-inc 1 {:layout :row})
              res1 (ge fac 1 1 {:layout :row})]
  (prn "Q is " Q-mat)
  (time (mm! 1.0 Q-mat p res0))
  (mm! 1.0 p-t res0 res1))

"Elapsed time: 0.079535 msecs"

On the other hand:

import time
import numpy as np

a = np.random.random_sample((1000,1000)) 
b = np.random.random_sample((1000, 1)) 
c = np.transpose(b) 
t = time.time()
e = np.matmul(a, b) 
print("first matmul time", time.time() - t)
d = np.matmul(e, c)
print(d, (time.time() - t) * 1000, "ms")

first matmul time 0.0011227130889892578

Numpy is faster for a single matrix multiplication operation, with no JVM involved. Numpy is 50 times faster than native-double-factory Neanderthal for the mm! operation. However, you mention in a blog post that Neanderthal is faster than TensorFlow, which is based on numpy. What am I doing wrong in the mm! operation that makes it slower than numpy, then? You mention that it's not optimal for the problem. Why is it not optimal? What other substitute is there for multiplying matrices?
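As an aside on methodology: a single time.time() delta around a sub-millisecond operation is dominated by noise (and note the two snippets above multiply very different sizes: 5x5 here versus 1000x1000 in numpy). A stabler number comes from repeating the operation; a sketch using the standard-library timeit, with the same shapes as the numpy snippet:

```python
import timeit
import numpy as np

N = 1000
a = np.random.random_sample((N, N))
b = np.random.random_sample((N, 1))

# Time only the multiplication, repeated in batches of 100,
# and take the best batch so warm-up and scheduler noise drop out.
runs = timeit.repeat(lambda: a @ b, number=100, repeat=5)
best_ms = min(runs) / 100 * 1000
print(f"best of 5 x 100 runs: {best_ms:.4f} ms per matmul")
```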

blueberry commented 3 years ago

OK, if you say so. But please stop spamming the issues with things that are not issues.

zendevil commented 3 years ago

OK, I'll think twice before posting an issue. Do you consider this an issue to be addressed? Can I ever expect native Neanderthal to perform better than numpy, or not?