scijs / ndarray-blas-gemv

BLAS Level 2 GEMV (Matrix-Vector multiply) for ndarrays

Optimize #1

Open rreusser opened 9 years ago

rreusser commented 9 years ago

I don't expect the final API to change much, so it makes sense to me to treat this as a five-liner for now so that it can be used. Eventually, though, this should be done in blocks, probably with a metaprogramming approach similar to ndgemm.

rreusser commented 9 years ago

@mikolalysenko, I just saw this: ndarray-matrix-vector-multiply. It's obviously more sophisticated than mine, but I'm trying to understand the nature of the optimizations. I relied on cwise, which I assumed (perhaps negligently) was doing the important part: unrolling the getters and setters. Or maybe yours is more streamlined about not repeating the stride and offset arithmetic between the row-column products.

rreusser commented 9 years ago

Ah, or is it that it's smart enough to know that the order of operations should be flipped if the matrix is transposed…

mikolalysenko commented 9 years ago

Yeah, that is pretty much the idea. The goal is to use whatever the fastest traversal order is for the given input data.
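
To make the traversal-order point concrete, here is a minimal sketch in plain JavaScript. It is not the code from either module. `gemvStrided` is a hypothetical name, and the matrix is a plain object that only mimics the `data`, `shape`, `stride`, and `offset` fields of an ndarray. The idea is to compare the two strides and keep the smaller one in the inner loop, so a transposed view is walked column by column instead of row by row.

```javascript
// Hypothetical helper (not part of ndarray-blas-gemv): computes y = A * x.
// A is a plain object mimicking an ndarray: { data, shape, stride, offset }.
function gemvStrided(A, x, y) {
  const m = A.shape[0];
  const n = A.shape[1];
  const sr = A.stride[0]; // step in data between consecutive rows
  const sc = A.stride[1]; // step in data between consecutive columns

  if (Math.abs(sc) <= Math.abs(sr)) {
    // Columns are closer together in memory: one dot product per row.
    for (let i = 0; i < m; i++) {
      let sum = 0;
      let p = A.offset + i * sr;
      for (let j = 0; j < n; j++, p += sc) {
        sum += A.data[p] * x[j];
      }
      y[i] = sum;
    }
  } else {
    // Rows are closer together in memory (e.g. a transposed view):
    // flip the loops and accumulate one scaled column at a time.
    for (let i = 0; i < m; i++) y[i] = 0;
    for (let j = 0; j < n; j++) {
      const xj = x[j];
      let p = A.offset + j * sc;
      for (let i = 0; i < m; i++, p += sr) {
        y[i] += A.data[p] * xj;
      }
    }
  }
  return y;
}

// Usage: the same buffer viewed as a 2x3 matrix and as its 3x2 transpose.
const data = new Float64Array([1, 2, 3, 4, 5, 6]);
const A = { data, shape: [2, 3], stride: [3, 1], offset: 0 };
const At = { data, shape: [3, 2], stride: [1, 3], offset: 0 };

const y1 = gemvStrided(A, [1, 1, 1], new Float64Array(2));  // [6, 15]
const y2 = gemvStrided(At, [1, 2], new Float64Array(3));    // [9, 12, 15]
```

Both branches produce the same result for any input; they differ only in which index moves fastest through the underlying buffer. A real implementation would presumably also handle the vector strides and the BLAS alpha/beta scaling, which this sketch leaves out.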

mikolalysenko commented 9 years ago

Regarding merging them, I'll leave it up to your judgement. For what it is worth, I think this module has a better name.

rreusser commented 9 years ago

Great. How about this: I'll pull your work into this module and keep on going with the blas naming conventions.

mikolalysenko commented 9 years ago

Sounds fine by me! Eventually we can move the old matrix-vector code into a junk drawer.