Open rreusser opened 9 years ago
@mikolalysenko: Just saw this: ndarray-matrix-vector-multiply. It's obviously more sophisticated than mine, but I'm trying to understand the nature of the optimizations. I relied on cwise, which I assumed (perhaps negligently) was doing the important part of unrolling the getters and setters. Or maybe it's being more intelligent about avoiding redundant stride and offset arithmetic between the row-column products.
Ah, or is it that it's smart enough to know that the order of operations should be flipped if the matrix is transposed…
Yeah, that is pretty much the idea. The goal is to use whatever the fastest traversal order is for the given input data.
Regarding merging them, I'll leave it up to your judgement. For what it is worth, I think this module has a better name.
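To illustrate the traversal-order point above, here is a minimal sketch, not the code from either module. It assumes a plain-object stand-in for an ndarray (`data`, `shape`, `stride`, `offset`), and the name `gemv` is hypothetical. It picks the inner loop so that it walks the dimension with the smaller stride, which means a transposed view gets the flipped loop order.

```javascript
// Hypothetical helper: computes y = A * x for a strided 2D view.
// `A` is { data, shape: [m, n], stride: [sr, sc], offset }, a plain-object
// stand-in for an ndarray (an assumption, not the real ndarray API).
function gemv(A, x, y) {
  const [m, n] = A.shape;
  const [sr, sc] = A.stride;
  const d = A.data;

  if (Math.abs(sc) <= Math.abs(sr)) {
    // Columns are adjacent in memory: iterate j in the inner loop.
    for (let i = 0; i < m; i++) {
      let p = A.offset + i * sr;
      let sum = 0;
      for (let j = 0; j < n; j++, p += sc) {
        sum += d[p] * x[j];
      }
      y[i] = sum;
    }
  } else {
    // Rows are adjacent in memory (e.g. a transposed view):
    // iterate i in the inner loop and accumulate into y.
    for (let i = 0; i < m; i++) y[i] = 0;
    for (let j = 0; j < n; j++) {
      let p = A.offset + j * sc;
      const xj = x[j];
      for (let i = 0; i < m; i++, p += sr) {
        y[i] += d[p] * xj;
      }
    }
  }
  return y;
}

// The same 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored two ways.
const rowMajor = { data: [1, 2, 3, 4, 5, 6], shape: [2, 3], stride: [3, 1], offset: 0 };
const colMajor = { data: [1, 4, 2, 5, 3, 6], shape: [2, 3], stride: [1, 2], offset: 0 };
const yRow = gemv(rowMajor, [1, 2, 3], [0, 0]);
const yCol = gemv(colMajor, [1, 2, 3], [0, 0]);
```

Both calls compute the same product; only the order in which memory is visited differs.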
Great. How about this: I'll pull your work into this module and keep on going with the blas naming conventions.
Sounds fine by me! Eventually we can move the old matrix-vector code into a junk drawer.
I think the final API won't be so different, so it makes sense to me to just consider this a five-liner for now so that it can be used. Eventually, though, this should be done in blocks, probably with a metaprogramming approach similar to ndgemm.
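For reference, a rough sketch of what "done in blocks" could mean for a matrix-vector product. It assumes a dense row-major matrix in a flat array; the function name `gemvBlocked` and the `blockSize` parameter are hypothetical, and this hand-written loop is not the generated-code approach ndgemm uses. The columns are processed in chunks so that each chunk of `x` is reused across all rows before moving on.

```javascript
// Hypothetical sketch: y = A * x with A stored row-major in a flat array
// `a` of length m * n. Columns are processed in blocks of `blockSize`.
function gemvBlocked(a, m, n, x, blockSize = 64) {
  const y = new Float64Array(m); // zero-initialized accumulator
  for (let j0 = 0; j0 < n; j0 += blockSize) {
    const j1 = Math.min(j0 + blockSize, n); // end of this column block
    for (let i = 0; i < m; i++) {
      const row = i * n;
      let sum = 0;
      for (let j = j0; j < j1; j++) {
        sum += a[row + j] * x[j];
      }
      y[i] += sum; // add this block's partial dot product
    }
  }
  return y;
}

// 2x3 matrix [[1, 2, 3], [4, 5, 6]] with a block size that does not
// divide n evenly, to exercise the partial final block.
const yBlocked = Array.from(gemvBlocked([1, 2, 3, 4, 5, 6], 2, 3, [1, 2, 3], 2));
```

A good block size depends on the cache and the data layout, so it would need to be measured rather than assumed.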