scijs / ndarray-blas-level2

BLAS Level 2 operations for ndarrays
MIT License
9 stars 2 forks source link

Add validations and optimized versions #11

Open rreusser opened 8 years ago

rreusser commented 8 years ago

Proposed structure (using gemv as example):

rreusser commented 8 years ago

@tab58 - one big potential problem to start thinking about is optimizing for strides. Because a transpose is symbolic, we can't count on data being row- or column-major, so for some things we may need to optimize for both. I believe ndarray has a list of dimensions ordered somehow to help account for this. Or at least we can think about the proper guidelines and caveat emptor.

tab58 commented 8 years ago

Yeah, this does present a problem. I was also thinking about the transpose operation and trying to do something like y = a(A^T)x+by and for an m x n matrix you can't just substitute A.transpose into any of these functions without it coming out wrong. Maybe we'll have to add transpose flags into the functions.

tab58 commented 8 years ago

About data storage, I would want to establish a row-major or column-major convention but the problems with that as I see it are:

I guess we could just pick one and hope it's the right one :smile:

tab58 commented 8 years ago

After further thinking about it, I propose we need to stop using the .get() and .set() methods altogether. The methods cause problems when you have a 2D array that's actually an (n x 1) vector. I discovered this when I tried to use blas2.gemv() with (n x 1) matrices in place of the column vectors. x.get(i) on 2D matrices requires that second index and it breaks if it's undefined, but since all of BLAS1 Level 1 calculates the index and uses that to pull the data it never broke.

My $0.02 anyway.

rreusser commented 8 years ago

To play the devil's advocate, an advantage of get and set is that you can do things like multiplying by a large matrix without actually constructing the matrix, but, as Mikola has pointed out, that's not actually all that useful without sparse operations.

I've been looking for the last couple days at emlapack. The thought of just plugging into all of this eigenvalue stuff is really attractive. From what I can tell, here are some thoughts:

I think it'd be possible to boil it down to a fairly simple use of SVD/eigenvalue algorithms to name a couple. Working with emscripten is a little cumbersome, but I feel like the result could be an order of magnitude more reliable than rolling our own implementations of these algorithms. There are so many pitfalls when you get into the subtleties of iterative algorithms…

tab58 commented 8 years ago

I just think that for this library, matrix-vector math would be served so much better without .get() and .set(). Not to put my personal selfish interests above the greater good, but I'm using these libraries in an optimization library and that's how I found out about how it broke. "Eat your own dog food", amirite? Besides, we don't need to use .get() and .set(). That's for only for the users, right? 😄

Also, I think a separate sparse library for general matrix operations is better for sparse optimizations, but that's just my $0.02.

About emlapack, no question there that directly using LAPACK would be so much better than rolling our own. The translation time alone is enough to make me pause. I remember seeing all those GOTOs and thinking about how we'd need to refactor the hell out of the functions. I'd like to start a separate discussion on emlapack.

rreusser commented 8 years ago

Yeah, pretty much agree with all of that 👍

Does that mean a wrapper for emlapack is what we're really after then?

tab58 commented 8 years ago

Yeah I think so. Although I think we should keep developing BLAS routines in JS, I think a great project would be nice emlapack wrapper using ndarrays.

Have you tried running emlapack? Does it function well?

rreusser commented 8 years ago

Got it up and running, but haven't done a whole lot with it yet. Main focus was on extracting dependencies from the code so that the bundle doesn't end up at 5MB or something. It's not that hard. Been a little busy but will see if I can get that fully working this afternoon.

The main project is going to be reasonably intelligent wrapper functions so that you don't need to worry about allocating and passing simple constants.