Common matrix interface?

tel / clatrix

A stupid name for a smart matrix library, because who doesn't love smart matrices?

MIT License

154 stars 29 forks source link

Common matrix interface? #7

Closed mikera closed 11 years ago

mikera commented 11 years ago

Hi there, this is somewhat of a general roadmap question for matrix maths libraries in Clojure.

I have my own matrix wrapper for the vectorz vector / matrix library (which is pure Java, i.e. no native dependencies):

https://github.com/mikera/vectorz-clj

I need the vectorz library for some specific features which aren't in BLAS. However, I'd like to avoid reinventing wheels if possible, so is it likely to be feasible to create a common interface for clojure matrix libraries, supporting multiple matrix implementations?

If so, could clatrix take on that role?

mjwillson commented 11 years ago

Can't speak for the author but I'd be interested in a project to come up with a standard Clojure API for vectorized array maths which could support different (or even a mix) of implementation libraries under the hood. Something aiming to become a lingua franca base layer for numerics code in Clojure, rather like numpy is in the Python world.

A numpy-style n-dimensional array interface would be a good abstraction to base it on I think, since it generalises nicely beyond linear-algebra-specific abstractions (vector as just a 1-dimensional array, matrix as 2-dimensional array etc) and allows for a nice consistent semantics to the way operations 'broadcast' over arrays of compatible shapes. Numpy's array stride support is nice too -- allows operations like transpose/slice/submatrix etc on dense arrays to be performed without copying data (rather like your 'views' in vectorz?).

I'd probably be concerned with getting the basic API right before heavily optimising, although would definitely want to allow for different back-end libraries to be used to speed up specific operations, or to supply special-purpose optimised implementations, e.g. for sparse arrays. Imagine there'd need to be some pure Java routines to support the general case, with call-outs to e.g. JBLAS for dense matrix multiplications etc. Could be some challenging compatibility issues in dealing efficiently with libraries which call into native code.

Any thoughts?

Perhaps if there are other UK folk interested we could drum up enough interest for a Clojure numerics hack day?

mjwillson commented 11 years ago

Some discussion on related roadmap ideas here as well: https://github.com/tel/clatrix/issues/2

tel commented 11 years ago

Sorry for the lack of responses and pulls.

With respect to general roadmap issues, I'm no longer in my program and thus am no longer actively using Clojure or matrix libraries. To that end, I can't pretend I'd provide great leadership for where this project should go. I'd be more than happy to give maintainership to an active and interested party, especially if someone feels confident taking the reign on building Clatrix as common high level matrix API over the various Java/Clojure matrix implementations.

mikera commented 11 years ago

Happy to get involved in creating / maintaining such a library / abstraction layer if there is enough interest. Specifically I'd be happy to:

Work jointly on the common interface / abstraction layer API
Maintain the vectorz mappings / implementation

I think the necessary extensibility could be achieved with protocols, i.e. it should be possible to specifiy the basic matrix/vector operations in Clojure protocols, and then extend these protocols to different implementations. Default implementations could be provided in some cases as well.

Once the basic matrix operations are provided as protocols, then any other functionality can be built on top of them.

In order for this to be generally useful, I think we would need to get at least a reasonable majority of the matrix / vector projects on board. Right now I'm aware of:

Incanter
Clatrix
vectorz / vectorz-clj

Anyone aware of any others?

ejackson commented 11 years ago

I'm happy to take on the workaday maintenance of this library and to be involved in discussions of a potential Matrix. I've pinged Paul Lam as well, as he's been the most active guy here recently, to gather his thoughts.

mikera commented 11 years ago

Great to have you here Edmund - knew this kind of thing would probably be up your street!

I jotted down some ideas on a Wiki page to be a bit more permanent than an issue discussion. https://github.com/tel/clatrix/wiki/Common-Clojure-matrix-interface

Joseph - hope you don't mind clatrix being a temporary host for this initiative?

Quantisan commented 11 years ago

Sounds good. I agree with the protocol approach. clojure.core.protocol is a good example of polymorphic functions. We can start with some basic matrix operations by declaring defprotocol plus, mult, and transpose in clatrix.

I've done some work on my branch of clatrix that haven't been merged upstream yet. Matrix object there is a sequence so users can perform clojure operations on it as expected. e.g. first, rest, etc.

I've added you guys to my repo so feel free to hack it.

ejackson commented 11 years ago

Thanks Paul ! I'll go take a look.

mjwillson commented 11 years ago

Thanks, I'll try and add some more detailed comments later :)

For now: any thoughts on basing these protocols on a slightly more general numpy-style n-dimensional array approach? as in e.g. http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html This is similar to the array interface in Matlab too -- quite proven as a basis for numerical work. IMO this is one of those cases where by solving a slightly more general problem (n-dimensional arrays) you get a cleaner abstraction with less special cases.

About using a separate protocol for each operation (plus, mult etc) -- slightly wary of this in that it seems a bit limiting to specific kinds of operations -- would a load of new protocol implementations need adding for any new operation I want to apply in a vectorised fashion over an array?

I think the way numpy does this is quite nice -- pretty much any suitable function ('ufunc' -- you can create your own) can be applied over an array. In the case of binary or higher-arity functions the argument arrays must have compatible shapes, and there are rules for how operations are 'broadcast'. I'll try and come up with a more constructive proposal alter for how this could be done with clojure protocols once I've had more of a think about it. Would obviously still need to allow for optimised implementations for applying common arithmetic ops.

Quantisan commented 11 years ago

@mjwillson Incanter took this type encapsulation approach but the implementation ends up using cond in operators to check for object type then calling different algorithms.

In the case of ufunc, for example, you might not even need it here because map and reduce just works (even now in clatrix 0.2.2). For shape-handling, combine-with offers an example.

tel commented 11 years ago

Mikera: not at all! Ejackson: I'll give you commit access when I'm at a laptop next. Let me know how you want to work with Paul.

mjwillson commented 11 years ago

Hmm, it's that pesky http://en.wikipedia.org/wiki/Expression_problem isn't it. Hardcoded cond clauses aren't very nice.

Re map -- this will just return a clojure seq though right? I'm thinking about applying operations in a way which preserves the container type and is ideally optimised in a fast implementation-specific fashion.

Perhaps if there was an array protocol which supported a map function which preserves the container type?

mjwillson commented 11 years ago

Ah apologies I guess you meant clatrix.core/map not clojure.core/map :) this is nice, although nicer if like ufuncs it generalised to e.g. (map binary-op x y) and worked whenever x and y have compatible (not necessarily identical) shapes.

It shouldn't be too hard to come up with a protocol for n-dimensional arrays supporting this kind of generic treatment -- functions like

(shape array)
(get array index-vector)
(set! array index-vector value)
(slice array slice-specs)
(permute-axes array perm)

and so on. Some implementations might only support 1-dimensional (vector) and 2-dimensional (matrix) array shapes, but a general fall-back implementation could be given based on java arrays for the general case.

How to dispatch optimised arithmetic and BLAS routines is another question though. Perhaps you're right and separate per-operation protocols would be the best approach for this. How to handle the problem of dispatching on the types of both arguments for binary and higher-arity operations though? perhaps some general coercion mechanism which implementations could plug into? Or perhaps even multimethods -- might not necessarily be the worst thing in the world for dispatching vectorised operations over large arrays -- where the cost of dispatch should be relatively insignificant compared with the number-crunching costs if you've vectorised your code well enough.

Continuing to mull this over anyway, glad to see there's a good crowd interested in this stuff!

mikera commented 11 years ago

@Matthew I think per-operation protocols are wise - IMHO it's better to keep protocols small and geared towards a specific purpose. 1-3 functions per protocol should probably be the common case.

regarding dispatch on parameters, I think there will need to be a coercion protocol that allows implementations to coerce parameters (i.e. something other than the first parameter) to forms that they accept. Something like:

(defprotocol PCoercion
  (coerce [matrix param]))

Then implementations can control what kind of inputs they accept, and we can even potentially do fun things like mix implementations if the coercion routines have enough logic to handle conversion between matrix / vector formats. I'm imagining an implementation something like:

(extend-protocol PCoercion
  mikera.vectorz.AMatrix
    (coerce [matrix param]
      (cond
        (instance? AMatrix param) param
        (instance? AVector param) param
        (generic-matrix? param) (convert-to-vectorz-matrix param)
        (generic-vector? param) (convert-to-vectorz-vector param)
        :default (error ("Can't coerce to valid matrix/vector format")))))

I realise this sort of code is a bit painful and we don't like instanceof checks in general, but at least we'll only need to write it once per implementation. So we can keep most of the ugliness wrapped up in one place.

Obviously the fast path for coercing a matrix / vector that is already in the right format should just be a instanceof check or two so it shouldn't be a big performance concern.

Regarding 1D / 2D / nD support I think we should be able to handle that - but let's please keep a fast path for 1D / 2D access as that will definitely be the common case. i.e. we shouldn't be forcing every 2D matrix element access to wrap a index vector of [x y] co-ordinates - that would be a nightmare for performance.

mjwillson commented 11 years ago

Re fast paths for 1D and 2D, yep certainly. Could have special protocols for 1D and 2D arrays as well as implementing a generic n-dimensional array protocol say, and obviously would want to avoid code going via slow generic interfaces wherever possible in favour of optimised routines from some underlying library.

It'd be nice if we could keep the optimisations nicely tucked away as an implementation detail though if it's not too bothersome to do so. In terms of an outward-facing API, viewing things as nD arrays has advantages in giving a consistent general API for slicing, operator broadcast, stacking, striding / efficient subslice 'views' and so on.

That said re overhead from using index vectors: this doesn't seem to prove fatal for numpy, where element access via getitem uses a Python tuple for index. I guess the thing with numpy, matlab etc, it's quite rare and not really the recommended use case to use element-wise access APIs directly. The whole array programming style is to vectorize wherever possible, and if you're working with reasonably large arrays the overhead of the abstractions around them generally doesn't matter nearly as much as the cost saving from the vectorised operations. Would just need to ensure that the internals don't go via the slow APIs whenever there's a fast vectorised implementation available.

It looks like you're coming from quite a different perspective with vectorz though, if I understand it's optimised more for 3D graphics and low-dimensional (in the vector space sense not the n-dimensional array sense) linear algebra? Suspect the performance trade-offs around the abstractions could be quite different for this kind of stuff than for many other numerical applications (statistics / machine learning / signal processing / imaging /...) where the vectors/matrices/arrays are usually bigger, but hopefully we can find a way to optimise for both cases without hurting the generality of the API too much.

Incidentally, off the top of my head just a couple of common use cases for fast 'tensor' / >2D array maths in case anyone's wondering: cross-tabulation of count data by multiple factors, image processing where you have (X, Y, color-channel), sometimes (X, Y, Z) in scientific/medical imaging, (X, Y, time) for video, and so on.

mjwillson commented 11 years ago

Like the coercion protocol idea. Perhaps a good test would be how well it copes with sparse matrices. Matrix multiplication for sparse-dense, dense-dense and sparse-sparse would all ideally have distinct optimised implementations, and automatically coercing a sparse matrix to a dense one could be pretty disastrous in some situations.

mikera commented 11 years ago

I expect sparse matrices would be handled by the implementation's multiply protocol - this could include logic for ensuring that the right type of transformation gets applied. The coercion function just needs to get it into a form that the implementation understands (either sparse or dense).

vectorz actually has a lot of other matrix types, e.g. diagonal matrices (https://github.com/mikera/vectorz/blob/master/src/main/java/mikera/matrixx/impl/DiagonalMatrix.java). I think the same principle applies though. Ideally coercion would also handle smart conversion of specific matrix types (e.g. diagonal -> diagonal) but I think that can be left for future optimisation.

mikera commented 11 years ago

Yep you are right Matthew, vectorz is somewhat more tailored towards fast low-dimensional stuff and high throughput (potential for high numbers of operations per second). Hence the specialised Vector3 implementations, and also the options to do everything in place (mutable operations) and with primitive arguments (no boxing, no use of index vectors!). But it should still fit a general API, and if someone really wants to crunch billions of vectors as fast as possible they always have the option of using vectorz directly rather than going through the higher level API.

So overall I like the idea of generic slicing / dicing operations, as long as we keep the 1D / 2D fastpath. vectorz actually has specialised vector implementations that represent slices / subsets of vectors, so a generalised slice operation could probably just return something like this.

mikera commented 11 years ago

I created a repo to do some experiments and build a quick proof of concept:

https://github.com/mikera/matrix-api

Comments / patches welcome!

tel commented 11 years ago

Edmund, I've added you as a collaborator to Clatrix. I think given the active interest from the participants of this thread, I'd be happy adding others as well.

ejackson commented 11 years ago

Thanks, Joseph. I think adding the others would be a good idea.

Back on the topic: I think Mike's separation of the higher level Protocol into a separate library is the way to go. No need to import all of clatrix to get the protocol if you're going to use PC or whatnot.

mjwillson commented 11 years ago

@mikera thanks, will hopefully have a chance to play with it over the weekend!

About coercion protocols: when dispatching (mult x y) using a protocol implemented by x, I think for extensibility that implementation might sometimes need to say, actually based on the type of y I'm not confident that I know the best way to dispatch (mult x y), instead I'm going to ask y to dispatch (mult-opposite-arg-order y x).

This seems to allow for a reasonable amount of power in dispatching binary ops while using at most 2 protocol-dispatched calls. It's still not ideal and multimethods would probably be the most elegant and flexible way to handle it though. I guess there's no reason we couldn't have both -- a module with multimethod-dispatched binary ops and another with equivalent protocol-dispatched versions. Then you could choose depending whether you care more about fixed per-operation dispatch costs or about cost savings from selecting the most specialised implementations and avoiding unnecessary coercions, which would make more of a difference for larger matrices.

mikera commented 11 years ago

@Matthew - I think that it is fine to stick with protocols as the primary dispatch mechanism - as long as the implementation extends the protocol for (mult x ?) then it is free to perform whatever secondary dispatch it likes (a reverse order dispatch maybe, or even calling a multimethod)

The only limitation protocols gives us as an API is that the first dispatch must be on class. We can also wrap functions around the protocol, so (mult x y) could be a regular function that calls (protocol-mult-reversed y x), so in effect we can choose which parameter we want to do the protocol dispatch on.

I can't personally think of a case where protocol dispatch won't work in this domain - I think we can design things so that we will always have one parameter known to be of a type "owned" by the implementation.

Usually this will be determined by the first parameter being a matrix of a known class or interface that is recognised/owned by the implementation. Please challenge me on this one though - it's important we think about the special cases!

ejackson commented 11 years ago

@mjwilson That's really interesting. You're definitely right that in most cases the dispatch cost is completely negligible compared to the computation, so there multimethods are a potentially big win for flexibility. But do we need the flexibility ?

@mikera I think this is correct too. Core.logic does a similar argument reversing trick to good effect during unification. He talks about it here: http://www.youtube.com/watch?v=A7de6pC-tnU

So what do we think ? Can we imagine a case where arg-reversing won't work ? Is that simpler than multimethods ?

mjwillson commented 11 years ago

There's a flexibility argument for multimethods, sure, but also an elegance and boilerplate-avoidance argument too.

With protocol-based dispatch, the contract around binary operation protocols would be quite subtle and easy to get wrong, and require each implementer to write a certain amount of boilerplate conditional dispatch logic, possibly repeated for each binary operation.

With multimethods, the dispatch logic is handled for you and there's less risk of some implementation getting it wrong. (Just look at how many people get the equals contract wrong in OO languages.)

mjwillson commented 11 years ago

What I mean about the contract for protocol-based dispatch being somewhat subtle: I think you'd want to require that each implementation follows this sort of logic for a binary operation (mult x y)

If x is confident that it knows exactly which multiply implementation is optimal for the pair of implementation types of x and y, then it performs the operation directly
Otherwise, it must ask y to dispatch the operation before trying any kind of generic coercion, so as to give y (which might not have existed or been anticipated at the time x was written) a fair and equal opportunity to nominate an optimal multiply operation
Only if both the above fail, should it resort to some kind of (coerce-to-generic-compatible-matrix y)

Some minor downsides:

The only available implementations are those anticipated by the author of x or of y -- there's no way to add a new optimised implementation for two existing types without changing hard-coded conditional logic inside the protocol implementations for one or other of the types. This may not be a big issue.
This still privileges x somewhat in cases where x and y disagree about which is the optimal/preferred implementation. This should be rare and not a big issue provided people stick to the above contract and aren't overly aggressive in coercing or using generic implementations for other types.

Really the main downside as I see it is the requirement for somewhat subtle boilerplate conditional logic though.

mjwillson commented 11 years ago

I guess looking at it from this perspective, the protocol-based approach feels like an optimisation/implementation detail which shifts a lot of the burden of correct and extensible dispatch logic onto the protocol implementors.

Seems like the kind of optimisation which could be automated rather than involve boilerplate -- this being a Lisp, perhaps we could have a macro which looks at the multimethod dispatch table and spits out a bunch of protocol implementations? Bit out there I know, but just an idea :)

mikera commented 11 years ago

Hmmm I guess as someone who is likely to maintain an implementation I don't see much difference in effort / bolierplate between multimethods and protocols. Either way I would need to implement some well-defined set of methods, and maintain some dispatching logic.

I think it would be hard to automate dispatch in a standardised way because it is likely to be quite implementation dependent. As a example (mult x y) for Vectorz will likely require a cond test on the type of y, since Vectorz has different logic for 2D matrices and 1D vectors as multiplication targets. Other implementations that support arbitrary-dimension arrays with the same type might not need to do this test.

The good news of course is that we don't need to commit 100% either way: I suggest the best approach is to start with protocols (since they offer better performance) and keep multimethods in reserve so we can fall back to them if needed for specific operations that genuinely require multiple dispatch.

As for this all being an implementation detail: I completely agree, and ideally this should all be transparent to a user of the library. It's right that library implementers do this work once and make it work properly, so that users don't have to.

ejackson commented 11 years ago

Having thought more about this I'm coming down no the side of multimethods. Fundamentally the dispatch operation needs to care about the types of all of its arguments. Protocols can only do this via the arg reversing trick and clever wiring of the dispatches. This is incidental complexity. As multimethods can dispatch on all the arguments they are a more natural fit. Furthermore, as the dispatch time overhead of multimethods is likely to be tiny compared to the operation time I think its a good tradeoff. My 2c !

mikera commented 11 years ago

I really don't think multimethods are a good idea in this case. From my point of view:

I don't think dispatch actually needs to care about the type of all arguments. Specifically, I haven't found a single case so far where you can't do at least the initial dispatch (branching to implementation specific code) on the type of one argument.
With multimethods, what are you going to use as the dispatch function? If it is class or something else that is equivalent to "which implementation am I working on?" then you might as well have used a protocol....
The only case I've found in my tests so far where anything even slightly tricky like arg reversing was needed was multiplying by a scalar, in the reasonable case that the scalar is the first argument. But that was still very easy to handle (an extra 1-line test), on the assumption implementations can provide something like a (scale matrix scalar) function.
Multimethods would add enough overhead that that it would prevent the matrix API from being useful in situations where you need high levels of operation throughput (e.g. games or simulations where you use a lot of small matrices and vectors). With protocols you would probably be OK in any but the most extreme circumstances. I agree it wouldn't matter for large matrix operations, but I think we should try to be useful for as wide a range of domains as possible.
Even with multimethods, I don't think you can avoid the need for different implementations to build in a second level of dispatch for some cases., since different implementations will have different secondary distinctions between types. If multimethods don't solve this problem, then they again aren't buying any saving in incidental complexity.
It's an implementation detail anyway: in the spirit of being as simple and efficient as possible, I think the rational approach would be to start with protocols, and switch to multimethods on a case by case basis where they are genuinely needed.
Other Clojure libraries like core.logic have proved that using protocols is a good choice.

Happy to be shown code / counter-examples that demonstrate otherwise, of course!

ejackson commented 11 years ago

Maybe I have this wrong, because I do think the type of all arguments does needs to be considered. Here's an sketch example for for how I think we'd need to multiplication by a DiagonalMatrix, whether its the first or second argument you get benefits by special handling. Non-commutativity comes into this too for extra giggles.

I haven't put stuff in an order for easy reading, not compilation !

Does this make sense ? Have I missed an obvious thing ? I hope so !!

;; ---------------------------------------------------------------
;;  Stuff we need to deal with diag in first argument
(deftype NormalMatrix ...)
(deftype DiagonalMatrix ...)
(extend-protocol PMatrixMultiply
                 DiagonalMatrix
                 (matrix-multiply [d x]
                                  (matrix-multiply-by-diagonal x d)))

(defprotocol PMatrixMultiplyByDiagonal
  (matrix-multiply-by-diagonal [x d]))

;; This is not a commutatitive op, so we need to care about left and right, hence the two fns
(defn left-mult-by-diag  [x d] ...)
(defn right-mult-by-diag [x d] ...)
(defn mult-by-two diags  [x d] ...)
(extend-protocol PMatrixMultiplyByDiagonal
                 Matrix
                 (matrix-multiply-by-diagonal [x d]
                                              (left-mult-by-diag x d))
                 DiagonalMatrix
                 (matrix-multiply-by-diagonal [x d]
                                              (mult-by-two-diags x d))
                 ... others ...)

;; Now how to deal with diag as second argument in call.  We still want special handling.
(extend-protocol PMatrixMultiply
                 Matrix
                 (matrix-multiply [m x]
                                  (multiply-by-normal-matrix x m)))

(defprotocol MultiplyByNormalMatrix
  (multiply-by-normal-matrix [m x]))

(extend-protocol MultiplyByNormalMatrix
                 NormalMatrix
                 ... normal fn...
                 DiagonalMatrix
                 (multiply-by-normal-matrix [d m] (right-mult-by-diag d m))
                 ... etc ...)

ejackson commented 11 years ago

This all said, as @mikera is actually writing code, I'm 100% onboard with supporting the effort, and going along with his judgements.

mjwillson commented 11 years ago

Sorry I don't mean to nitpick from the sidelines and am planning on writing some code too to get an idea for how well (and how slow) the multimethod-based approach turns out to work :)

I'm continuing the debate here just to make sure we're not talking at cross-purposes and fully understand eachother's reasoning, will follow up on a few of your points separately.

mjwillson commented 11 years ago

I don't think dispatch actually needs to care about the type of all arguments. Specifically, I haven't found a single case so far where you can't do at least the initial dispatch (branching to implementation specific code) on the type of one argument.

But you still need to look at the type of the other argument too in order to arrive at the correct implementation right, whether via instance? tests or otherwise. So it's still multiple dispatch. The question is just whether we implement that multiple dispatch via multimethods, or via protocols and instance? checks.

With multimethods, what are you going to use as the dispatch function? If it is class or something else that is equivalent to "which implementation am I working on?" then you might as well have used a protocol....

I was assuming dispatch on [(class x) (class y)], so the types of both arguments. Will address 'might as well have used a protocol' later.

Even with multimethods, I don't think you can avoid the need for different implementations to build in a second level of dispatch for some cases., since different implementations will have different secondary distinctions between types. If multimethods don't solve this problem, then they again aren't buying any saving in incidental complexity.

Dispatching on [(class x) (class y)] would allow for conditioning on pretty much any kind of type-level distinction which an implementation wants to make on either argument. If some library wanted to make further value-level distinctions which aren't visible at type level then it would need to do its own secondary dispatch logic. Any examples where this might be the case though? I can't immediately think of any and either way don't think it's a big advantage or disadvantage to either protocols or multimethods.

mjwillson commented 11 years ago

The only case I've found in my tests so far where anything even slightly tricky like arg reversing was needed was multiplying by a scalar

I think this arg-inverting stuff is really important for extensibility if we go down the protocol route so want to make sure I've successfully communicated what the issue is:

Yes within a particular implementation library it isn't a particularly big deal, since you know about and control all the different types in play except say the platform scalar types.

The reason for requiring the arg-reversing trick is more about ensuring extensibility and interop with other people's matrix types. Say if you write your vectorz implementations and I later want to come along and add my own foo.Matrix type and supply an optimal implementation for multiplying a foo.Matrix by a vectorz.Matrix.

For (mult mine yours), I'm able to perform the dispatch correctly, but for (mult yours mine), I'm entirely reliant on you to delegate to me via the arg-reversing trick. If you don't do this, then I have no way to ensure my optimal implementation gets used, and you'll probably coerce my matrix to some generic format in order to multiply with it.

As a more concrete example, say someone's already added support for some dense matrix library. Later I want to come along and add support for a sparse matrix library and to ensure that my optimal routine gets used to multiple dense by sparse matrices.

Last thing I want is when someone does (mult dense sparse), that the protocol implementation for dense fails to give me any say in the matter and coerces me to a dense matrix leading to OutOfMemory.

Now, I guess you could agree on a special protocol for sparse matrices, and require that everyone handles the case of (if (sparse? other) ...), but then you're asking libraries which don't care about sparse matrices at all to know how to handle them. How many other special types of matrix (diagonal, symmetric, ...) would we have to extend this requirement to and what if someone thinks of a new one later?

Better just to require that implementations delegate to an arg-order-reversed call whenever they're not sure they know the best implementation, before falling back on a generic coercion.

This way we retain extensibility which (almost) matches that of multimethods.

(Still reckon multimethods are nicer though, in part because of the fiddly-ness off this contract which is required to maintain the same level of extensibility when implementing multiple dispatch via protocol calls)

mjwillson commented 11 years ago

Multimethods would add enough overhead that that it would prevent the matrix API from being useful in situations where you need high levels of operation throughput (e.g. games or simulations where you use a lot of small matrices and vectors). With protocols you would probably be OK in any but the most extreme circumstances.

Sure you're right here, but I might benchmark it just to make sure.

A multimethod dispatch shouldn't in principle require more than a single cached map lookup on [(class x) (class y)] and a function call -- compared with a protocol call followed by a bunch of instance? tests (and possibly another protocol call and another bunch of tests if the left-hand argument doesn't know how to do the dispatch and delegates it to the right-hand-side).

I'd guess the latter would be faster, at least provided the number of instance checks doesn't get too huge, but by how much? Will try and measure.

mjwillson commented 11 years ago

It's an implementation detail anyway: in the spirit of being as simple and efficient as possible, I think the rational approach would be to start with protocols, and switch to multimethods on a case by case basis where they are genuinely needed.

I guess 'simple' is in the eye of the beholder -- from my perspective multimethods are simpler, since they're a language feature which is purpose-built (in semantics if not always performance) for the task at hand, they express the solution more concisely and elegantly.

(Re-)implementing multiple dispatch via protocols is perfectly doable and almost as flexible provided all implementors are careful about extensibility, but creates more incidental complexity and so more something which I'd explore later as an optimisation if it proves necessary.

Agreed this stuff doesn't matter too much though -- it's an implementation detail which could be changed later, or both options could be supported with the same outward-facing API. So can approach it from either end and hopefully meet somewhere in the middle :)

mjwillson commented 11 years ago

One last thing: I think (correct me if I'm wrong) you're looking at this common matrix interface implicitly as something which a matrix library implements under the assumption that it's the only such library being used in a given application.

From this perspective it's less onerous to use protocols since you don't have to worry about dispatch which plays nice with other people's matrix implementations, whether existing ones or new ones which may be added in the future.

I was hoping for it to be more a framework within which different matrix libraries have all the tools they need to play nice with eachother in a reasonably optimal fashion. Perhaps this is over-ambituous, I don't know, but I think it contributes more to the argument for extensible dispatch mechanisms in general, and multimethods in particular.

You might argue that interop between different dense matrix libraries isn't so important since one will generally pick just one fast dense matrix implementation to use (although given that JBLAS, Parallel Colt etc all have overlapping but not exactly feature-complete support for different operations, you can imagine it might sometimes be useful to be able to mix and match).

When it comes to sparse matrices though, even within a single library there are lots of different kinds of sparse matrix implementation data structures in common use, with different tradeoffs. Just within the scipy library there are 7 different sparse matrix types for example: http://docs.scipy.org/doc/scipy/reference/sparse.html

These all need to play nice with eachother and with any dense matrix types, using reasonably optimal multiplication routines and certainly avoiding any unnecessary coercions to dense matrices. They also weren't necessarily anticipated by the authors of existing dense matrix types.

mikera commented 11 years ago

Insights coming think and fast, great stuff guys :-)

@Matthew - On the idea of creating a common framework that allows interop / mixing implementation stuff I definitely think that should be an objective.

In fact, that's one of the key things I'm working towards in the current prototype. Protocols seem to be able to handle this fine. You can do MatrixA -> CommonFormat -> MatrixB via the coercion trick to start with - this allows general purpose dispatch equivalent to [MatrixA MatrixB] without the two implementations needing to understand each other. Implementations that "understand" the other have the option to provide a better fast-path implementation if they know how. So it seems to solve the (mult mine yours) to a first approximation. More testing needed.

Not keen on [(class A) (class B)] dispatch. A few reasons:

Creating a vector is IMHO too big an overhead for dispatch.
It gives us a nasty problem of registering and managing N^2 handler functions. Sure we could automate stuff, but this has a potential to get really messy
Some implementations use subclasses - Parallel Colt and Vectorz do for sure. This makes N potentially very large and/or we have to create some complex handling for inheritance heirarchies.
It seems to imply that every implementation has to know every other implementation? who provides the handler function for [VectorzMatrix ColtMatrix]? Either that or you perform some kind of secondary dispatch for every instance of [VectorzMatrix ????] - but then you've just implemented a protocol with multimethods :-)

@Edmund - I certainly see where you are going with the DiagonalMatrix example. To me though, this stuff feels like the "secondary dispatch" that you hit after you have delegated to a specific implementation. I don't think we can generalise it into the common API, for the simple reason that different matrix libraries will make different decisions about the specialised forms of operations available. There is an almost unlimited number of combinations possible... upper diagonal matrices? permutation matrices? identity matrix?

From the perspective of the common API, I think we should just require implementations to support multiply and let them handle their own internal details. We can always add specialised fast-path protocols for stuff like multiplying by diagonal matrices later, but it would only be an optimisation.

Anyway I think the best way to solve this is with code.... I'll try to get the prototype to a point where we can demonstrate / experiment with multiple implementations today, and then we can throw rocks at something concrete.

mikera commented 11 years ago

FYI just got basic matrix arithmetic working in vectorz-clj in a way that supports mixing of matrix types (currently can mix clojure persistent vectors and vectorz AMatrix types). Example code:

https://github.com/mikera/vectorz-clj/blob/matrix-api/src/test/clojure/mikera/vectorz/test_matrix_api.clj

Extending the core.matrix API to vectorz wasn't too painful and serves as an example of what a implementation would need to do to participate in a protocol-style API:

https://github.com/mikera/vectorz-clj/blob/matrix-api/src/main/clojure/mikera/vectorz/matrix_api.clj

Let me know what you think!

ejackson commented 11 years ago

Somewhere in here I think we're not understanding another. I'm pretty sure @mjwilson and I are talking about the same thing, but I don't think I understand @mikera's responses. @mikera, would you mind working us through an example to help us connect ?

Lets say we want to do (matrix-mult A B) where A is a normal dense matrix and B is a special matrix the knowledge of which allows faster handling. How does your library discover that B is special and invoke the proper handling ?

This is the problem described by @mjwilson, and my code sketch above shows how I think Protocols would have to be constructed to deal with it. We all agree that the combinatorics of this is a nightmare. So how do you get around it ?

Another question @mikera: what is the coersion trick you mentioned ? I'm now going through your code to try discover but further explanation would certainly be helpful.

Thanks everybody for staying chilled out, makes a nice change from the internet at large !

mikera commented 11 years ago

I think there are three distinct kinds of dispatch to consider which may be causing the confusion.

Is the dispatch which takes us from the generic API to a specific implementation (e.g. Vectorz)
Is the dispatch which does any handling of special case matrices within an implementation
Is dispatch that handles the clevernesss of converting matrices across different implementations.

Happy to run with the mult example. Suppose the implementation is a fictitious Java matrix library FooLib that we want to use because it has magic GPU acceleration features.

Dispatch 1. is the first thing that gets called. Let's also say x is a FooLibDenseMatrix and y is a FooLibSparseMatrix.

First we define `(mult x y) as a protocol function that dispatches on the type of x.

(defprotocol PMatrixMultiply
  (mult x y))

The implementation for FooLib needs to extend the protocol to classes that it "owns". on the type FooLibDenseMatrix. This brings us into the code for the FooLib implementation.

(extend-protocol PMatrixMultiply
   FooLibSparseMatrix
      (mult [x y] ....)
   FooLibDenseMatrix
      (mult [x y] ....))

So this handler code gets called whenever (mult x y) is called with an class owned by FooLib. Dispatch 1's sole job is to get to the (mult x y) implementation for FooLib. This can be done with a protocol.

Dispatch 2. is a about what FooLib does when it handles the (mult x y) call. FooLib might have all kinds of custom functions to handle multiplication. There are a few cases to consider

In the simple case lets say every FooLib class has a fooMultiply methods that . So FooLib just calls (.fooMultiply x y). Nothing more to do - FooLib's fooMultiply presumably already knows how to handle multiplying Dense with Sparse matrices. So the handler code can just be:

(mult [x y] (.fooMultiply x y))

It's also possible that FooLib has different multiply functions for Dense x Dense, Dense x Sparse, Sparse x Dense, Sparse * Sparse. (I think that's a sign of bad design personally, but if we want to create a common API we need to consider the possibility that implementations may want to do strange stuff....). Then we can use a multimethod to handle the second dispatch:

(mult [x y] (foo-lib-multiply-multimethod x y))

;; multimethod definition however FooLib implementation wants to structure it......

So a multimethod can easily be used for dispatch 2, even if dispatch 1 is handled by a protocol. But this is a decision which is local to the FooLib implementation

Another option is when FooLib has some special case handling for certain other matrix types that don't come form FooLib. Then it is possible for FooLib to put in some conditional logic:

(mult [x y] 
  (if (foolib-matrix? y)
    (foo-lib-multiply-multimethod x y)
    (cond 
      (vectorz-matrix? y) (.multiplyWithVectorz x y)
      (jblas-matrix? y) (.multiplyWithJBLAS x y)
      :else (......))))

Note that the implementation here is highly FooLib-specific - only FooLib can possible know that is has special methods for multiplying with JBLAS and Vectorz matrices. So I think this has to be done within the FooLib implementation. I think it would be a bad idea to complect this with the general purpose dispatch 1 - we can't possibly hope to create an abstraction that covers all the possible implementation quirks.

I don't think that [(class x) (class y)] dispatch would be sufficiently general here either (even if you ignore the problem of how to specific the combinations that cross implementation) - suppose someone create a NDArray class in Java that is similar to NumPy's ndarray. Then you could easily need dispatch depending on the dimensionality of the NDArray, i.e. it would need to be custom logic.

Dispatch 3. is the special case when FooLib doesn't recognise the type of y - presumably because y comes from some other matrix implementation that FooLib doesn't know about. This is the general case of mixing implementations.

This is tricky - we can't expect implementers to write special case code to handle all other implementations. Aside from potentially having a huge number of combinations to handle, it is clearly a broken design to have every implementation depend on every other implementation. On the other hand, we'd like to handle this case if we can.

One solution is for FooLib to provide a coercion function that converts an arbitrary matrix into a type it understands. Then it can do something like:

(mult [x y] (foo-lib-multiply-multimethod x (coerce-to-foolib y)))

This is clearly feasible - even if FooLib doesn't know what y is, it knows it should be a matrix of some type, so y should support the matrix protocol functions including the (mget y row rolumn) matrix accessor at least.... so FooLib can construct a FooLibDenseMatrix or similar by reading of the elements of y. I actually think it is possible to define a coercion protocol, so you could do something like:

(defprotocol PCoercion
  (coerce [x y])

....

(mult [x y] (foo-lib-multiply-multimethod x (coerce x y)))

Note that x is used as a parameter to the coerce protocol function, so that FooLib can extend the protocol to the class of x. So the general contract of the coerce function is that it converts y to a type compatible with x.

Another solution is for FooLib to convert y to an intermediate format that it does understand, e.g. a vector of vectors. We can provide a utility function that FooLib can call, so the mult implementation has something like:

(mult [x y] (foo-lib-multiply-by-nested-vectors x (convert-to-nested-vectors y)))

Summary

Sorry this is all a bit of a ramble - just trying to get thoughts down. Here's my overall position:

Dispatch 1 is easy, and can be handled by a protocol
Dispatch 2 could get complex, but from an API design perspective the important point is that it is highly dependent on the specific implementation. I don't see how we can generalise this into a common API without getting into a real mess. IMHO the common API shouldn't get mixed up in the implementation details of specific implementations.
Dispatch 3 is really just a trick to enable interoperability when implementations don't know anything about each other. Again, seems like this needs to be implementation specific, but we can provide some shortcuts in the form of conversion functions and specification of interchange that implementation can use.

Note that I do have this working as a prototype for two implementations (clojure "vectors of vectors" and vectorz matrices). Haven't had a crack at extending it to Clatrix yet but it looks pretty straightforward as well.

mjwilson commented 11 years ago

Please be careful with your replies. mjwillson (two 'l's) is interested in taking part. mjwilson (one 'l') is getting repeatedly included by accident.

ejackson commented 11 years ago

Thanks @mikera, that's a very helpful response. Sorry @mjwillson for short-changing you an 'l' !

I've been thinking and talking about the '2nd dispatch' in Dispatch 2 of @mikera's explanation.

My view is that dispatch between implementations is probably over-reach. All of the methods discussed end up in combinatorial explosion. My, admittedly defeatist stance, is to not even try to do this beyond coercion. If a consumer really, really needs to do this, they can always do a special implementation. The general case is just vast.

Within an implementation, as @mjwillson says, it doesn't really matter whether multimethods or protocols are used, provided that the API surface is implemented. I'm good with this.

I'm happy to try extend clatrix to the API as described by @mikera. Let me start with a few pieces so that I can get up to speed and we can settle the API.

@mikera I don't know the numpy stuff much, have you been looking into it ?

mikera commented 11 years ago

@Edmund - yes I've been taking a look at NumPy. It's quite interesting.

Things I like:

support for arrays of arbitrary dimension
support for different element types
big community, smart people
NumPy used as a base for a wide range of other tools

Things I'm not so sure about

They seem to be tied to one specific implementation (ndarray). I think we can do better than that if we abstract properly.
Their implementation of scalars seems a bit complex / messed up. We definitely need to think this through - there are design traps here!

Keen to see what you can do testing the matrix API with Clatrix. Let me know if you need anything else hacked into the API, I have some time over the next couple of days to push this.

mjwillson commented 11 years ago

@mikera -- thanks for being patient with this stuff, although as @ejackson says I think we're still talking past eachother to some extent.

About this 'combinatorial explosions of operations' problem -- it's not like you have to register NxN different methods to use multimethods, its dispatch logic uses hierarchies to select operations using a really elegant and flexible kind of multi-arity inheritance mechanism which allows all kinds of fallback rules to be expressed. (If that kind of dispatch sounds expensive, note that the results are cached so in the common case only has the cost of a map lookup)

You're not tied to using just the java class hierarchy either, so this common interface library could define its own hierarchy with extra nodes added for things like "sparse matrix" or "dense matrix" or "diagonal matrix" which cut across different libraries implementation class hierarchies and allow generic coercion rules to be defined as a fallback when specialised operations aren't available. Hierarchies can be DAGs too so this is really very expressive. Definitely worth reading up on if you haven't played with multimethods much -- they really are designed specifically to solve this problem we're facing in an elegant way -- even if we end up rejecting them for performance reasons I think it's worth working through the motivation for this kind of multiple dispatch mechanism.

I'll try and put together an example though to illustrate what I mean in the context of this matrix stuff though, as you say code is clearer than words :)
I'm not convinced your suggestion of a coercion protocol fully works in your 'dispatch 3' example for my dense/sparse example. If I understand correctly, the idea is that to get around this problem the dense matrix library defines a "coerce to something I can multiply with" protocol, and that my sparse matrix library extends that protocol?
- Firstly, the dense matrix library doesn't know how to do dense-sparse matrix multiplication, so there's not really anything I can coerce my sparse matrix to which it will know what to do with, other than a dense matrix, which is a non-starter (has a potentially huge time and space cost and defeats the whole point of using a sparse representation). It really needs to give my more specialised library the opportunity to nominate or perform the optimal dense-sparse implementation.
- I maintain that "Dispatch 3" style extensibility is not hard, though, whether via multimethods or via protocols. With multimethods you pretty much get it for free, if going via protocols you just need to make sure all implementers use the arg order inverting trick according to the kind of contract which I did my best to explain earlier. Again code speaks louder so I'll try and put together an example.
You know that protocol dispatch is only really fast if you extend the protocol within the deftype right? AFAIK this is the only case where it ends up implemented as a java interface dispatch. If the protocol is extended on some existing type via extend-protocol etc, the dispatch still has to go via some kind of cached method dispatch table and may not be that much faster than a multimethod dispatch. Note that this means if you want fast interface-based protocol dispatch you'll always need to wrap the underlying type of a java library in order to implement the interfaces.
About these "but we can't afford to construct a vector" arguments -- I think I get where you're coming from, but in general if this kind of argument keeps cropping up it gives me a bad feeling of premature optimisation when doing API design in a high-level language like Clojure. I have to wonder if this is a real concern, why are you writing tight numerical inner loops in Clojure at all, let alone via some kind of abstracted layer over the top of a matrix library? It's not like it's hard to drop down to Java or to call (say) these vectorz implementations directly in the small(?) number of tight inner loops where performance is really critical. And for most people, it's often easy to rewrite a tight inner loop doing lots of small multiplications into a single vectorised operation over a larger matrix, as is usually the advice if you want performance in array programming environments like Matlab or numpy.

mjwillson commented 11 years ago

Sorry this took a while -- a code example using multimethods:

https://gist.github.com/4470837

This shows how you can set up different default coercion-based implementations using multimethods, while still retaining full control and the ability to dispatch to an optimal operation on the types of both arguments where this is required. Here there are three levels of dispatch available with really minimal effort on the part of implementers:

Coercion of both types to a generic dense matrix type followed by a generic multiplication -- provided both types extend a PCoercionToGenericDense protocol
Coercion of one argument to generic, and multiplication of the other argument directly with that generic, via PCoerceToGenericDense and PMultiplyWithGenericDense
Multiplication of both arguments directly without coercions via an optimal method dispatched potentially on the types of both arguments, where this is required.

It'd be easy to add further fallback dispatch rules in a similar way -- based on coercions to other intermediate formats for example. Using derives and prefer-method it's easy to ensure that the best implementation takes precedence and allow implementations to break ties however they prefer.

I'll try and attempt an equivalent version soon using protocol-based dispatch and the kind of arg-reversing-trick contract I mentioned, to illustrate what's required to achieve a similar level of extensibility this way.

mikera commented 11 years ago

@Matthew - thanks for building the example! It's very useful to have code to get a shared understanding of what we are talking about :-)

I think I understand quite well how it is working and what you have in mind. And I have nothing against multimethods other than the general engineering principle of "don't pay for what you don't need". I think they are an elegant general purpose dispatch mechanism.

I still don't see what the multimethods are gaining us however in this case: in your example what are they doing that protocols can't? Indeed, I see you are still using protocols for quite a lot of the secondary dispatching, and you are still building special-case logic within implementations (which is the same as you would need to do with protocols).

Also I think the conceptual problems with multimethod heirarchies will only become apparent when you take it past the simple case (Generic, FooMatrix, BarMatrix) point and start looking at real implementation-specific heirarchies (FooSparseMatrix, FooDiaginalMatrix, Foo3DIdentity, Foo4DIdentity, FooUpperTriangulrMatrix, Foo3DVector, FooQuadTreeMatrix etc.). How are you going to build a implementation-independent multimethod dispatch hierarchy that works for all plausible implementations and combinations thereof?

To prefer multimethods over protocols I would personally want to be convinced:

They enable us to do something useful for API users that protocols can't. Users of library are a much more important constituency than library implementers, so I'm personally willing to go through quite a bit of implementation pain if it makes the life of e.g. Incanter users better, but not the other way round. Currently I think multimethods == protocols
The performance overhead is worth paying, and doesn't make the API too slow for reasonable usage scenarios (e.g., someone doing 3D visualisation of a few hundred thousand data points). Currently I think multimethods << protocols
They don't mess up the public API. We don't want to build a leaky abstraction here! Currently I think multimethods == protocols since in both cases we can hide the dispatch as an implementation detail behind a public API
They enable us to achieve the vision of a common matrix API that supports multiple implementations. Currently I think both multimethods and protocols can do that, so multimethods == protocols

So by my highly unscientific count that's about 4.0 - 3.0 in favour of protocols.....

Final thought: why not have the best of both worlds:

Use protocols to handle the primary dispatch. That way we can ensure decent performance for the common cases (there is no value in using multimethods for multiplying a vectorz matrix by another vectorz matrix, for example!). This satisfies my main objection about "not paying for what you don't need".
Provide a multimethod-based secondary dispatch mechansim, that implementations can call into as appropriate. I can certainly see the value of this in helping new implementations get up and running fast, with flexibility to optimise later. Libraries that have a fast path available can choose to use this instead of the multimethods on a case-by-case basis, and fall back to the multimethod otherwise e.g. for handling the cases of multiplying with special case matrices across implementations.

mjwillson commented 11 years ago

@mikera: To try and capture my main concern succinctly: I think fundamentally I'm happy provided that I can force (multiply existing-librarys-matrix-impl my-matrix-impl) to go via my fast multiplication routine without mandatory coercions happening first, and without having to change the code of the existing library.

This kind of extensibility is easier and more natural via multimethods IMO, but it is still possible via protocols. So it's not the end of the world if we go down the protocol route provided that the contract for protocol implementors carefully thought through and spelled out so that this kind of extensibility is maintained by everyone.

(Motivation:

Adding support for various sparse matrix representations which work with existing dense matrix libraries is one, but also any other case where someone wants to add some more specialised matrix representation (say a banded diagonal matrix, etc etc) without having to change the implementations of existing dense matrix types.

In these cases usually the implementor of the new more specialised representation type is going to be the one who knows how to multiply them efficiently with other more general-purpose types, rather than the other way round. And often being able to interact efficiently with other more general types is part of the motivation for the representation, so it's not good enough to force this to go via a coercion.)

mikera commented 11 years ago

@mjwillson - yep I get exactly what you are trying to do. Thanks for thrashing this out, it is important we think about the design from a lot of different angles

So I think the best solution is to dispatch on protocol first, but then delegate to a generic-multiply multimethod in the case that the matrix library of the first argument doesn't know how to handle the second.

Default implementation in the multimethod is to use coercion, but you can override this with your own implementation as the creator of library 2 if you have a better idea.

This enables you to have your custom implementation used for clever large matrix types (where you presumably don't care about the cost of multimethod dispatch), and at the same time it satisfies my concern that e.g. (mult small-vectorz-matrix small-vectorz-vector) doesn't need to go through an unnecessary multimethod dispatch operation (which is almost certainly more expensive than the multiplication itself for small vectorz matrices).

Sound OK?

If we're all agreed on this approach then I'll try prototyping it today.