mikeizbicki / HLearn

Homomorphic machine learning

Interested in contributing #83

Open Drezil opened 7 years ago

Drezil commented 7 years ago

I am interested in contributing and I have decent Haskell skills, but this stuff is still a bit above my head.

I would like to have an implementation of the CVM (Core Vector Machine: basically an O(n) SVM with epsilon error, obtained by dualizing the problem, converting it to the dual CCMEB problem (center-constrained minimum enclosing ball), converting that back to the primal CCMEB, and then using an approximate O(1) solver for the CCMEB via a core-set), together with some examples of using an SVM with different kernels.

I would like to toy around a bit and try to use it on my current problem (including whitening, k-means, GMLVQ or similar clustering for preprocessing, and then an SVM or CVM with a custom kernel for training and classifying) and fix/develop things as I go along.

At first I would like to flesh out some examples of how to use an SVM with different kernels and lay the groundwork for that. The code in https://github.com/mikeizbicki/HLearn/blob/master/src/HLearn/Classifiers/Linear.hs mentions SVM, but an SVM is not a GLM (in the statistical sense). I think it still fits into Linear, because it is just linear classification in the feature space induced by the kernel (so phi(w)^T phi(x) instead of w^T x), as sketched below.
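Roughly what I have in mind, as a minimal sketch in plain Haskell over lists (not the actual HLearn/subhask types; all names here are illustrative):

```haskell
-- The decision function of a kernel machine is a linear combination of
-- kernel evaluations against the support vectors, i.e. the inner product
-- <phi(w), phi(x)> expanded over the support vectors.
type Kernel a = a -> a -> Double

-- f(x) = b + sum_i alpha_i * y_i * k(x_i, x)
decide :: Kernel a
       -> [(Double, Double, a)]  -- (alpha_i, y_i, x_i) triples
       -> Double                 -- bias b
       -> a
       -> Double
decide k svs b x = b + sum [ a * y * k xi x | (a, y, xi) <- svs ]

-- the plain scalar product recovers an ordinary linear classifier
linearKernel :: [Double] -> [Double] -> Double
linearKernel u v = sum (zipWith (*) u v)
```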

I had lectures on this topic (SVM, CVM, ...) and know the math and pseudo-code, but I am not sure whether I can translate that to Haskell at such an abstract level and may need help doing it.

Are you interested in working together?

mikeizbicki commented 7 years ago

Thanks for reaching out! But unfortunately, I'm going to recommend you not do this project yet.

You're right that the SVM support is very limited right now. The basic problem is that linear algebra support in Haskell is currently very poor. It would be possible to do what you propose with the existing linear algebra systems, but I would recommend against it: the resulting code will probably be ugly, slow, hard to maintain, and will need to be rewritten anyway when better linear algebra support comes along. So my recommendation would be not to use Haskell for this task.

If that didn't scare you away, send me a link to the paper you're thinking about implementing. I can tell you what parts of the linear algebra systems would need reworking for your problem and maybe you could help get those up to speed first.

Drezil commented 7 years ago

Why is linear algebra support in Haskell so poor and slow at the moment? I thought your subhask library does a decent job of optimizing things, doesn't it?

Things would be pretty simple, as it is just basic linear algebra: some scalar products, vector additions, and the occasional derivative of a function from n dimensions to a scalar.

I did a bit of research beyond my lecture notes and came across https://mitp-web2.mit.edu/sites/default/files/titles/content/9780262026253_sch_0001.pdf, which presents three different approaches to the problem.

Basically it is: find the argmin over the a_i of sum_ij a_i a_j y_i y_j k(x_i, x_j), subject to 0 <= a_i <= C, where the y_i are labels in {-1, 1}, the x_i are data points, and k is a valid kernel.
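For concreteness, a minimal sketch of evaluating that quadratic form for a given choice of the a_i (plain lists; the names are made up for the example):

```haskell
type Kernel a = a -> a -> Double

-- sum_ij a_i a_j y_i y_j k(x_i, x_j); an optimizer has to minimize this
-- over the a_i subject to the box constraint 0 <= a_i <= C
dualObjective :: Kernel a -> [Double] -> [Double] -> [a] -> Double
dualObjective k as ys xs =
    sum [ ai * aj * yi * yj * k xi xj
        | (ai, yi, xi) <- zip3 as ys xs
        , (aj, yj, xj) <- zip3 as ys xs ]
```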

My first intuition was to just plug that formula into a conjugate gradient descent, but GHC threw endless type errors at me (see https://github.com/Drezil/HLearn/blob/broken_svm/src/HLearn/Classifiers/Linear.hs#L129-L205). And yes, I already know some things are wrong with that code, but I haven't had time to work on it yet.

I would now try to implement something like Algorithm 1.3 from the paper linked above. All the commented steps are explained in detail in the case study on LibSVM. The other two algorithms (1.1 and 1.2) could be implemented as well for comparison. Additionally, one could think about abstracting over some things in this setting.

In particular, the difference between a C-SVM and a CVM is the algorithm used to find the alpha_i coefficients: a C-SVM does this the way LibSVM does, while a CVM modifies the kernel to fit the CCMEB dual problem, forms the CCMEB primal problem, solves that with a core-set in O(n), and transfers that information back through the pipeline, resulting in an O(n) algorithm instead of an O(n³) algorithm.

Even though the CVM was proposed around 2009, I could not find a proper (non-MATLAB) implementation of the algorithm, and it would be really helpful to have one, no matter the performance for now.

You can hit me up on IRC or Jabber as well if you want to discuss details.

mikeizbicki commented 7 years ago

Subhask provides a decent story for linear algebra, but it's still not (at least what I consider) great. I tried implementing some kernel learners a while back, and while I got some things working, it was very slow. The main reason is that mutable vector operations are still very slow in subhask (as in O(n) time to update a single entry in a vector, so ridiculously slow), and all the kernelized algorithms I tried required mutable operations. Fixing this is not a hard task, but it's not something I've had time for yet.

If you want to give it a go, I'd be happy to help you. (I don't have too much free time, but I'll do what I can.) I just want to make sure you're fully aware that everything you're working with is super experimental, probably won't work quite like you expect, and will require some deep digging into subhask's internals.

Drezil commented 7 years ago

For a vector x, most kernels depend directly on a variation of the scalar product with some operations around it; e.g. a Gaussian kernel is just k x y = let d = x - y in exp (negate (d <> d)), etc. The problem arises in the algorithms that optimize over those kernels.
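For example, a minimal sketch of a few standard kernels over plain [Double] lists (illustrative only, not the subhask vector types HLearn actually uses):

```haskell
dot :: [Double] -> [Double] -> Double
dot u v = sum (zipWith (*) u v)

-- linear kernel: the plain scalar product
linearKernel :: [Double] -> [Double] -> Double
linearKernel = dot

-- Gaussian (RBF) kernel: exp (-gamma * ||x - y||^2)
gaussianKernel :: Double -> [Double] -> [Double] -> Double
gaussianKernel gamma u v =
    let d = zipWith (-) u v
    in  exp (negate gamma * dot d d)

-- polynomial kernel: (x . y + c)^degree
polyKernel :: Double -> Int -> [Double] -> [Double] -> Double
polyKernel c degree u v = (dot u v + c) ^ degree
```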

In the case of an SVM (as implemented in LibSVM) you have n d-dimensional inputs and an n-dimensional weight vector. Here you only update two components of the weight at a time, but you still have to compute n kernels at O(d) each, so in total you pay something like O(n) for the traversal plus 2 O(nd) for the updates, instead of O(1) plus 2 O(nd). Not ideal, but not game-breaking.

On the other hand, the "slower" Algorithm 1.1 in the paper above changes every entry of the weight vector anyway.

Back to the main question: why are they so slow? For BVector I would understand the problem (you have to start chasing pointers), but for UVector and SVector it should come with no additional cost. Or is it just that there is currently no operation supporting that mode? It should live somewhere in IxContainer or so.

I would propose something like (!~) :: Index -> Value -> IxContainer -> IxContainer, in conjunction with (&) = flip ($) as in lens. Then you could write something like vector & 5 !~ e . 6 !~ pi . 23 !~ 42 to update specific entries of a single vector. One could write a default implementation in terms of imap, and specific unboxed and/or storable types could override it with an optimized version; a rough sketch follows below.
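Something along these lines, written against Data.Vector purely to illustrate the intended usage (a real subhask version would be a method of IxContainer; note that V.// is itself O(n), which is exactly the cost an optimized instance should avoid):

```haskell
import Data.Function ((&))
import qualified Data.Vector as V

-- functional single-entry update; a default could be written via imap,
-- and unboxed/storable instances could override it with a genuinely
-- O(1)-per-entry mutable implementation under the hood
(!~) :: Int -> a -> V.Vector a -> V.Vector a
(!~) i x v = v V.// [(i, x)]
infixl 4 !~

-- chaining updates with (&); here each V.// copies the whole vector
example :: V.Vector Double
example = V.replicate 30 0
        & 5  !~ exp 1
        & 6  !~ pi
        & 23 !~ 42
```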

All in all, I would like to give it a go, because I will regularly have to implement papers like this later in my university career, and working with MATLAB/Octave or Python drives me nuts. So I would be willing to put quite some time into this project if it makes my life easier in the long run (i.e. the next 5 years).

mikeizbicki commented 7 years ago

In the case of an SVM (as implemented in LibSVM) you have n d-dimensional inputs and an n-dimensional weight vector. Here you only update two components of the weight at a time, but you still have to compute n kernels at O(d) each, so in total you pay something like O(n) for the traversal plus 2 O(nd) for the updates, instead of O(1) plus 2 O(nd). Not ideal, but not game-breaking.

If I understand you correctly, you've forgotten to include the number of iterations t. You compute the kernels in a preprocessing step taking O(n^2) time; then the iterations take O(nt) time without mutability, which is a huge penalty.
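For reference, a minimal sketch of what that preprocessing step looks like (plain Data.Vector for illustration only, not the subhask/hmatrix types HLearn actually uses):

```haskell
import qualified Data.Vector as V

-- Gram matrix: (gramMatrix k xs V.! i) V.! j == k (xs V.! i) (xs V.! j).
-- Costs O(n^2) kernel evaluations once; afterwards each iteration can
-- look kernel values up in O(1) instead of recomputing them.
gramMatrix :: (a -> a -> Double) -> V.Vector a -> V.Vector (V.Vector Double)
gramMatrix k xs = V.map (\xi -> V.map (k xi) xs) xs
```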

Anyways, I've also added quite a bit more support for mutability than existed when I was playing with kernel algorithms a while back. In subhask, the goal is to unify all mutable operations into a single system defined by https://github.com/mikeizbicki/subhask/blob/master/src/SubHask/Mutable.hs . This mechanism wraps around the many ways mutability is handled in standard Haskell (Prim, Unbox, Storable, etc.). The end result is that every type in subhask has exactly one mutable version of that type that can work in either the ST or IO monad.

All algebraic operations have both mutable and immutable versions. See, for example, (+=) as a mutable version of (+): https://github.com/mikeizbicki/subhask/blob/master/src/SubHask/Algebra.hs#L1009 . There's currently no mutable version of the lookup operator, but that's mostly because I couldn't settle on a syntax I like. It would be easy to add something like the (!~) operator you suggest, and I'd be open to a pull request doing so.

Vectors currently fully support fast mutable operations. There's actually no extra overhead for BVector compared to e.g. SVector, because GHC supports mutable boxed arrays internally, which is what BVector uses. Matrices, however, use hmatrix as a backend (instead of going directly to a BLAS), and hmatrix does not support mutability; therefore there are no mutable operations for matrices. There are a number of ways to hack around this, but the ideal long-term solution is to remove hmatrix as a compatibility layer and have bindings directly to a BLAS. This has the benefit of giving not only mutable support but also sparse and approximate linear algebra, depending on which library you bind to. Another side effect of the current hmatrix setup is that sometimes unneeded matrices get created when evaluating complex formulas, but I doubt this will show up in an SVM implementation.

Finally, subhask is in the process of being upgraded to GHC 8. There's a working branch over there, but hlearn hasn't been updated yet, and I'm probably not going to get around to it for some time. If you're looking for a good place to start, getting hlearn to compile with the GHC 8 branch of subhask would be awesome!


All in all, I would like to give it a go, because I will regularly have to implement papers like this later in my university career, and working with MATLAB/Octave or Python drives me nuts. So I would be willing to put quite some time into this project if it makes my life easier in the long run (i.e. the next 5 years).

That was very similar to my motivation :) I've found that it's taken a lot more work than I expected and has distracted me from other projects I wish I could have done. On the plus side, I've learned a lot about the internals of linear algebra libraries and have a very different perspective on the theory than anyone else I've encountered. Overall, I still think it was a good decision for my long-term goals personally, but I definitely wouldn't recommend it to everyone. Hopefully things will be a bit easier for you at this point. Are you working on your PhD right now?