shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Make linalg easy again #3100

Closed: lambday closed this issue 8 years ago

lambday commented 8 years ago

DISCLAIMER : Any similarity to anyone's presidential campaign slogan is purely coincidental.

tl;dr: (1) what the hell is going on; (2) now the days have changed; (3) get rid of that ugly {{...}} guy; (4) it would be a great opportunity.

Guys, @karlnapf and I had a discussion about cleaning up the linalg lib a bit. Here are a few bits and pieces of that discussion.

So, here is the plan for how we're gonna do it:

// cpu_type = SGMatrix<T> or SGVector<T>
// in future, maybe even SGSparseVector<T> and SGSparseMatrix<T>
auto gpu_type = linalg::GPUFactory::get_gpu_type(cpu_type);
auto result = linalg::foo(gpu_type);

This makes it explicit: the dev knows there is a cost involved (we could make it even more explicit by naming it pretty_please_put_data_on_gpu_if_possible_and_yes_I_know_it_is_costly(...)). Plus, it makes this thing easier to code for. Plus, it helps us get rid of that ugly linalg::Backend::FOOBAR guy (3) at the time of calling (if we want anything to be under the hood, it's this guy).

Here is a sample gist of how the factory thing should work. And this is how it is used.
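For concreteness, here is a minimal standalone sketch of the factory idea. All names and details below are illustrative stand-ins, not the actual gist, and std::vector stands in for SGMatrix/SGVector:

// Sketch of the explicit-transfer idea: the dev pays the copy cost once,
// visibly, and later calls dispatch on the wrapper type, so no
// linalg::Backend enum is needed at the call site.
#include <vector>

namespace linalg
{
template <typename T>
struct GPUMatrix
{
    // A real implementation would hold a ViennaCL/CUDA device buffer;
    // a host-side copy stands in for it in this sketch.
    std::vector<T> device_data;
};

struct GPUFactory
{
    // The one explicit, costly transfer point.
    template <typename T>
    static GPUMatrix<T> get_gpu_type(const std::vector<T>& cpu_data)
    {
        return GPUMatrix<T>{cpu_data};
    }
};

// Backend selection by overload: the CPU path for host containers, the
// GPU path for the wrapper, chosen from the argument type.
template <typename T>
T sum(const std::vector<T>& m)  // e.g. Eigen-backed
{
    T acc{};
    for (const T& v : m)
        acc += v;
    return acc;
}

template <typename T>
T sum(const GPUMatrix<T>& m)    // e.g. ViennaCL-backed
{
    return sum(m.device_data);  // would launch a device kernel in practice
}
} // namespace linalg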

Would love to hear comments from @lisitsyn @vigsterkr @besser82 @sonney2k @yorkerlin. I think this should be done quickly (I can do it myself), before we get too many patches for linalg. Thoughts?

lambday commented 8 years ago

A few more ideas while we're at it:

linalg::elementwise(m).product(n);
linalg::elementwise(m).sin();
// NATIVE case handled here
linalg::elementwise(m).custom([](auto& v) { v = something_fun(v); }); 
// ViennaCL kernel case handled here
linalg::elementwise(gpu_m).custom("v = something_fun(v);"); 

// similar for rowwise and colwise.
linalg::colwise(m).sum();
linalg::colwise(m).custom(/* write a lambda for cpu/string for vcl */);
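A rough sketch of the proxy object an expression like linalg::elementwise(m).custom(...) could return (simplified, names illustrative): it keeps a reference to the container and routes every operation through one traversal, which is where a NATIVE loop or a ViennaCL kernel launch would branch.

#include <cmath>
#include <vector>

namespace linalg
{
template <typename Container>
class ElementwiseProxy
{
    Container& m_data;

public:
    explicit ElementwiseProxy(Container& data) : m_data(data) {}

    // NATIVE backend: a plain loop. A GPU overload would instead compile
    // the operation, given as a string, into a ViennaCL kernel.
    template <typename UnaryOp>
    Container& custom(UnaryOp op)
    {
        for (auto& v : m_data)
            op(v);  // op mutates v in place, matching the proposed signature
        return m_data;
    }

    Container& sin()
    {
        return custom([](auto& v) { v = std::sin(v); });
    }
};

template <typename Container>
ElementwiseProxy<Container> elementwise(Container& data)
{
    return ElementwiseProxy<Container>(data);
}
} // namespace linalg

// usage: linalg::elementwise(vec).custom([](auto& v) { v = 2 * v; });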
karlnapf commented 8 years ago

This has my strong +1

vigsterkr commented 8 years ago

@lambday ok so can i ask a simple question: why do we need yet another CUDA backend?

vigsterkr commented 8 years ago

and btw, all these things should be optimized at runtime, not by the user... it makes everything much easier for the user. making the user ask for a GPU type or a CPU type is a foobar... this should be done under the hood, taking into consideration the cycles needed for moving data between CPU and GPU memory...

karlnapf commented 8 years ago

A CUDA backend? The idea is more to have linalg operations that are independent of the underlying library, so that the algorithm code is free from explicit library calls (CUDA/OpenCL/etc). Imagine CUDA does things much better than ViennaCL in the future -- then we can easily switch without touching the algorithm code.

I don't agree on doing CPU vs GPU under the hood. I was of your opinion before, but changed it after I saw what can happen. This is extremely difficult to infer, as it heavily depends on the use case -- for instance, whether a transfer pays off depends on how often the data is reused afterwards. We decided to make the GPU/CPU decision explicit to avoid the massive performance drops that we observed in benchmarks. Following the Python zen, muhahahah ;)

vigsterkr commented 8 years ago

i completely disagree with you @karlnapf about it being extremely difficult: magma does this pretty well. and what does it even mean that CUDA is better than ViennaCL? :)

vigsterkr commented 8 years ago

statically compiling in library options is so f**ing '90s ;) it's 2016 now; nobody wants to create 10 different builds just because the lab has 10 different machines in the cluster. you want to have one blob (or blobs, if we do a shared-libs framework one day), and that's what you distribute among the machines in your cluster/lab/toasters. this is a reflection on @lambday's -DTURN_OFF_GPU idea

lambday commented 8 years ago

@vigsterkr I think we're going back to the same question again: should it be the user who decides to put the data on the GPU, or should it be the Shogun devs? The design I proposed lets the devs take that decision, since what's happening under the hood is hidden from the user, and the user usually won't know the implementation details well enough to make a good decision.

About that TURN_OFF_GPU thing, I think there is another way to achieve this: let the user set it from within a Shogun program at runtime. If I can make a POC of that work, we should be good to go, I guess :)
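A minimal sketch of what such a POC could look like (entirely hypothetical names, not an existing Shogun API): a process-wide flag, seeded from an environment variable and overridable from user code at runtime, which the GPU factory would consult before copying anything.

#include <atomic>
#include <cstdlib>
#include <cstring>

namespace linalg
{
inline std::atomic<bool>& gpu_enabled()
{
    static std::atomic<bool> enabled{[] {
        const char* env = std::getenv("SHOGUN_GPU");  // hypothetical variable
        return !(env && std::strcmp(env, "0") == 0);  // default: enabled
    }()};
    return enabled;
}

// The runtime counterpart of -DTURN_OFF_GPU, callable from a shogun program.
inline void turn_gpu_off() { gpu_enabled().store(false); }
} // namespace linalg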

vigsterkr commented 8 years ago

@lambday it should be neither... it should be decided based on simple rules.
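For illustration, a hypothetical rule of that kind (not anything magma actually does) could be as simple as a size threshold that only moves data when the expected speedup outweighs the copy cost:

#include <cstddef>

namespace linalg
{
// Hypothetical tuning constant; in practice it would be benchmarked per
// backend/device rather than hard-coded.
constexpr std::size_t kGpuMinElements = 1u << 20;

// Transfer only when the workload is large enough to amortize the copy.
inline bool should_use_gpu(std::size_t num_elements, bool gpu_available)
{
    return gpu_available && num_elements >= kGpuMinElements;
}
} // namespace linalg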

lambday commented 8 years ago

@vigsterkr questions: (1) how good are those rules? (2) how time/effort-consuming would it be to implement those rules for shogun linalg, given the limited (manpower) resources we have? Do we need to recreate the magic that magma does? (haven't checked it yet, but guessing it has the answers for what those rules should be)

If the devs decide to use GPU, it is usually a conscious/well-informed decision, based on benchmarking and testing.

lambday commented 8 years ago

and if the user is not happy, they can always turn the GPU off with the "turn_gpu_off" thing (let's assume we come up with a way to do that).

vigsterkr commented 8 years ago

those conscious/well-informed decisions can actually be compiled into rules. the rules are as good as the one who defines them :) (just to be an a**hole here; i couldn't leave this hanging). turning the GPU off should, at worst, be an env/config option, but not a compile-time option

lambday commented 8 years ago

@vigsterkr let's use Shogun to learn those rules :D kidding

vigsterkr commented 8 years ago

and reasoning yourself into semi-optimal solutions just because there's not enough manpower has, i would assume, never ended up in a nice place... we all do this (shogun) in our spare time because we are, for some magical reason, (still) enthusiastic about it... doing a half/semi-optimal solution because of feelings... well, that is just a broken dream.

lambday commented 8 years ago

@vigsterkr let's talk about the complexity of those rules - propositional? first-order? horn clauses? please let me know if I should be looking at anything in particular inside magma.

about half/semi-optimal solutions: well, it's always a constrained optimization problem :) given the constraints, I believe we sometimes have to stick with those. but if there's a better alternative out there that doesn't violate those constraints, I'd love to take the optimal path, and that's exactly why I wanted to hear from you on this issue. the question is: how optimal are we talking, exactly, if we do things the way magma does? can you give me a brief summary of what those rules should be? just to get an idea.

karlnapf commented 8 years ago

@vigsterkr we do quite a bit of GPU programming at my institute -- my opinion that it is hard to decide when/if/how to transfer data to the GPU comes from that. Also, there are quite a few papers out there where people specifically redesign known algorithms for GPUs. The architecture of the algorithms often changes fundamentally, and these changes cannot be inferred automatically in general, only for a subset of simple design patterns. Being explicit, especially in the messy Shogun code, is therefore better in my eyes.

I agree with you on making this as good as possible, so happy to discuss. About compile time vs runtime -- we had benchmarks that spoke very clearly on this, iirc. I would prefer the runtime switch as well, but then I guess we need to think about it a bit more. BTW, this can also be changed later (runtime vs compile time); the interface is more important now, right?

lambday commented 8 years ago

Done by @OXPHOS in GSoC16. Closing.