shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Make linalg easy again #3100

Closed: lambday closed this issue 8 years ago

lambday commented 8 years ago

DISCLAIMER : Any similarity to anyone's presidential campaign slogan is purely coincidental.

tl;dr: (1) what the hell is going on; (2) now the days have changed; (3) get rid of that ugly {{...}} guy; (4) it would be a great opportunity.

Guys, @karlnapf and I had a discussion about cleaning up the linalg lib a bit. Here are a few bits and pieces of that discussion.

So, here is the plan for how we're gonna do it:

// cpu_type = SGMatrix<T> or SGVector<T>
// in future, maybe even SGSparseVector<T> and SGSparseMatrix<T>
auto gpu_type = linalg::GPUFactory::get_gpu_type(cpu_type);
auto result = linalg::foo(gpu_type);

This makes it explicit: the dev knows there is a cost involved (we could make it even more explicit by naming it pretty_please_put_data_on_gpu_if_possible_and_yes_I_know_it_is_costly(...)). Plus, it makes this thing easier to code for. Plus, it helps us get rid of that ugly linalg::Backend::FOOBAR guy (3) at the time of calling (if we want anything to be under the hood, it's this guy).

Here is a sample gist of how the factory thing should work. And this is how it is used.
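For concreteness, here is a minimal standalone sketch of the factory idea. All names and details below are illustrative stand-ins, not the actual gist, and std::vector stands in for SGMatrix/SGVector:

// Sketch of the explicit-transfer idea: the dev pays the copy cost once,
// visibly, and later calls dispatch on the wrapper type, so no
// linalg::Backend enum is needed at the call site.
#include <vector>

namespace linalg
{
template <typename T>
struct GPUMatrix
{
    // A real implementation would hold a ViennaCL/CUDA device buffer;
    // a host-side copy stands in for it in this sketch.
    std::vector<T> device_data;
};

struct GPUFactory
{
    // The one explicit, costly transfer point.
    template <typename T>
    static GPUMatrix<T> get_gpu_type(const std::vector<T>& cpu_data)
    {
        return GPUMatrix<T>{cpu_data};
    }
};

// Backend selection by overload: the CPU path for host containers, the
// GPU path for the wrapper, chosen from the argument type.
template <typename T>
T sum(const std::vector<T>& m)  // e.g. Eigen-backed
{
    T acc{};
    for (const T& v : m)
        acc += v;
    return acc;
}

template <typename T>
T sum(const GPUMatrix<T>& m)    // e.g. ViennaCL-backed
{
    return sum(m.device_data);  // would launch a device kernel in practice
}
} // namespace linalg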

Would love to hear comments from @lisitsyn @vigsterkr @besser82 @sonney2k @yorkerlin. I think this should be done quickly (I can do it myself), before we get too many patches for linalg. Thoughts?

lambday commented 8 years ago

A few more ideas while we're at it:

linalg::elementwise(m).product(n);
linalg::elementwise(m).sin();
// NATIVE case handled here
linalg::elementwise(m).custom([](auto& v) { v = something_fun(v); }); 
// ViennaCL kernel case handled here
linalg::elementwise(gpu_m).custom("v = something_fun(v);"); 

// similar for rowwise and colwise.
linalg::colwise(m).sum();
linalg::colwise(m).custom(/* write a lambda for cpu/string for vcl */);
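A rough sketch of the proxy object an expression like linalg::elementwise(m).custom(...) could return (simplified, names illustrative): it keeps a reference to the container and routes every operation through one traversal, which is where a NATIVE loop or a ViennaCL kernel launch would branch.

#include <cmath>
#include <vector>

namespace linalg
{
template <typename Container>
class ElementwiseProxy
{
    Container& m_data;

public:
    explicit ElementwiseProxy(Container& data) : m_data(data) {}

    // NATIVE backend: a plain loop. A GPU overload would instead compile
    // the operation, given as a string, into a ViennaCL kernel.
    template <typename UnaryOp>
    Container& custom(UnaryOp op)
    {
        for (auto& v : m_data)
            op(v);  // op mutates v in place, matching the proposed signature
        return m_data;
    }

    Container& sin()
    {
        return custom([](auto& v) { v = std::sin(v); });
    }
};

template <typename Container>
ElementwiseProxy<Container> elementwise(Container& data)
{
    return ElementwiseProxy<Container>(data);
}
} // namespace linalg

// usage: linalg::elementwise(vec).custom([](auto& v) { v = 2 * v; });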
karlnapf commented 8 years ago

This has my strong +1

vigsterkr commented 8 years ago

@lambday ok so can i ask a simple question: why do we need yet another CUDA backend?

vigsterkr commented 8 years ago

and btw, all these things should be optimized at runtime, not by the user... it makes everything much easier for the user. making the user ask for a GPU type or a CPU type is a foobar... this should be done under the hood, taking into consideration the cycles needed for moving data between CPU and GPU memory...

karlnapf commented 8 years ago

A CUDA backend? The idea is more to have linalg operations that are independent of the underlying library, so that the algorithm code is free from explicit library calls (CUDA/OpenCL/etc). Imagine CUDA does things much better than ViennaCL in the future -- then we can easily switch without touching the algorithm code.

I don't agree on doing CPU vs GPU under the hood. I was of your opinion before, but changed it after I saw what can happen. This is extremely difficult to infer, as it heavily depends on the use case -- for instance, whether a transfer pays off depends on how often the data is reused afterwards. We decided to make the GPU/CPU decision explicit to avoid the massive performance drops that we observed in benchmarks. Following the Python zen, muhahahah ;)

vigsterkr commented 8 years ago

i completely disagree with you @karlnapf about it being extremely difficult: magma does this pretty well. and what does it even mean that CUDA is better than ViennaCL? :)

vigsterkr commented 8 years ago

statically compiling in library options is so f**ing '90s ;) it's 2016 now; nobody wants to create 10 different builds just because the lab has 10 different machines in the cluster. you want to have one blob (or blobs, if we do a shared-libs framework one day), and that's what you distribute among the machines in your cluster/lab/toasters. this is a reflection on @lambday's -DTURN_OFF_GPU idea

lambday commented 8 years ago

@vigsterkr I think we're going back to the same question again: should it be the user who decides to put the data on the GPU, or should it be the Shogun devs? The design I proposed lets the devs take that decision, since what's happening under the hood is hidden from the user, and the user usually won't know the implementation details well enough to make a good decision.

About that TURN_OFF_GPU thing, I think there is another way to achieve this: let the user set it from within a Shogun program at runtime. If I can make a POC of that work, we should be good to go, I guess :)
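A minimal sketch of what such a POC could look like (entirely hypothetical names, not an existing Shogun API): a process-wide flag, seeded from an environment variable and overridable from user code at runtime, which the GPU factory would consult before copying anything.

#include <atomic>
#include <cstdlib>
#include <cstring>

namespace linalg
{
inline std::atomic<bool>& gpu_enabled()
{
    static std::atomic<bool> enabled{[] {
        const char* env = std::getenv("SHOGUN_GPU");  // hypothetical variable
        return !(env && std::strcmp(env, "0") == 0);  // default: enabled
    }()};
    return enabled;
}

// The runtime counterpart of -DTURN_OFF_GPU, callable from a shogun program.
inline void turn_gpu_off() { gpu_enabled().store(false); }
} // namespace linalg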

vigsterkr commented 8 years ago

@lambday it should be neither... it should be decided based on simple rules.
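For illustration, a hypothetical rule of that kind (not anything magma actually does) could be as simple as a size threshold that only moves data when the expected speedup outweighs the copy cost:

#include <cstddef>

namespace linalg
{
// Hypothetical tuning constant; in practice it would be benchmarked per
// backend/device rather than hard-coded.
constexpr std::size_t kGpuMinElements = 1u << 20;

// Transfer only when the workload is large enough to amortize the copy.
inline bool should_use_gpu(std::size_t num_elements, bool gpu_available)
{
    return gpu_available && num_elements >= kGpuMinElements;
}
} // namespace linalg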

lambday commented 8 years ago

@vigsterkr questions: (1) how good are those rules? (2) how time/effort-consuming would it be to implement those rules for shogun linalg, given the limited (manpower) resources we have? Do we need to recreate the magic that magma does? (haven't checked it yet, but guessing it has the answers for what those rules should be)

If the devs decide to use GPU, it is usually a conscious/well-informed decision, based on benchmarking and testing.

lambday commented 8 years ago

and if the user is not happy, they can always turn the GPU off with the "turn_gpu_off" thing (let's assume we come up with a way to do that).

vigsterkr commented 8 years ago

those conscious/well-informed decisions can actually be compiled into rules. the rules are as good as the one who defines them :) (just to be an a**hole here; i couldn't leave this hanging). turning the GPU off should, at worst, be an env/config option, but not a compile-time option

lambday commented 8 years ago

@vigsterkr let's use Shogun to learn those rules :D kidding

vigsterkr commented 8 years ago

and reasoning yourself into semi-optimal solutions just because there's not enough manpower has, i would assume, never ended up in a nice place... we all do this (shogun) in our spare time because we are, for some magical reason, (still) enthusiastic about it... doing a half/semi-optimal solution because of feelings... well, that is just a broken dream.

lambday commented 8 years ago

@vigsterkr let's talk about the complexity of those rules - propositional? first-order? horn clauses? please let me know if I should be looking at anything in particular inside magma.

about half/semi-optimal solutions: well, it's always a constrained optimization problem :) given the constraints, I believe we sometimes have to stick with those. but if there's a better alternative out there that doesn't violate those constraints, I'd love to take the optimal path, and that's exactly why I wanted to hear from you on this issue. the question is: how optimal are we talking, exactly, if we do things the way magma does? can you give me a brief summary of what those rules should be? just to get an idea.

karlnapf commented 8 years ago

@vigsterkr we do quite a bit of GPU programming at my institute -- my opinion that it is hard to decide when/if/how to transfer data to the GPU comes from that. Also, there are quite a few papers out there where people specifically redesign known algorithms for GPUs. The architecture of the algorithms often changes fundamentally, and these changes cannot be inferred automatically in general, only for a subset of simple design patterns. Being explicit, especially in the messy Shogun code, is therefore better in my eyes.

I agree with you on making this as good as possible, so happy to discuss. About compile time vs runtime -- we had benchmarks that spoke very clearly on this, iirc. I would prefer the runtime switch as well, but then I guess we need to think about it a bit more. BTW, this can also be changed later (runtime vs compile time); the interface is more important now, right?

lambday commented 8 years ago

Done by @OXPHOS in GSoC16. Closing.