jeanluct / braidlab

Matlab package for analyzing data using braids
GNU General Public License v3.0
23 stars 9 forks source link

Convert color_braiding to C++? #55

Closed jeanluct closed 9 years ago

jeanluct commented 9 years ago

Michael is hitting some performance limitations when dealing with large number of particles. Start checking which part of that routine would benefit from MEXing.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-11 16:31:09+00:00

See commit 34f72f6 Further testing is needed, but so far so good...

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-13 14:21:37+00:00

What's the speedup like?

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-13 14:46:26+00:00

Plain C++ version gives the result in roughly half the time. However, I've just uploaded commit 1076db0 that adds a multithreading support which parallelizes the most time-intensive part of the computation (pairwise crossings double-loop that runs in Nstrings^2 time). On magma1 (multicore machine), the speedup is (pure MATLAB) 30 sec to (C++ 16 threads) 4 sec for 150 braid strings of length 20 000. You can play with parameters in devel/test_colorbraid.m. I can do a more detailed time analysis later if there's interest. C++ code is still not enabled by default since we should write some unit tests to make sure things do not fall apart before we roll it out into the main distribution.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 15:23:21+00:00

Compilation problems:

I just checked and at home I have to use gcc 4.6 or earlier with Matlab 2013a. I'm using 4.4 and compile fails with:

#!bash

In file included from colorbraiding_helper.hpp:47,
                 from colorbraiding_helper.cpp:31:
ThreadPool.h:46: fatal error: future: No such file or directory
compilation terminated.

    mex: compile of ' "colorbraiding_helper.cpp"' failed.
jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 15:39:39+00:00

OK, I checked. std::future is available from gcc 4.6.2 onwards. That library is not essential for threading that we use, so I'll take a look and modify ThreadPool.h so that we can minimize 4.4+ dependencies. std::mutex and std::thread are what's really essential here and they seem to be available in gcc 4.4. See "Runtime library libstdc++" section of https://gcc.gnu.org/gcc-4.4/changes.html

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 15:51:32+00:00

Maybe a simpler solution is just to disable multithreading with a preprocessor directive for earlier GCC. I don't like to modify external libraries, since that makes maintenance harder.

Also, do you think it's possible to move ThreadPool.h to extern? The idea is to keep things written by others in that folder, to make maintenance and licensing easier.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 15:58:44+00:00

I think it's possible, yes. ThreadPool really isn't a library that has a high-degree of complexity - it is just a sample implementation of a ThreadPool pattern. I could have written my own, but decided against it, as I didn't want to do all the manual labor myself. It would have probably taken me a couple of days of writing and testing. The release just requires attribution and it's freely available for including in one's own projects.

Regardless of where we keep it, the benefit of multithreading is pretty good, so I would still like to modify it and use it while keeping support for gcc 4.4. I've just been running a computation with it that completed in a few hours what previously needed a couple of days (actually, more, because I stopped it due to impatience).

I'm installing 4.4. on my system to see whether I can compile as I go.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 16:01:20+00:00

Ok, then I would say: move to a subfolder in extern, then in the extern/README file describe the changes that were made.

Sounds like a plan?

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 16:04:17+00:00

Or I guess if it's short and modified enough there's no need to put it in extern, as long as we leave some kind of licensing info. Also, it does state that you need to label it as modified in the source.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 16:04:51+00:00

I had a look and I think it'll be fine as it is. We'll add his name to the manual and the webpage.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 16:20:29+00:00

OK, great!

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 17:01:30+00:00

OK so I added a modified version of ThreadPool that does not depend on std::future and added preprocessor directive to enable it if GCC is < 4.6. Could you check to see that it works on your end? My installation of GCC 4.4 still hasn't finished, so I couldn't test it, but I did compile and run the _nofuture version on my compiler and it seemed to work fine.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 17:02:56+00:00

If this works, I could add preprocessor directives directly to ThreadPool, but I wanted to keep it separate for now, so we have a working and a non-working version.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-14 17:54:59+00:00

Well, I think you had the logic reversed, but it still doesn't work with _nofuture:

#!bash

In file included from colorbraiding_helper.hpp:49,
                 from colorbraiding_helper.cpp:31:
ThreadPool_nofuture.h: In constructor ‘ThreadPool::ThreadPool(size_t)’:
ThreadPool_nofuture.h:85: error: expected primary-expression before ‘[’ token
ThreadPool_nofuture.h: In member function ‘void ThreadPool::enqueue(F&&, Args&& ...)’:
ThreadPool_nofuture.h:131: error: expected primary-expression before ‘[’ token
In file included from /usr/include/c++/4.4/functional:70,
                 from /usr/include/c++/4.4/mutex:44,
                 from colorbraiding_helper.hpp:45,
                 from colorbraiding_helper.cpp:31:
/usr/include/c++/4.4/tr1_impl/functional: At global scope:
/usr/include/c++/4.4/tr1_impl/functional: In instantiation of ‘std::_Result_of_impl<false, std::_Bind<std::_Mem_fn<void (PairCrossings::*)(size_t)>(PairCrossings*, std::_Placeholder<1>)>&(size_t&)>’:
/usr/include/c++/4.4/tr1_impl/functional:149:   instantiated from ‘std::result_of<std::_Bind<std::_Mem_fn<void (PairCrossings::*)(size_t)>(PairCrossings*, std::_Placeholder<1>)>&(size_t&)>’
ThreadPool_nofuture.h:115:   instantiated from ‘void ThreadPool::enqueue(F&&, Args&& ...) [with F = std::_Bind<std::_Mem_fn<void (PairCrossings::*)(size_t)>(PairCrossings*, std::_Placeholder<1>)>&, Args = size_t&]’
colorbraiding_helper.hpp:530:   instantiated from here
/usr/include/c++/4.4/tr1_impl/functional:195: error: ‘std::_Bind<std::_Mem_fn<void (PairCrossings::*)(size_t)>(PairCrossings*, std::_Placeholder<1>)>&’ is not a class, struct, or union type

    mex: compile of ' "colorbraiding_helper.cpp"' failed.
jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 21:02:03+00:00

This is going to be more difficult than I thought. GCC 4.4 does not have more modern functional programming constructs so we can't run ThreadPool with it (I could write my own implementation of the ThreadPool for our purposes but that's something for future consideration).

However, if I try turning on multithreading depending on GNUC and similar macros, I run into problems in that Apple compilers lie. They claim GCC 4.2 as their version as they are backwards-compatible with GCC 4.2. However, I still have to figure out how to realize that we are more advanced than GCC 4.2. Work in progress.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-14 23:54:30+00:00

Hopefully now (commit 1ccab83 ) the 4.4 version will compile the unthreaded version of the code. However, it could be that your 4.4 will run the threaded version as well - apparently GCC 4.4.7 had some bugs on compiling on Mac OS X and the linux version should run multithreaded programs OK (I cannot test this). Try modifying the line colorbraiding_helper.hpp:48 to read 40400 and see if you can get the multithreaded code to run.

The code now throws a warning if MATLAB global variable was set to request threads but only single-threaded MEX is compiled.

Multiplatform support is hard.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-15 01:20:41+00:00

OK, I just edited the line above to try to compile multithreaded code with gcc 4.4 and got another failure to compile so I'm ready to give up on multithreaded support on gcc 4.4. Let me know if the code does not compile as is into the single-threaded mode.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-15 13:34:08+00:00

Yep, compiles fine!

Still doesn't pass the testsuite, though. First I think you should change the Matlab color_braiding to throw the same errors as your code (i.e., with relative rather than absolute EPS). The functions should behave the same.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-15 17:41:33+00:00

Fixed a small bug that was causing the problem in 6f20d4d. The other problem (a1f1261) was that your checks are more strict, so it would pass the Matlab version but not the C++ version.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-20 23:09:30+00:00

Could you check that the tests pass right now? The commit ade6492 implements the equality test using the same function areEqual for both MATLAB and C++ colorbraiding, however, I cannot make the test fail even if I use the pre-C++ equality testing. Perhaps I'm running the wrong test function. In any case, please let me know if this is now OK.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 14:36:02+00:00

Some comments:

On gcc-4.4 (no multithreading), why does the code complain when I call the constructor with "...,'threads',1)"? It seems the default # is 0, not 1. Doesn't 1 thread mean "not parallel"?

You are keeping both the global var BRAIDLAB_threads and the option 'threads'. Is that a good idea?

For warnings, I don't like the idea of always telling the user how to turn it off (causes a maintenance headache because of duplicate labels, and is inconsistent with other warnings). Instead, I can add a note in the manual about using lastwarn to find the ID. (Changed in c329e1e.)

I also changed color_braiding to colorbraiding, to make things uniform. (31e7681)

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 14:46:27+00:00

Thread = 1 used to mean "parallel version of the code, single thread", which used the multithreaded part of the code, but allowed it to use a single worker. Thread = 0 used to mean "single threaded version of the code", i.e., side stepping all parallel facilities. I tried to change it so that Thread = 1 and 0 mean the same thing (single-threaded code), but perhaps a warning logic slipped my attention. I'll take a look later.

As for C++ global, I left it in there to minimize the change in C++ code while I conform to the new Matlab calling requirements. I was planning to remove it. Matlab-global is not used any more. Work in progress.

Sure, feel free to update those warnings. Again, I was trying to make things easier for testing, to be easily aware of what's running and what isn't, while still making it easy for me to turn things off. (If multiple warnings are issued, e.g., running C++ code, running single threaded code while multithreaded was issued, then it makes it difficult to use lastwarn to detect the ID for the warning issued before another warning). I'm expecting that we'll remove these warnings once we're through testing the code.

BTW Margaux is in touch with me, she's going to try to see how the new C++ code plays with +lcs functionality.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 14:52:34+00:00

This all sounds great, thanks. Hopefully everything is in harmony with +lcs.

Maybe we should keep the global so that something like +lcs parallelizes automatically.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 14:55:13+00:00

Yeah, I am split about that. Perhaps we should make the (Matlab) braid constructor look for the global if no explicit "threads" options are passed and then decide what number of threads to pass to C++ code? Inside C++ code it can remain as it currently is, i.e., I would prefer to minimize this covert interaction with Matlab workspaces via searching for a variable with a certain name (even though that's how I detect BRAIDLAB_debuglvl from inside C++).

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 14:58:13+00:00

That sounds reasonable. In this case there really is no point in hunting for globals in C++, unlike BRAIDLAB_debuglvl.

Should the global be silently overridden? After all the user meant to do it.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 15:01:03+00:00

Yeah, it's difficult to say. Maybe the global should be prioritized, after all? And a warning issued if there is a conflict between explicit option and the global instruction? I'm definitely in favor of the warning, but I'm not sure at this point whether global or option should be picked in case of conflict.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:02:34+00:00

I definitely think the option should take priority over the global, since it represents more immediate user wishes.

A warning can't hurt, then.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:05:46+00:00

BTW, that's a good point about multiple warnings. I guess a user can turn them off one by one if needed. I just realized you can use the syntax:

#!matlab

warning('off','last')
jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 15:05:58+00:00

OK, so the "programming guideline" for end-users of braidlab would be something like "do not pass threads option to braidlab.braid unless you want to force N-threaded execution" (e.g., if it is used within matlab's parfor when the parallelization is supposed to be used on a higher-level). "Instead use global variable to set program-wide suggestion for number of threads used."

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:07:36+00:00

I think it's better the other way: use explicit 'threads' option unless you need globals for some reason.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 15:14:56+00:00

Wouldn't that be an automatic suggestion for the code user to prevent globals to affect their code? It's fine if we want to go that route, but it seems like we're allowing the global options, while we argue against their use. We might as well leave the global option undocumented at a high-level documentation then, describe only the explicit option, and reserve the global for our internal use. (I warned about "undocumentedbraidlab.com" surfacing as a page. :) )

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:28:57+00:00

That was the idea: to leave it undocumented.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 15:30:13+00:00

Alright, sounds good! But then we should make sure +lcs does not use explicit options, so we can parallelize everything in one swoop if needed, right?

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:31:52+00:00

Wouldn't it be somewhat harmless if it did?

Now I'm starting to fell like the global is better!

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-21 15:33:14+00:00

It wouldn't be harmless because it would be forcing a certain number of threads onto a user. So if I want to run it first on magma and then on my office machine, I'd have to dig into its internals and change the number of threads each time.

(We don't need to decide right away, that's why we're in "testing" phase ;) )

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-21 15:41:14+00:00

I suppose the other advantage of a global is that it can then be used by other functions later...

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 14:59:31+00:00

I'm leaning towards global-only. Shall I strip out the 'threads' option?

What do you think about setting a default value if the global is not set:

#!matlab

>> ncores = feature('numcores')

ncores =

     2
jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 15:26:29+00:00

Oh this is interesting, I didn't know this function existed. Yes, I think this is a good idea for the default.

I would leave the option in (perhaps as under-documented), and have it override the global. Sometimes you would like to enforce several single-threaded braid analyses. E.g., if you have a lot of datasets, but each of the dataset is manageable in single-thread mode, then you might want to force each dataset onto its own core using Matlab's parfor, within which braidlab.braids run single-threadedly to avoid cross-processor communication. I guess you could force the global to 1 beforehand, but that seems unelegant to me.

I guess I just like to have a manual-override of a global as an option. But I'd definitely suggest that our default "best practices" guideline to be "Write braid constructors without any threading options, unless you know exactly what you're doing".

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 15:29:43+00:00

BTW since feature is undocumented, we should set its use within try-catch block, just in case it acts differently in one of the older/newer versions.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:11:13+00:00

Good thinking. Here's a link.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:13:26+00:00

My only issue with the 'threads' option is that it was very kludgy to implement for the braid class. I would need to implement it in databraid as well, and I'd rather not do that.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 16:17:18+00:00

OK, if we should kick one out, let's kick out the argument, and leave the global. :)

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:28:03+00:00

Ok, I'll get rid of the argument. With the possibility of using feature I'm less concerned in any case, since for most users just using the max number of cores is the right thing.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 16:30:33+00:00

Agreed!

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:38:34+00:00

Got rid of it in da9860a. Now you can get rid of it in colorbraiding_helper.?pp.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 16:40:05+00:00

I thought we agreed that _helper is still going to take it as an argument regardless, to avoid scanning the workspace from C++ file. Or did I misunderstand?

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:40:58+00:00

Sorry, I forgot about that. That's fine.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 16:41:37+00:00

hpp/cpp still have the unwanted globals floating around. I'll get to that today.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-05-22 16:42:52+00:00

Right, so now: no global in the C++, but the default threads will be set in colorbraiding.m using feature.

jeanluct commented 9 years ago

From Marko Budisic on 2014-05-22 16:44:13+00:00

That's right.