Closed applio closed 12 years ago
Before I merge the pull request, is there any way that OpenMP can be enabled in GCC? I only have a working knowledge of GCC, but according to this wiki article, OpenMP is available in GCC as well: http://gcc.gnu.org/wiki/openmp
That should give a significant speed boost (it does on Windows anyways).
This is awesome! Thanks so much. This will fix issue #5.
Yes, though I needed to make a small change in the primary openmp pragma statement for it to have any beneficial effect. The use of the 'for' directive depends upon there already being a team of threads, but none had been explicitly created (i.e. something like '#pragma omp parallel' in front of it somewhere would do the trick). Perhaps this is a subtle difference in Microsoft's implementation of OpenMP? Sites like http://bisqwit.iki.fi/story/howto/openmp/#LoopDirectiveFor claim "Note: #pragma omp for only delegates portions of the loop for different threads in the current team. A team is the group of threads executing the program. At program start, the team consists only of a single member: the master thread that runs the program."
So when I first enabled openmp for gcc, the resulting code still executed as a single thread and had the same execution time / wall time as before. Modifying the 'for' directive to read as '#pragma omp parallel for private(distance)' resulted in code that does spawn multiple threads and finishes its execution in less time.
On my Core i7-920 (4 cores) system, a 466x610 png input file that processed in ~35.5 seconds without the benefit of openmp is processed in ~8.9 seconds with openmp (and the above modified pragma statement). Very cool!
I have pushed my changes and should trigger an updated pull request momentarily.
On Wed, Feb 22, 2012 at 8:06 PM, Sahab Yazdani reply@reply.github.com wrote:
Before I merge the pull request, is there any way that OpenMP can be enabled in GCC? I only have a working knowledge of GCC, but according to this wiki article, OpenMP is available in GCC as well: http://gcc.gnu.org/wiki/openmp
That should give a significant speed boost (it does on Windows anyways).
This is awesome! Thanks so much. This will fix issue #5.
Reply to this email directly or view it on GitHub: https://github.com/thegrandpoobah/voronoi/pull/13#issuecomment-4129369
Nevermind -- I guess I don't need to update the pull request. Ah, the magic of github!
Awesome! Not sure about the OpenMP pragma discrepency, although visual studio only supports OpenMP 2 (GCC seems to support v3+) so that could be it. It definitely works without the "parallel" in VS2010, so when I get home tonight, I'll take a look to see if the change you made has any negative perf impact. Worst case scenario, I'll just #ifdef WIN32 it so that it works under both environments.
Compared to my previously quoted runtimes for that example input, the wall time (with openmp and -O2) just dropped to ~1.7 seconds.
Thanks so much for your help. Can't wait to make sure that WIN32 didn't regress and then merge this pull request.
BTW, its not documented anywhere, but if you invoke the program with the logging option (--log), you'll get performance metrics from the program itself (rather than having to rely on wall time).
I found that option already and you do have it documented in that when you launch with --help it spills info on a bunch of available flags. I thought that a nice touch.
I used the loose term "wall time" to avoid tedious differences between things like "user time" or "system time" in relation to the imprecise "execution time". I've been using that '-l' flag on almost all my runs but it shows iteration times and does not display overall time to completion. For that, I've been using the convenient unix shell command 'time' which does report such. Example: time ./voronoi_stippler -s 6000 -n -z 1.45 -l input.png output.svg
I've logged issue #14 which will add "total time" to the perf metric output. Should prove useful in the future.
...including additional explicit includes that hopefully won't impact WIN32 builds and a gnu Makefile.
You did all the hard work already, thegrandpoobah -- thank you!
Note: I did not test on WIN32 to verify that my modifications are innocuous there.