thegrandpoobah / voronoi

Weighted Voronoi Stippler
http://www.saliences.com/projects/npr/stippling/index.html
MIT License
58 stars 25 forks source link

Gentle modifications to support building on linux (Ubuntu 11.10 at least) #13

Closed applio closed 12 years ago

applio commented 12 years ago

...including additional explicit includes that hopefully won't impact WIN32 builds and a gnu Makefile.

You did all the hard work already, thegrandpoobah -- thank you!

Note: I did not test on WIN32 to verify that my modifications are innocuous there.

thegrandpoobah commented 12 years ago

Before I merge the pull request, is there any way that OpenMP can be enabled in GCC? I only have a working knowledge of GCC, but according to this wiki article, OpenMP is available in GCC as well: http://gcc.gnu.org/wiki/openmp

That should give a significant speed boost (it does on Windows anyways).

This is awesome! Thanks so much. This will fix issue #5.

applio commented 12 years ago

Yes, though I needed to make a small change in the primary openmp pragma statement for it to have any beneficial effect. The use of the 'for' directive depends upon there already being a team of threads, but none had been explicitly created (i.e. something like '#pragma omp parallel' in front of it somewhere would do the trick). Perhaps this is a subtle difference in Microsoft's implementation of OpenMP? Sites like http://bisqwit.iki.fi/story/howto/openmp/#LoopDirectiveFor claim "Note: #pragma omp for only delegates portions of the loop for different threads in the current team. A team is the group of threads executing the program. At program start, the team consists only of a single member: the master thread that runs the program."

So when I first enabled openmp for gcc, the resulting code still executed as a single thread and had the same execution time / wall time as before. Modifying the 'for' directive to read as '#pragma omp parallel for private(distance)' resulted in code that does spawn multiple threads and finishes its execution in less time.

On my Core i7-920 (4 cores) system, a 466x610 png input file that processed in ~35.5 seconds without the benefit of openmp is processed in ~8.9 seconds with openmp (and the above modified pragma statement). Very cool!

I have pushed my changes and should trigger an updated pull request momentarily.

On Wed, Feb 22, 2012 at 8:06 PM, Sahab Yazdani reply@reply.github.com wrote:

Before I merge the pull request, is there any way that OpenMP can be enabled in GCC? I only have a working knowledge of GCC, but according to this wiki article, OpenMP is available in GCC as well: http://gcc.gnu.org/wiki/openmp

That should give a significant speed boost (it does on Windows anyways).

This is awesome! Thanks so much. This will fix issue #5.


Reply to this email directly or view it on GitHub: https://github.com/thegrandpoobah/voronoi/pull/13#issuecomment-4129369

applio commented 12 years ago

Nevermind -- I guess I don't need to update the pull request. Ah, the magic of github!

thegrandpoobah commented 12 years ago

Awesome! Not sure about the OpenMP pragma discrepency, although visual studio only supports OpenMP 2 (GCC seems to support v3+) so that could be it. It definitely works without the "parallel" in VS2010, so when I get home tonight, I'll take a look to see if the change you made has any negative perf impact. Worst case scenario, I'll just #ifdef WIN32 it so that it works under both environments.

applio commented 12 years ago

Compared to my previously quoted runtimes for that example input, the wall time (with openmp and -O2) just dropped to ~1.7 seconds.

thegrandpoobah commented 12 years ago

Thanks so much for your help. Can't wait to make sure that WIN32 didn't regress and then merge this pull request.

BTW, its not documented anywhere, but if you invoke the program with the logging option (--log), you'll get performance metrics from the program itself (rather than having to rely on wall time).

applio commented 12 years ago

I found that option already and you do have it documented in that when you launch with --help it spills info on a bunch of available flags. I thought that a nice touch.

I used the loose term "wall time" to avoid tedious differences between things like "user time" or "system time" in relation to the imprecise "execution time". I've been using that '-l' flag on almost all my runs but it shows iteration times and does not display overall time to completion. For that, I've been using the convenient unix shell command 'time' which does report such. Example: time ./voronoi_stippler -s 6000 -n -z 1.45 -l input.png output.svg

thegrandpoobah commented 12 years ago

I've logged issue #14 which will add "total time" to the perf metric output. Should prove useful in the future.