Fix missing SSE detection on x64 targets. Fixes #25

cdwfs commented 7 years ago

One important note on this PR: with this change, matrix_benchmarks takes twice as long to run on my test system with SIMD enabled as it does with SIMD disabled. That seems... unintuitive. And worth investigating further.

googlebot commented 7 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

cdwfs commented 7 years ago

Re the CLA: I'm a Google employee. I just registered my GitHub account, though, so approval may still be processing.

googlebot commented 7 years ago

CLAs look good, thanks!

ghost commented 7 years ago

To be precise: It takes twice as long to run for x64, and we don't know how much slower it was before, since on x64 it wasn't using SIMD at all, right?

This change seems otherwise ok to me, it is not like anything in this PR is causing a slowdown by itself, so I think it should go in. Finding out the slowdown is a separate issue.

cdwfs commented 7 years ago

To be precise: It takes twice as long to run for x64, and we don't know how much slower it was before, since on x64 it wasn't using SIMD at all, right?

Not quite, no. You're correct that x64 was previously not using SIMD at all, whether it was enabled in the CMAKE options or not. What I meant was that now that x64 builds can use SIMD, enabling it actually slows down the matrix_benchmarks by a factor of two vs. the SIMD-disabled x64 configuration. My concern is that if this PR is merged as-is without identifying the cause of the slowdown, existing mathfu users who think they've had SIMD enabled all this time will suddenly see a pretty significant performance drop after integrating these changes.

johnb003 commented 6 years ago

I think the fpu <-> Mem <-> simd conversion is likely the biggest culprit, which can implicitly happen with the current implementation, and be tricky to spot. As for copy constructors and temp objects, the compiler does a pretty good job of dealing with this. https://en.wikipedia.org/wiki/Copy_elision

johnb003 commented 6 years ago

Oh, I just noticed you did indicate it was in the benchmark samples, that you saw the slowdown. What system did you run the benchmarks on?

cdwfs commented 6 years ago

Lenovo ThinkPad P50 laptop.

Before applying this PR:

Running matrix benchmark ([no simd] [no padding])...
Took 107.934558 seconds

After applying this PR:

Running matrix benchmark ([simd] [padding])...
Took 218.015356 seconds

google / mathfu

Fix missing SSE detection on x64 targets. Fixes #25 #26