myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

x64 is nearly 10x slower than x86 on Windows #311

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile using all defaults as per GettingStarted Wiki for both x86 and x64
2. Run Benchmarking as per GettingStarted Wiki for both x86 and x64
3. Compare results

What is the expected output? What do you see instead?

x86 Note: Google Test filter = *I420ToARGB_Opt
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.I420ToARGB_Opt
[       OK ] libyuvTest.I420ToARGB_Opt (458 ms)      <====== looks good
[----------] 1 test from libyuvTest (460 ms total)

x64 Note: Google Test filter = *I420ToARGB_Opt
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.I420ToARGB_Opt
[       OK ] libyuvTest.I420ToARGB_Opt (4228 ms)     <====== nearly 10x slower
[----------] 1 test from libyuvTest (4230 ms total)

What version of the product are you using? On what operating system?

Trunk as of 2/4/2014.  Win8.1x64.

Please provide any additional information below.

Looking at the generated project in Visual Studio, it appears the appropriate 
defines are not being set in x64.  In row.h, all the "Effects:" defines (source 
code line 55) are not defined.

Original issue reported on code.google.com by mpiet...@revation.com on 4 Feb 2014 at 4:38
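
The observation above comes down to preprocessor guards. A minimal sketch, assuming hypothetical names (`SKETCH_HAS_MSVC_INLINE_ASM`, `SketchTargetArch`, and the other `Sketch*` helpers are illustrative and not taken from row.h), of how a guard that requires a 32-bit Visual C target ends up disabled on x64. Only the predefined macros `_MSC_VER`, `_M_IX86`, `_M_X64`, `__i386__`, `__x86_64__` and `__clang__` are real.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical guard (not from row.h): Visual C accepts __asm blocks only
// when targeting 32-bit x86, so the guard stays 0 for x64 builds.
#if defined(_MSC_VER) && !defined(__clang__) && defined(_M_IX86)
#define SKETCH_HAS_MSVC_INLINE_ASM 1
#else
#define SKETCH_HAS_MSVC_INLINE_ASM 0
#endif

// Target architecture as reported by the compiler's predefined macros.
inline const char* SketchTargetArch() {
#if defined(_M_X64) || defined(__x86_64__)
  return "x64";
#elif defined(_M_IX86) || defined(__i386__)
  return "x86";
#else
  return "other";
#endif
}

// True only when the Visual C inline-assembly path would be compiled in.
inline bool SketchUsesMsvcInlineAsm() {
  return SKETCH_HAS_MSVC_INLINE_ASM != 0;
}

// Sanity check: the inline-asm path must never be selected for x64.
inline void SketchCheckGuards() {
  assert(!(SketchUsesMsvcInlineAsm() &&
           std::strcmp(SketchTargetArch(), "x64") == 0));
}
```

Printing `SketchTargetArch()` and `SketchUsesMsvcInlineAsm()` from each build configuration is one quick way to confirm which code path a given project actually selects.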

GoogleCodeExporter commented 9 years ago
This is a known issue.  Visual Studio does not allow inline assembly in 64 bit 
builds.  This is still true as of VS2012, and I think VS2013.
Short term, alternative compilers seem most feasible.  I'm aware of 2 Visual C 
compatible compilers - clang-cl and icl.  We've tested/fixed clang-cl for 32 
bit, but have not tested the 64 bit version.
For Web Apps there is 64 bit NaCl on Windows, which uses a variation of gcc.

The other 2 alternatives are substantial work:
-convert to intrinsics
-convert to yasm

Original comment by fbarch...@google.com on 5 Feb 2014 at 1:31
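
To illustrate the "convert to intrinsics" option: SSE2 intrinsics from `<emmintrin.h>` compile in both 32 bit and 64 bit builds, unlike inline assembly. This is a minimal sketch, not libyuv code; `SketchAddRowSaturate` and `SKETCH_HAS_SSE2` are hypothetical names, and the row operation (a saturating byte add) is chosen only because it is short.

```cpp
#include <cassert>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper (not a libyuv function):
// dst[i] = min(a[i] + b[i], 255) for i in [0, width).
inline void SketchAddRowSaturate(const uint8_t* a, const uint8_t* b,
                                 uint8_t* dst, int width) {
  assert(a && b && dst && width >= 0);
  int i = 0;
#if SKETCH_HAS_SSE2
  // 16 pixels per iteration; unaligned loads and stores keep it simple.
  for (; i + 16 <= width; i += 16) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                     _mm_adds_epu8(va, vb));
  }
#endif
  // Scalar tail, and the full fallback when SSE2 is unavailable.
  for (; i < width; ++i) {
    int sum = a[i] + b[i];
    dst[i] = static_cast<uint8_t>(sum > 255 ? 255 : sum);
  }
}
```

The same source builds with Visual C, gcc and clang, which is the main appeal of this route. Whether the generated code matches hand-written assembly is a separate question that would need benchmarking.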

GoogleCodeExporter commented 9 years ago
Thanks for the response.  I'm sure you've thought about this but I thought it 
was worth asking... is there a way to cross-compile just the x64 assembly 
module(s) into Windows objects using a Linux distro?  That is, compile/assemble 
the inline asm on Linux, compile all pure C/C++ on Windows, and do final 
linking on Windows...

Original comment by mpiet...@revation.com on 6 Feb 2014 at 1:45

GoogleCodeExporter commented 9 years ago
I'm told the answer is yes, but I've had mixed success with this in the past.
Most of libyuv is C, not C++, and I'll likely rename the source to .c files, 
which may help with this endeavour.

Linux also has a cross compiler, but the one I use is literally gcc, with no 
g++, so that may require the full C port to be complete.

The way I've done it in the past is to build with mingw into a DLL, and link 
that in dynamically.  I haven't tried it for libyuv, but that will work.
The normal Visual C build does support DLLs.

Original comment by fbarch...@chromium.org on 24 Feb 2014 at 7:28

GoogleCodeExporter commented 9 years ago
For what it's worth, I was able to cross-compile the row_posix file in Linux to 
a Windows x64 target and link that in with my x64 Windows object in Visual 
Studio OK.

Original comment by mpiet...@revation.com on 27 Feb 2014 at 2:31

GoogleCodeExporter commented 9 years ago
Good to know.
So you compiled just row_posix.cc with gcc and replaced row_win.obj with 
row_posix.obj and that's it?
Or did you do the entire library?

I would think row_posix.cc would only work if it contains an identical set of 
functions, which it doesn't - row_posix doesn't have all the AVX2 functions yet.
Did you have to change row.h?

If you compile the entire library with gcc, you might run into Linux-specific 
code in cpu_id.cc.  But I think it might work, since it uses CPU instructions, 
not /proc/cpuinfo.

Original comment by fbarch...@chromium.org on 26 Mar 2014 at 9:26

GoogleCodeExporter commented 9 years ago
Well, I admit to not doing the entire library but just the portion I needed - 
specifically, libyuv::RGB24ToI420 using {convert.cc cpu_id.cc row_any.cc 
row_common.cc row_win.cc} and only using the linux-compiled row_posix.obj and 
defining __x86_64__ for x64 builds.  To your point, I had to stub out a few 
methods in the libyuv namespace and a SSSE3 function in the global namespace 
for x64 to get it all compiling/linking okay in both x86 and x64.  I did not 
change any source or header files though.  Pretty, it was not.  Faster, it was.

Original comment by mpiet...@revation.com on 26 Mar 2014 at 9:50

GoogleCodeExporter commented 9 years ago
r1018 ports I420ToARGB to x64 Visual C.

Original comment by fbarch...@chromium.org on 24 Jun 2014 at 9:24

GoogleCodeExporter commented 9 years ago
Another option may be clang-cl, which mimics Visual C, but allows 64 bit.
Current version (r1033) supports clang-cl, but assembly is turned off because 
it's not fully compatible, and the build gets confused by a compiler that 
claims to be both gcc and Visual C, and builds both.

Original comment by fbarch...@chromium.org on 12 Jul 2014 at 2:13
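
The "claims to be both" problem can be sketched with predefined macros. clang-cl defines both `__clang__` and `_MSC_VER`, so a header that tests `_MSC_VER` alone will treat it as Visual C. The sketch below tests `__clang__` first; the function names and the returned labels are hypothetical and are not how libyuv resolves it.

```cpp
#include <cassert>
#include <cstring>

// Order matters: clang in MSVC-compatible mode (e.g. clang-cl) defines
// both __clang__ and _MSC_VER, so __clang__ has to be checked first.
inline const char* SketchCompilerName() {
#if defined(__clang__) && defined(_MSC_VER)
  return "clang-cl";
#elif defined(__clang__)
  return "clang";
#elif defined(_MSC_VER)
  return "msvc";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "unknown";
#endif
}

// Assembly dialect a build might pick under this sketch's assumptions:
// exactly one dialect per compiler, never both.
inline const char* SketchAsmSyntax() {
  const char* c = SketchCompilerName();
  assert(c != nullptr);
  if (std::strcmp(c, "msvc") == 0) return "visual-c";
  if (std::strcmp(c, "unknown") == 0) return "none";
  return "gcc";
}
```

Selecting a single dialect per compiler avoids compiling both the Visual C and the gcc variants of the same function into one build.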

GoogleCodeExporter commented 9 years ago
r1341 clang-cl now working.  64 bit is the same performance as 32 bit, using 
gcc syntax assembly.
Some assembly hasn't been ported from Visual C to gcc syntax, so some follow up 
optimization is needed, but I'll call this the solution to win64 performance, 
at least for now.

Haswell
2216252 ms  32 bit llvm
1595304 ms  32 bit visual c
2121032 ms  64 bit llvm
8464078 ms  64 bit visual c

Sandy Bridge
1149644 ms  32 bit llvm
 826008 ms  32 bit visual c
1033494 ms  64 bit llvm
2197730 ms  64 bit visual c

Original comment by fbarch...@chromium.org on 24 Mar 2015 at 9:11
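
Taking the figures above at face value, the 64 bit Visual C build is roughly 4.0x slower than 64 bit llvm on Haswell and roughly 2.1x slower on Sandy Bridge. A small sketch of that arithmetic follows; the constant and function names are hypothetical, and the numbers are copied from the comment.

```cpp
#include <cassert>

// Timings copied from the benchmark comment (units as reported there).
const double kHaswell64Llvm = 2121032.0;
const double kHaswell64VisualC = 8464078.0;
const double kSandyBridge64Llvm = 1033494.0;
const double kSandyBridge64VisualC = 2197730.0;

// How many times slower `slow` is than `fast`.
inline double SketchSlowdown(double slow, double fast) {
  assert(fast > 0.0);
  return slow / fast;
}
```

For example, `SketchSlowdown(kHaswell64VisualC, kHaswell64Llvm)` gives the Haswell ratio.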

GoogleCodeExporter commented 9 years ago
Do all compilers support the same SIMD intrinsics?  Could all code be ported 
to intrinsics long term, so the code is shared?

Original comment by bruno...@gmail.com on 24 Mar 2015 at 9:49