minio / c2goasm

C to Go Assembly
Apache License 2.0

Address alignment in go #4

sakjain92 closed this issue 6 years ago

sakjain92 commented 6 years ago

I am using AVX instructions (basically MultiplyAndAdd.cpp). I modified the .cpp code a bit; below are snippets of the resulting MultiplyAndAdd.s file for the new code and for the old (unmodified) code:

New:

    vmovaps ymm0, YMMWORD PTR [rdi]
    vmovaps ymm1, YMMWORD PTR [rdx]
    vfmadd132ps     ymm0, ymm1, YMMWORD PTR [rsi]
    vmovups YMMWORD PTR [rcx], ymm0
    vzeroupper
    ret

Old:

    push    rbp
    mov     rbp, rsp
    vmovups ymm0, ymmword ptr [rdi]
    vmovups ymm1, ymmword ptr [rsi]
    vfmadd213ps     ymm1, ymm0, ymmword ptr [rdx]
    vmovups ymmword ptr [rcx], ymm1
    pop     rbp
    vzeroupper
    ret

As you can see, the old code uses "vmovups", which allows unaligned addresses, while the new code uses "vmovaps", which requires an aligned address (256-bit, i.e. 32-byte, alignment, since this is AVX). Go seems to allow only up to 128-bit alignment (when using complex128), so 256-bit alignment can never be guaranteed. I am assuming "vmovups" will be slower than "vmovaps".

So the question is: is there a way to get 256-bit alignment in Go?

There also seems to be an issue. When I use c2goasm/test/cpp/assembler.sh, the .s file I get is not compatible with c2goasm. The reason is that the source is compiled as C++, so the function name is mangled and c2goasm cannot find the corresponding function declaration in the .go file. I have to use extern "C" {} in the .cpp file to ensure the function names are not mangled.

So another question: are the .s files in the c2goasm/test/cpp/ folder handcrafted rather than generated by the assembler?

fwessels commented 6 years ago

Unfortunately, explicit alignment is a bit of a problem in Golang. One possible (but ugly) solution is to allocate more memory than you need and "round up" the pointer to the next multiple of the address alignment that you want.
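A minimal sketch of that over-allocate-and-round-up idea, for illustration only. The helper names alignedFloat32s and isAligned are hypothetical and not part of c2goasm. It assumes the alignment is a power of two and a multiple of 4, that n is at least 1, and that the Go runtime does not move heap allocations (true of the current gc toolchain, but not a language guarantee).

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignedFloat32s (hypothetical helper) returns a slice of n float32 values
// whose first element sits on an align-byte boundary.
// Assumption: align is a power of two and a multiple of 4 (e.g. 32 for AVX).
func alignedFloat32s(n, align int) []float32 {
	// Over-allocate by enough elements to cover the worst-case shift.
	extra := align / 4
	buf := make([]float32, n+extra)

	addr := uintptr(unsafe.Pointer(&buf[0]))
	// Bytes to skip to reach the next boundary (0 if already aligned).
	pad := (uintptr(align) - addr%uintptr(align)) % uintptr(align)
	off := int(pad / 4)

	// Cap the slice so an append cannot grow into the unused tail.
	return buf[off : off+n : off+n]
}

// isAligned reports whether the first element of s lies on an align-byte
// boundary. s must be non-empty.
func isAligned(s []float32, align int) bool {
	return uintptr(unsafe.Pointer(&s[0]))%uintptr(align) == 0
}

func main() {
	a := alignedFloat32s(8, 32) // 8 floats = one 256-bit ymm register
	fmt.Println(len(a), isAligned(a, 32))
}
```

The returned slice keeps the whole backing array alive, so the wasted space is at most align bytes per allocation. A pointer to its first element could then be passed to a routine that uses "vmovaps".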

What C++ compiler are you using? We have been running it under OSX:

$ c++ -v
Apple LLVM version 9.0.0 (clang-900.0.37)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin