Closed: ekg closed this issue 6 years ago
The change occurred since 64a436dc.
Following these instructions: https://superuser.com/questions/885136/how-to-detect-binary-compatibility-with-the-sse4-instruction-set, I was able to dump out a list of the instruction families used in the static binary.
-> % python binary_families.py vg-620fda3d.1
These instruction families were used:
186_Base, 286_Base, 386_Base, 8086_Base, AMD_SSE5, ARM_THUMB, Base, KATMAI_Base, KATMAI_MMX, KATMAI_SSE, NEHALEM_Base, P6_Base, PENT_Base, PENT_MMX, PRESCOTT_SSE3, SANDYBRIDGE_AVX, SSE2, SSE41, SSE42, X64_Base, X64_MMX, X64_SSE, X64_SSE2, X64_SSE41
These instructions could not be categorized:
(bad), addr32, andn, cltd, cltq, cmova, cmovae, cmovb, cmovbe, cmove, cmovg, cmovge, cmovl, cmovle, cmovne, cmovns, cmovs, cqto, cvtsi2sdl, cvtsi2sdq, cvtsi2ssl, cvtsi2ssq, cwtl, data32, decb, decl, divl, divq, es, faddl, fadds, fcompl, fdivl, fdivrl, fdivrs, fildll, fistpll, fldl, flds, fldt, fmull, fmuls, fs, fstl, fstpl, fstpt, fsubl, fsubrl, fsubrs, fsubs, idivl, idivq, ja, jae, jb, jbe, je, jg, jge, jl, jle, jne, jno, jnp, jns, jo, jp, js, leaveq, ljmpq, lock, movabs, movsbl, movsbq, movsbw, movslq, movswl, movswq, movzbl, movzwl, mulb, mulq, negl, negq, nopl, nopw, notl, notq, outsl, pdep, rdrand, rep, repnz, repz, rex, rex.B, rex.R, rex.RX, rex.RXB, rex.W, rex.WB, rex.WR, rex.WRX, rex.WRXB, seta, setae, setb, setbe, sete, setg, setge, setl, setle, setne, setnp, setp, shlx, shrx, tzcnt, vcvtsi2sdq, vextracti128, vinserti128, vpbroadcastb, vpbroadcastd, vpbroadcastq, vperm2i128, vpmaskmovq
Compare this to the last working version from my system:
-> % python binary_families.py vg-64a436dc.1
These instruction families were used:
186_Base, 286_Base, 386_Base, 8086_Base, AMD_SSE5, ARM_THUMB, Base, KATMAI_Base, KATMAI_MMX, KATMAI_SSE, NEHALEM_Base, P6_Base, PENT_Base, PENT_MMX, PRESCOTT_SSE3, SANDYBRIDGE_AVX, SSE2, SSE41, SSE42, X64_Base, X64_MMX, X64_SSE, X64_SSE2, X64_SSE41
These instructions could not be categorized:
(bad), addr32, cltd, cltq, cmova, cmovae, cmovb, cmovbe, cmove, cmovg, cmovge, cmovl, cmovle, cmovne, cmovns, cmovs, cqto, cvtsi2sdl, cvtsi2sdq, cvtsi2ssl, cvtsi2ssq, cwtl, data32, decb, decl, divl, divq, es, faddl, fadds, fcompl, fdivl, fdivrl, fdivrs, fildll, fistpll, fldl, flds, fldt, fmull, fmuls, fs, fstl, fstpl, fstpt, fsubl, fsubrl, fsubrs, fsubs, idivl, idivq, ja, jae, jb, jbe, je, jg, jge, jl, jle, jne, jno, jnp, jns, jo, jp, js, leaveq, ljmpq, lock, movabs, movsbl, movsbq, movsbw, movslq, movswl, movswq, movzbl, movzwl, mulb, mulq, negl, negq, nopl, nopw, notl, notq, outsl, rdrand, rep, repnz, repz, rex, rex.B, rex.R, rex.RXB, rex.W, rex.WB, rex.WR, rex.WRX, rex.WRXB, seta, setae, setb, setbe, sete, setg, setge, setl, setle, setne, setnp, setp, tzcnt, vpbroadcastb
So it isn't necessarily the 4.2 set, but it surprises me that that's there because this version worked on systems that did not use SSE4.2. They must not have hit that set of instructions in the code path I was running.
These seem to be new though:
vcvtsi2sdq, vextracti128, vinserti128, vpbroadcastd, vpbroadcastq, vperm2i128, vpmaskmovq
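The comparison can also be done mechanically rather than by eye. Below is a minimal sketch, assuming the two "could not be categorized" lists are pasted in as comma-separated strings. The names `parse_mnemonics` and `new_mnemonics` are made up for illustration and are not part of binary_families.py, and the sample strings are shortened excerpts of the two lists, not the full output. In the lists as quoted above, andn, pdep, shlx and shrx also appear only for 620fda3d.

```python
def parse_mnemonics(text):
    """Split a comma-separated mnemonic list into a set of names."""
    return {item.strip() for item in text.split(",") if item.strip()}


def new_mnemonics(old_text, new_text):
    """Return mnemonics present in new_text but absent from old_text, sorted."""
    return sorted(parse_mnemonics(new_text) - parse_mnemonics(old_text))


# Shortened excerpts of the two lists quoted in this thread (not the full output).
old_list = "rdrand, rep, tzcnt, vpbroadcastb"
new_list = ("andn, pdep, rdrand, rep, shlx, shrx, tzcnt, "
            "vextracti128, vpbroadcastb, vpmaskmovq")

print(new_mnemonics(old_list, new_list))
```

Using a set difference keeps the result independent of the order in which the disassembler happened to emit the mnemonics.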
I think they are coming from gcsa2. I checked the static libs using objdump -d:
-> % for s in $(ls lib/*a); do echo $s ; objdump -d $s | grep vpmaskmovq; done
lib/lib3edgeconnected.a
lib/libdivsufsort64.a
lib/libdivsufsort.a
lib/libfml.a
lib/libgbwt.a
lib/libgcsa2.a
189: c4 c2 fd 8c 1e vpmaskmovq (%r14),%ymm0,%ymm3
18e: c4 e2 fd 8e 18 vpmaskmovq %ymm3,%ymm0,(%rax)
1fb: c4 42 8d 8c 3e vpmaskmovq (%r14),%ymm14,%ymm15
200: c4 62 8d 8e 38 vpmaskmovq %ymm15,%ymm14,(%rax)
3de: c4 42 9d 8c 2e vpmaskmovq (%r14),%ymm12,%ymm13
3e3: c4 62 9d 8e 2e vpmaskmovq %ymm13,%ymm12,(%rsi)
449: c4 62 f5 8c 00 vpmaskmovq (%rax),%ymm1,%ymm8
44e: c4 62 f5 8e 06 vpmaskmovq %ymm8,%ymm1,(%rsi)
bf0: c4 c2 d5 8c 36 vpmaskmovq (%r14),%ymm5,%ymm6
bf5: c4 e2 d5 8e 30 vpmaskmovq %ymm6,%ymm5,(%rax)
bff: c4 62 cd 8c 10 vpmaskmovq (%rax),%ymm6,%ymm10
c04: c4 62 cd 8e 16 vpmaskmovq %ymm10,%ymm6,(%rsi)
...
Of course they're also ending up in libvg.a.
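The same per-library check could be scripted. This is only a sketch: it assumes `objdump` from binutils is on the PATH, and the function names and the script name in the usage comment are hypothetical. Matching on whole whitespace-separated tokens avoids counting mnemonics that merely contain the search string.

```python
import subprocess
import sys


def lines_with_mnemonic(disassembly, mnemonic):
    """Return disassembly lines whose whitespace-separated tokens include mnemonic."""
    return [line for line in disassembly.splitlines() if mnemonic in line.split()]


def disassemble(path):
    """Run `objdump -d` on path and return its text output (requires binutils)."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_mnemonic.py vpmaskmovq lib/*.a
    if len(sys.argv) > 2:
        mnemonic = sys.argv[1]
        for path in sys.argv[2:]:
            hits = lines_with_mnemonic(disassemble(path), mnemonic)
            print(f"{path}: {len(hits)} line(s) containing {mnemonic}")
```

Printing a count for every archive, including zero, makes it easier to see which libraries are clean than the bare file names in the loop output above.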
vinserti128 etc. seem to be AVX, not SSE?
I think that this is being stamped on by the settings in the GCSA2 Makefile. That, plus which CPU and compiler you build with, means you might need to set -msse4.2 and -mno-avx.
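On the AVX-versus-SSE question, a small lookup table may help. It is a hand-written sketch rather than the output of any tool: the mapping reflects my understanding of where each instruction was introduced (the `q` suffix on some names is AT&T operand-size syntax), and the function name is hypothetical. It covers only the mnemonics discussed in this thread.

```python
# Hand-written mapping from mnemonic to the x86 extension that introduced it.
# Covers only the mnemonics discussed in this thread; anything else is "unknown".
ISA_EXTENSION = {
    "vcvtsi2sdq": "AVX",
    "vextracti128": "AVX2",
    "vinserti128": "AVX2",
    "vpbroadcastb": "AVX2",
    "vpbroadcastd": "AVX2",
    "vpbroadcastq": "AVX2",
    "vperm2i128": "AVX2",
    "vpmaskmovq": "AVX2",
    "andn": "BMI1",
    "tzcnt": "BMI1",
    "pdep": "BMI2",
    "shlx": "BMI2",
    "shrx": "BMI2",
}


def extensions_required(mnemonics):
    """Group mnemonics by the extension they need, preserving input order."""
    groups = {}
    for name in mnemonics:
        groups.setdefault(ISA_EXTENSION.get(name, "unknown"), []).append(name)
    return groups


print(extensions_required(["vinserti128", "vpmaskmovq", "pdep", "shlx"]))
```

If the mapping is right, most of the new instructions are AVX2 or BMI rather than anything in the SSE4.2 set, so restricting SSE alone would not be enough.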
@adamnovak why did we add -ldl in 3a4e1ba? Not sure this is the problem, I'm just trying to figure out what changes there have been to the Makefile since the last working version.
@JervenBolleman nothing seems to have changed in gcsa2 since the last version I was able to build in a portable way. However, this bit was changed in the vg Makefile in c1f4b85c4:
- +. ./source_me.sh && cd $(GCSA2_DIR) && cat Makefile | grep -v VERBOSE_STATUS_INFO >Makefile.quiet && $(MAKE) -f Makefile.quiet libgcsa2.a $(FILTER) && mv libgcsa2.a $(CWD)/$(LIB_DIR) && cp -r include/gcsa $(CWD)/$(INC_DIR)/
+ +. ./source_me.sh && cd $(GCSA2_DIR) && cat Makefile | grep -v VERBOSE_STATUS_INFO >Makefile.quiet && AS_INTEGRATED_ASSEMBLER=1 $(MAKE) -f Makefile.quiet libgcsa2.a $(FILTER) && mv libgcsa2.a $(CWD)/$(LIB_DIR) && cp -r include/gcsa $(CWD)/$(INC_DIR)/
Now I'm trying to grok what AS_INTEGRATED_ASSEMBLER=1 does.
I guess this is coming from SDSL. They added -march=native to the default compile flags in the summer.
It looks like this is the problem (in sdsl/Make.helper, which gcsa2 is pulling in):
MY_CXX_FLAGS= -std=c++11 -march=native -Wall -Wextra -DNDEBUG $(CODE_COVER)
edit: Ah, I just caught @jltsiren's comment.
@jltsiren I believe I resolved this here: https://github.com/simongog/sdsl-lite/pull/387
But I didn't catch the Make.helper bit. Could that be the problem?
This makes it seem that my portable builds (on a remote VM) were a fluke due to the architecture of the host, and not something that had to do with the system libraries. So I hadn't really solved this problem.
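One possible workaround, in the same spirit as the `grep -v VERBOSE_STATUS_INFO` filtering the vg Makefile already applies to the gcsa2 Makefile, would be to rewrite the flags line before building. The sketch below is an assumption about how that could look, not something either project does: the function name is hypothetical, and the extra flags in the example are just the ones suggested earlier in this thread.

```python
def drop_march_native(line, extra=()):
    """Remove -march=native from a `NAME= flags...` line and append extra flags."""
    name, sep, value = line.partition("=")
    kept = [tok for tok in value.split() if tok != "-march=native"]
    return name + sep + " " + " ".join(kept + list(extra))


# The line quoted above from sdsl/Make.helper.
original = "MY_CXX_FLAGS= -std=c++11 -march=native -Wall -Wextra -DNDEBUG $(CODE_COVER)"

print(drop_march_native(original))
print(drop_march_native(original, extra=["-msse4.2", "-mno-avx"]))
```

Splitting on the first `=` only means `-std=c++11` is left alone even though it contains an equals sign.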
The -ldl is for some functions for inspecting dynamically-linked libraries. I needed it for some stack-tracing code I had to add to debug segfaults on Mac Travis that I couldn't reproduce locally.
I think it's really only needed on OS X the way I have the #ifdef guards set up right now, but it's in the list for both platforms.
As of 620fda3d I am no longer able to build a portable binary from vg.
Is there any recent dependency or change to the build which could have added SSE instructions from the 4.2 set? I'm trying to figure out which instruction it is.