luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Build failure on HEAD of develop branch #128

DaGaMs closed this issue 3 years ago

DaGaMs commented 4 years ago

I'm trying to build commit 3c393a on our cluster, using gcc 8.3.0, boost 1.73.0, gmp 6.2.0, htslib 1.10.2, cmake 3.17.3 and python 3.8.1. I get the following error:

[  9%] Building CXX object src/CMakeFiles/octopus.dir/core/models/haplotype_likelihood_model.cpp.o
In file included from /opt/gridware/depots/1a8f5697/el7/pkg/compilers/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/x86intrin.h:43,
                 from /users/bschuster/Code/octopus/lib/fmath.hpp:47,
                 from /users/bschuster/Code/octopus/src/utils/maths.hpp:30,
                 from /users/bschuster/Code/octopus/src/core/models/pairhmm/pair_hmm.hpp:26,
                 from /users/bschuster/Code/octopus/src/core/models/haplotype_likelihood_model.hpp:26,
                 from /users/bschuster/Code/octopus/src/core/models/haplotype_likelihood_model.cpp:4:
/users/bschuster/Code/octopus/src/core/models/pairhmm/sse2_pair_hmm_impl.hpp: In function ‘octopus::hmm::simd::SSE2PairHMMInstructionSet<8u, int>::do_extract<0>(long long __vector(2) const&, int)auto’:
/opt/gridware/depots/1a8f5697/el7/pkg/compilers/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/smmintrin.h:447:1: error: inlining failed in call to always_inline ‘_mm_extract_epi32(long long __vector(2), int)’: target specific option mismatch
 _mm_extract_epi32 (__m128i __X, const int __N)
 ^~~~~~~~~~~~~~~~~
In file included from /users/bschuster/Code/octopus/src/core/models/pairhmm/simd_pair_hmm_factory.hpp:10,
                 from /users/bschuster/Code/octopus/src/core/models/pairhmm/pair_hmm.hpp:27,
                 from /users/bschuster/Code/octopus/src/core/models/haplotype_likelihood_model.hpp:26,
                 from /users/bschuster/Code/octopus/src/core/models/haplotype_likelihood_model.cpp:4:
/users/bschuster/Code/octopus/src/core/models/pairhmm/sse2_pair_hmm_impl.hpp:123:33: note: called from here
         return _mm_extract_epi32(a, index);
                ~~~~~~~~~~~~~~~~~^~~~~~~~~~
/users/bschuster/Code/octopus/src/core/models/haplotype_likelihood_model.cpp: At top level:
cc1plus: error: unrecognized command line option ‘-Wno-deprecated-copy’ [-Werror]
cc1plus: all warnings being treated as errors
make[2]: *** [src/CMakeFiles/octopus.dir/core/models/haplotype_likelihood_model.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/octopus.dir/all] Error 2
make: *** [all] Error 2

Any idea what this might be about? I'm unsure how to even try to debug this. For the record, I had no issues compiling 0.6.3-beta in the same way.

dancooke commented 4 years ago

Hi Ben, I can't find a commit with ID 3c393a but this looks like a compiler/library problem to me since always_inline is not used in Octopus code. I'd strongly recommend using the installation command:

$ ./scripts/install.py --dependencies

That way you can be confident you're using recent library and compiler versions, which can bring performance improvements. It's also much easier for me to resolve any installation problems since I can easily replicate.
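
For anyone hitting the same `target specific option mismatch` error: `_mm_extract_epi32` is an SSE4.1 intrinsic, and GCC reports this error when it is called from code compiled without SSE4.1 enabled, for instance because the flags chosen for the build do not include `-msse4.1` or a `-march` value that implies it. A minimal sketch for checking what the compiler enables is below. The helper name `simd_macros` and the fallback to `g++` when `CXX` is unset are assumptions for illustration, not part of the Octopus build.

```shell
# Hypothetical helper: print the SIMD feature macros that the compiler
# predefines for the given flags. Assumes a GCC- or Clang-compatible driver.
simd_macros() {
    "${CXX:-g++}" "$@" -dM -E -x c++ /dev/null |
        grep -E '__(SSE2|SSE4_1|SSE4_2|AVX2|AVX512F|AVX512BW)__' |
        sort
}

# Macros enabled for the build machine's own CPU:
#   simd_macros -march=native
# Macros enabled when SSE4.1 is requested explicitly:
#   simd_macros -msse4.1
```

If `__SSE4_1__` is missing from the first listing but present in the second, the compiler itself is fine and the flags selected on the build host are the more likely culprit.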

DaGaMs commented 4 years ago

Sorry, I took the wrong end of the hash. I meant commit d4a7f4a7fcc56cc29fbe13c4ac926964993c393a

Using the installer isn't an option, unfortunately, because I'm building Octopus as a reusable module and the installer puts hard-coded paths to the dependencies in place...

DaGaMs commented 4 years ago

I have another one for you - I realised that I compiled on a virtual machine, which meant that the SSE4.2 detection etc. didn't work. I'm now compiling on a cluster node, and I run into this:

Scanning dependencies of target octopus
[  9%] Building CXX object src/CMakeFiles/octopus.dir/main.cpp.o
In file included from /users/bschuster/sharedscratch/octopus/src/core/models/pairhmm/simd_pair_hmm_factory.hpp:12,
                 from /users/bschuster/sharedscratch/octopus/src/core/models/pairhmm/pair_hmm.hpp:27,
                 from /users/bschuster/sharedscratch/octopus/src/core/models/haplotype_likelihood_model.hpp:26,
                 from /users/bschuster/sharedscratch/octopus/src/core/models/haplotype_likelihood_array.hpp:25,
                 from /users/bschuster/sharedscratch/octopus/src/core/callers/caller.hpp:25,
                 from /users/bschuster/sharedscratch/octopus/src/core/callers/caller_factory.hpp:11,
                 from /users/bschuster/sharedscratch/octopus/src/config/option_collation.hpp:17,
                 from /users/bschuster/sharedscratch/octopus/src/main.cpp:14:
/users/bschuster/sharedscratch/octopus/src/core/models/pairhmm/avx512_pair_hmm_impl.hpp: In static member function ‘static octopus::hmm::simd::AVX512PairHMMInstructionSet<BandSize, ScoreTp>::VectorType octopus::hmm::simd::AVX512PairHMMInstructionSet<BandSize, ScoreTp>::do_vectorise_zero_set_last(octopus::hmm::simd::AVX512PairHMMInstructionSet<BandSize, ScoreTp>::ScoreType, short int)’:
/users/bschuster/sharedscratch/octopus/src/core/models/pairhmm/avx512_pair_hmm_impl.hpp:216:40: error: there are no arguments to ‘_mm512_set_epi16’ that depend on a template parameter, so a declaration of ‘_mm512_set_epi16’ must be available [-fpermissive]
                                        _mm512_set_epi16(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0));
                                        ^~~~~~~~~~~~~~~~
/users/bschuster/sharedscratch/octopus/src/core/models/pairhmm/avx512_pair_hmm_impl.hpp:216:40: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-deprecated-copy’ [-Werror]
cc1plus: all warnings being treated as errors
make[2]: *** [src/CMakeFiles/octopus.dir/main.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/octopus.dir/all] Error 2
make: *** [all] Error 2

Any ideas?
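
A side note on the `unrecognized command line option '-Wno-deprecated-copy'` line at the end of both logs: it is probably a secondary symptom rather than a separate bug. GCC only complains about an unknown `-Wno-` option when it is already emitting another diagnostic, and `-Wdeprecated-copy` was introduced in GCC 9, so GCC 8.3.0 does not know it. A minimal sketch for probing whether a compiler knows a warning flag follows. The function name `supports_warning` is hypothetical, and it probes the positive form of the flag because the negated form is accepted silently.

```shell
# Hypothetical helper: succeed if the compiler accepts the warning flag in $1.
# -Werror also turns Clang's "unknown warning option" warning into a failure.
supports_warning() {
    printf 'int main() { return 0; }\n' |
        "${CXX:-g++}" -Werror "$1" -x c++ -fsyntax-only - >/dev/null 2>&1
}

if supports_warning -Wdeprecated-copy; then
    echo "compiler knows -Wdeprecated-copy"
else
    echo "compiler does not know -Wdeprecated-copy"
fi
```

Fixing the first error in each log should make this message disappear as well.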

dancooke commented 4 years ago

Does your system support AVX-512F and AVX-512BW? What's the output of grep avx /proc/cpuinfo? It looks like your compiler thinks that AVX-512 is available, so the Octopus build tries to include it, but then the relevant headers aren't there. This could be a problem with how your version of GCC was built.
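
The two sides of that question can be checked separately: what the CPU advertises and what the compiler enables. A minimal sketch, assuming a Linux `/proc/cpuinfo` layout with a `flags` line; the function name `avx512_flags` is hypothetical.

```shell
# Hypothetical helper: read cpuinfo-style text on stdin and print the unique
# AVX-512 feature flags from the first "flags" line, one per line.
avx512_flags() {
    grep -m1 '^flags' | tr ' \t' '\n\n' | grep '^avx512' | sort -u
}

# What the CPU advertises (Linux only):
#   avx512_flags < /proc/cpuinfo
# What the compiler enables for this machine (GCC or Clang):
#   g++ -march=native -dM -E -x c++ /dev/null | grep -E '__AVX512(F|BW)__'
```

If the first command lists `avx512f` and `avx512bw` and the second prints both macros, yet `_mm512_set_epi16` is still undeclared, that would point at the intrinsic headers shipped with that compiler build rather than at the CPU.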

Regarding the dynamic linked dependencies, can you not just put the Octopus directory in a shared location and symlink the binary to where it's needed? That's how Homebrew works (package files go to /usr/local/Cellar then the binaries are symlinked to /usr/local/bin).

DaGaMs commented 4 years ago

Yes, AVX-512F and AVX-512BW are definitely available. You might be right that this is an issue with this particular build of gcc-8.3.0. I tried building my own clang and then compiling Octopus with that, but I ran into some C library hell at the linking stage 😩

As for why not to use the ./install.py -D: there are a few technical issues, e.g. I have to compile on a machine where I don't have access to the shared directory that the app needs to be installed to in order to be usable as a module. But also, it feels really messy to dump several GB of dependencies into a "publicly" shared location, many of which are already available on the system (boost, gmp, gcc, curl, htslib etc). So I'm not giving up yet trying to find a way to do this "properly"...

dancooke commented 4 years ago

Perhaps another avenue to explore is static linking... I've just pushed some commits that improve that. If I build using

$ ./install.py --dependencies --static

on rescomp1, I get:

$ ldd /well/gerton/dan/apps/octopus/bin/octopus                                                                                            
        linux-vdso.so.1 =>  (0x00007fff25183000)
        libdl.so.2 => /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/libdl.so.2 (0x00007fb07f891000)
        libm.so.6 => /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/libm.so.6 (0x00007fb07f790000)
        libpthread.so.0 => /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/libpthread.so.0 (0x00007fb07f770000)
        libc.so.6 => /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/libc.so.6 (0x00007fb07f4d4000)
        /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007fb07f676000)
        libmvec.so.1 => /gpfs3/well/gerton/dan/apps/octopus/build/brew/lib/libmvec.so.1 (0x00007fb07f744000)

Eliminating libc seems to be a bit of a pain, so I wonder if it's possible to use the system libraries for these remaining dynamically linked libraries...

> But also, it feels really messy to dump several GB of dependencies into a "publicly" shared location, many of which are already available on the system (boost, gmp, gcc, curl, htslib etc).

Guess you're not a fan of Docker then 😉

dancooke commented 4 years ago

Ok, I've just committed (31e163c101f80181547e7add5ca8db4aa1306e58) a change that should enable a fully static binary. Just install using:

$ ./scripts/install.py --dependencies --static

and double check you get

$ ldd bin/octopus 
    not a dynamic executable

I've only tested on a CentOS 7 machine so far and can't say how the static linking will affect performance...

dancooke commented 4 years ago

Whoops, missed something - make that 78ac66b5d9ee21170121e6887c0653e7ec6efb25.

DaGaMs commented 4 years ago

It compiled OK, but when I try to run it, I get a segmentation fault no matter what... 🤨

dancooke commented 4 years ago

Hmm, what OS are you using to build? And are you using the binary on the same machine used to build?

DaGaMs commented 4 years ago

This cluster uses some flavour of CentOS 7. /proc/version says

Linux version 3.10.0-1062.1.2.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Mon Sep 30 14:19:46 UTC 2019

I compiled successfully, then typed ./octopus, and got the segmentation fault. The only difference I can see here is that you seem to have built and linked with clang and ldd, whereas the install.py script seems to use gcc and ld in my case? Anyway, a quick file on the binary yields:

$ file octopus 
octopus: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped

Not sure what "statically linked (uses shared libs)" means - that seems a bit contradictory, but I assumed it was because of the libc issue.

dancooke commented 4 years ago

I'm trying to reproduce this in a Docker container but not getting anywhere:

$ docker run -t -i centos:7 /bin/bash
$ yum -y update
$ yum -y groupinstall 'Development Tools'
$ yum -y install curl file git which perl-devel python3
$ pip3 install distro
$ cd /home
$ git clone https://github.com/luntergroup/octopus.git
$ cd octopus
$ ./scripts/install.py --dependencies --static
$ file bin/octopus
bin/octopus: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, not stripped
$ bin/octopus --version
octopus version 0.7.0 (develop 9d066b94)
Target: x86_64 Linux 4.19.76-linuxkit
SIMD extension: AVX2
Compiler: GNU 10.2.0
Boost: 1_73

Everything works as expected...

If you can find a Docker image that reproduces the issue, that would be very helpful. Otherwise I would suggest a fresh install, as there have been some changes to the installed dependencies that could have been causing the problem:

$ git pull
$ ./scripts/install.py --dependencies --static --clean