soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

FreeBSD port committed #460

Closed outpaddling closed 3 years ago

outpaddling commented 3 years ago

FYI:

MMseqs2 has been committed to the FreeBSD ports collection. It might be helpful to users if you could post a message like the following on your website:

Thanks!

MMseqs2 can be installed on FreeBSD via the FreeBSD ports system.

To install via the binary package, simply run:

pkg install MMseqs2

This will very quickly install a prebuilt binary using only highly-portable optimizations, much like apt, yum, etc.

FreeBSD ports can just as easily be built and installed from source, although it will take longer (for the computer, not for you):

cd /usr/ports/biology/mmseqs2
make install

Building from source allows installing to a different prefix, compiling with native optimizations, and in some cases, building with non-default options such as different compilers or dependencies. For example, adding

CFLAGS+=-march=native

to /etc/make.conf will cause ports built from source to use all native optimizations known to the compiler for the local CPU, resulting in faster but less portable binaries.

To report issues with a FreeBSD port, please submit a PR at:

https://www.freebsd.org/support/bugreports.html

For more information, visit https://www.freebsd.org/ports/index.html.

milot-mirdita commented 3 years ago

Thanks a lot for your work!

I looked through this repository and found these things that might need to be slightly tweaked. In https://github.com/outpaddling/freebsd-ports-wip/blob/master/mmseqs2/Makefile:

I am not sure what to think of the arch patch. If you don't set any of the -DHAVE_* parameters, they are not used anyway, and the automatic detection can be disabled by setting -DMMSEQS_ARCH=" " or something like that. I would suggest dropping that patch.

Does FreeBSD not have any baseline requirements (e.g. Debian has SSE2 as its baseline)? I would be happy if at the very least SSE2 were enabled by default on x86_64.

-march also doesn't work very well on some non-x86 architectures; some require -mcpu to work correctly.

Are 32-bit builds disabled? MMseqs2 currently produces incorrect results on 32-bit systems (see #418, we will probably eventually deal with this to support webassembly fully).

Would it be possible to run the small subset of the test pipeline that is part of the release on GitHub (i.e. https://github.com/soedinglab/MMseqs2/releases/download/13-45111/MMseqs2-Regression.zip)? This would ensure that MMseqs2 on FreeBSD produces correct results. I was previously looking for a free CI service that supports *BSD, but couldn't find any.
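The -march versus -mcpu point above can be sketched as a small lookup. This is only an illustration, not part of the port or of MMseqs2: the function name native_opt_flag is hypothetical, the patterns are strings that `uname -m` commonly reports, and the mapping itself is an assumption that still has to be checked against the compiler in use (further down in this thread, clang on aarch64 rejects -march=native).

```sh
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# flag that requests native CPU tuning. The mapping is an assumption made
# for illustration; verify it against the compiler actually in use.
native_opt_flag() {
    case "$1" in
        amd64|x86_64|i?86)           echo "-march=native" ;;
        arm*|aarch64*|powerpc*|ppc*) echo "-mcpu=native" ;;
        *)                           echo "" ;;   # unknown machine: add no flag
    esac
}

# Example: print the flag for the current machine.
native_opt_flag "$(uname -m)"
```

Returning an empty string for unknown machines keeps the build on the compiler's defaults rather than guessing.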

outpaddling commented 3 years ago

Thanks for the quick and detailed feedback! Partial answer:

1) awk, zlib, bzip2, and omp are included in the FreeBSD base, so no package dependency is needed.

2) Generally, FreeBSD ports respect the user's env regarding build options, and of course the binary package has to be pessimistic about hardware. I'll check on the baseline assumptions and what clang -O2 emits, though. I was also thinking of adding a package message suggesting that it be built from source with more aggressive optimizations to get better performance. That's trivial to do with FreeBSD ports. I wanted to sneak the commit in before the quarterly branch coming next week so it's at least available in the next quarterly package set, so I haven't put much effort into perfecting it yet. With your feedback I should be able to make some good improvements by then.

3) Do you actually plan to continue support for 32-bit platforms? It's disabled for many bioinformatics ports already.

I'll look into the rest of your comments ASAP.

BTW, this is the first time in my lengthy career I've ported a C++/cmake project to any platform and got a build with zero warnings on the first try. Somebody on your end is doing some good work. ;-)

milot-mirdita commented 3 years ago

Thanks for the kind words :) It helps that macOS is quite close to BSD and I also tried compiling on BSD a while ago (9e103bc79494e973cdcba47dd0099b9b9a565d66) and it luckily didn't break in the meantime again.

I see that CirrusCI seems to have BSD support, so we might start using that soonish.

Regarding 32-bit: I don't really want to support it, but if I want webassembly support, then 32-bit has to work. However, right now it's subtly broken and should not be used.

outpaddling commented 3 years ago

Without the CMakeLists patch, I get the following on aarch64:

cc: error: the clang compiler does not support '-march=native'

At any rate, when restricting an upstream build system, it's generally better to use a blunt-force patch. A more finessed approach like -DMMSEQS_ARCH can be fragile. Suppose you decide to change the name of the variable between now and the next release. The -D setting is then silently rendered inert. If I overlook the change, which is easy to do, the builds introduce aggressive optimizations into the binary package, causing illegal instruction dumps for users with lower-end hardware than the build cluster. The static patch, on the other hand, will break on a variable name change, so I'll be alerted that it needs attention.
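The fragility argument can be made concrete with a small guard. This is a minimal sketch, not something the port does: the function name require_cmake_var is hypothetical, and it assumes a check could run before the configure step. It fails loudly when the identifier no longer appears in the upstream file, which mimics how a static patch stops applying after a rename.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the named variable still appears in
# the given CMake file; otherwise report the problem and return non-zero.
require_cmake_var() {
    var="$1"
    file="$2"
    if grep -qF -- "$var" "$file"; then
        return 0
    fi
    echo "error: '$var' not found in $file; the override may be inert" >&2
    return 1
}
```

A call such as `require_cmake_var MMSEQS_ARCH CMakeLists.txt` would then turn a silent rename into a visible build failure.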

For wget and curl, are you referring to createtaxdb.sh and databases.sh? Are these supposed to be installed? The cmake build system only installs mmseqs and bash-completion.sh. Rather than add another dependency, I would add FreeBSD's native fetch command as the final fallback option as follows:

--- data/workflow/databases.sh.orig     2021-06-25 01:34:08 UTC
+++ data/workflow/databases.sh
@@ -27,6 +27,8 @@ STRATEGY=""
 if hasCommand aria2c; then STRATEGY="$STRATEGY ARIA"; fi
 if hasCommand curl;   then STRATEGY="$STRATEGY CURL"; fi
 if hasCommand wget;   then STRATEGY="$STRATEGY WGET"; fi
+# Part of FreeBSD base, need not be installed separately
+if hasCommand fetch;  then STRATEGY="$STRATEGY FETCH"; fi
 if [ "$STRATEGY" = "" ]; then
     fail "No download tool found in PATH. Please install aria2c, curl or wget."
 fi
@@ -47,6 +49,9 @@ downloadFile() {
             ;;
         WGET)
             wget -O "$OUTPUT" "$URL" && return 0
+            ;;
+        FETCH)
+            fetch -o "$OUTPUT" "$URL" && return 0
             ;;
         esac
     done

Most bioinformaticians will have curl and/or wget installed anyway (both are included in the biostar-tools metaport), so it won't come into play, but we try to minimize unnecessary requirements.
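The fallback order in the patch above can be tried in isolation with a sketch like the following. It is a simplified stand-in for databases.sh, not a copy of it: aria2c is left out, the curl invocation is an assumption (only the wget and fetch lines appear in the patch), and the function name detectStrategies is hypothetical.

```sh
#!/bin/sh
# Return success if the named command is found in PATH.
hasCommand() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the usable download strategies in order of
# preference, with fetch as the last resort.
detectStrategies() {
    STRATEGY=""
    if hasCommand curl;  then STRATEGY="$STRATEGY CURL"; fi
    if hasCommand wget;  then STRATEGY="$STRATEGY WGET"; fi
    if hasCommand fetch; then STRATEGY="$STRATEGY FETCH"; fi
    echo "${STRATEGY# }"
}

# Try each available tool in turn and stop at the first success.
downloadFile() {
    URL="$1"
    OUTPUT="$2"
    for i in $(detectStrategies); do
        case "$i" in
        CURL)  curl -L -o "$OUTPUT" "$URL" && return 0 ;;   # assumed invocation
        WGET)  wget -O "$OUTPUT" "$URL" && return 0 ;;
        FETCH) fetch -o "$OUTPUT" "$URL" && return 0 ;;
        esac
    done
    return 1
}
```

On a stock FreeBSD system without curl or wget, detectStrategies would print only FETCH, so downloadFile goes straight to the base-system tool.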

I haven't been able to find a specific CPU feature requirement for FreeBSD, but I think SSE2 is safe to assume for amd64. I also added a pkg-message suggesting an optimized build from source. How much performance gain do you typically see from SSE4 or AVX?

Thanks...

milot-mirdita commented 3 years ago

You are right that it might be a bit fragile.

Exactly, these two workflows need to have something to download files. The workflows are automatically compiled into the binary and executed when the respective workflow is called (that's what either the xxd or perl build time dependency is for). Fetch sounds good, I'll try that out when I get a CirrusCI with FreeBSD going.

AVX2 is, IIRC, about 30% faster than SSE4.1, so it's not super important. The only problem is if no SIMD flags are specified at all. Then we fall back to the scalar intrinsic implementations of SIMDe, which are a lot slower (I don't have an exact number, but it was a few factors slower).

outpaddling commented 3 years ago

Got it. So what files should be present in a minimal installation? For now, I'm installing everything in data in addition to the two files your install target does. I haven't had time to play with mmseqs2 yet and probably won't for a while, but I want the FreeBSD port to work out-of-the-box for a typical user.

milot-mirdita commented 3 years ago

You don't really need the data directory. The .out files for the substitution matrices are the only somewhat useful files there. cmake deals with the shell scripts in data/{workflow,resources}, they are only needed during build time, not runtime.

outpaddling commented 3 years ago

So the workflow scripts don't need to be accessible at runtime? I wasn't sure what you meant by "compiled in".

Thanks...

milot-mirdita commented 3 years ago

Exactly, they are embedded into the binary and written to disk and executed when needed. We tried to take care that the scripts don't have any BASHisms and only call standard POSIX tools, awk and one of wget/curl/aria2c.

outpaddling commented 3 years ago

When built with just sse2 or sse3, I'm seeing hangs. No cpu, disk, or network activity. Deadlock?

Built with -march=native, it seems to mostly work:

SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353

EASY_SEARCH (Time: 21s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353

EASY_SEARCH_INDEX_SPLIT (Time: 23s) TEST SUCCESS GOOD Expected: 0.118265 Actual: 0.118265

PROFILE (Time: 50s) TEST SUCCESS GOOD Expected: 0.367396 Actual: 0.367396

EASY_PROFILE (Time: 37s) TEST SUCCESS GOOD Expected: 0.33876 Actual: 0.338768

SLICEPROFILE (Time: 22s) TEST FAILED (NO REPORT)

DBPROFILE (Time: 18s) TEST SUCCESS GOOD Expected: 0.182017 Actual: 0.182017

EXPAND (Time: 50s) TEST SUCCESS GOOD Expected: 0.16614, 0.171723, 0.222982 Actual: 0.16614, 0.224627, 0.222982

NUCLPROT_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: 0.238076 Actual: 0.238076

NUCLNUCL_SEARCH (Time: 21s) TEST FAILED (NO REPORT)

NUCLNUCL_TRANS_SEARCH (Time: 14s) TEST FAILED (NO REPORT)

CLUSTER (Time: 14s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722

EASY_CLUSTER (Time: 14s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722

EASY_NUCL_CLUSTER (Time: 3s) TEST SUCCESS GOOD Expected: 106 Actual: 106

CLUSTER_REASSIGN (Time: 12s) TEST SUCCESS GOOD Expected: 17403 Actual: 17403

LINCLUST (Time: 4s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491

LINCLUST_SPLIT (Time: 8s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491

EASY_LINCLUST (Time: 4s) TEST SUCCESS GOOD Expected: 26493 Actual: 26493

CLUSTHASH (Time: 0s) TEST SUCCESS GOOD Expected: 5 Actual Prot: 5 Nucl: 5

PROTNUCL_SEARCH (Time: 27s) TEST SUCCESS GOOD Expected: 0.237504 Actual: 0.237504

NUCLPROTTAX_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265 Actual: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265

EASYNUCLPROTSEARCH_TAX (Time: 26s) TEST SUCCESS GOOD Expected: from filtertaxdb: 3626 680 1425; Actual: from filtertaxdb: 3626 680 1425;

DBPROFILE_INDEX (Time: 50s) TEST SUCCESS GOOD Expected: 0.197552 Actual: 0.197552

LINSEARCH_NUCLNUCL_TARNS_SEARCH (Time: 15s) TEST FAILED (NO REPORT)

LINSEARCH_NUCLNUCL_SEARCH (Time: 20s) TEST FAILED (NO REPORT)

EASY_LINSEARCH_NUCLNUCL_SEARCH_SPLIT (Time: 29s) TEST SUCCESS GOOD Expected: 0.108903 Actual: 0.108903

LINCLUST_UPDATE (Time: 19s) TEST SUCCESS GOOD Expected: 32132 24732 32132 Actual: 32132 24732 32132

EASYNUCLNUCLTAX_SEARCH (Time: 55s) TEST SUCCESS GOOD Expected: from taxonomyreport: 2607 243 2624 Actual: from taxonomyreport: 2607 243 2624

EXTRACTORFS (Time: 0s) TEST SUCCESS GOOD Expected: 0 Actual: 0

RBH (Time: 5s) TEST SUCCESS GOOD Expected: 10 Actual: 10

APPLY (Time: 1s) TEST SUCCESS GOOD Expected: 2570583 Actual: 2570583

INDEX_COMPATIBLE (Time: 7s) TEST SUCCESS GOOD Expected: 0 Actual: 0

FILTERDB (Time: 1s) TEST SUCCESS GOOD Expected: 0 Actual: 0

PREF_DB_LOAD_MODE (Time: 18s) TEST SUCCESS GOOD Expected: 0.0856974 Actual: 0.0856974

FILTERTAXSEQDB (Time: 1s) TEST SUCCESS GOOD Expected: 0,1,2 0,1,2,3,4,5 3,4,5 Actual: 0,1,2 0,1,2,3,4,5 3,4,5

NOMPI_TARGET_SPLIT (Time: 8s) TEST SUCCESS GOOD Expected: 500 Actual: 500

NOMPI_SLICE_TECH (Time: 13s) TEST SUCCESS GOOD Expected: 512,256,128,64,32 Actual: 512,256,128,64,32

In what little output didn't scroll away, I see a couple of messages like this:

posix_madvise returned an error /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/NOMPI_SLICE_TECH/DSL_17K_SPLIT_MODE_1/tmpFolder/4754201167969432722/pref

milot-mirdita commented 3 years ago

Ah, the nucl-nucl tests are very likely to fail as they include statically compiled binaries for samtools (for linux, windows and macos). I would need to add a statically compiled binary for freebsd too. SLICEPROFILE is odd however.

The posix_madvise error is annoying but can be ignored. I don't see anything special in the freebsd man page that would indicate that this might fail/behave differently.

I am downloading a VM and I'll take a look at these issues.

outpaddling commented 3 years ago

Can it use an external samtools?

outpaddling commented 3 years ago

And FYI, SLICEPROFILE succeeded on my 4-core Phenom workstation. The failure occurred on a 16C/32T PowerEdge server. Full output from another run with |& tee log attached.

log.gz

milot-mirdita commented 3 years ago

Replacing the util/regression/samtools/samtools.sh with the following works:

#!/bin/sh -e
SELF="$( cd "$(dirname "$0")" ; pwd -P )"
SUFFIX=""
case "$(uname -m)" in
  arm*|aarch*) SUFFIX="-aarch64" ;;
  ppc*) SUFFIX="-ppc64le" ;;
esac
case "$(echo "$OSTYPE" | tr '[:upper:]' '[:lower:]')" in
  linux*) exec "$SELF/samtools-linux$SUFFIX" "$@" ;;
  darwin*) exec "$SELF/samtools-darwin" "$@" ;;
  msys*|cygwin*) exec "$SELF/samtools-windows" "$@" ;;
esac
samtools "$@"

I don't want to push that commit right now since we are in the process of some bigger refactoring.

SLICEPROFILE is a bit fickle about RAM available per core used. That seems to have been the problem:

[=======mem_align could not allocate memory.

I also fixed the bogus error messages in https://github.com/soedinglab/MMseqs2/commit/15ace29a276be54fee6b9aedd7a1e814a3c7769b

outpaddling commented 3 years ago

If I compile with GCC 10 (just make USE_GCC=yes in FreeBSD ports), it does not hang with only SSE. I'd hate to add such a heavy dependency to the port, though, so it would be good to figure out what's going wrong with clang 10.

The PowerEdge has 32 hyperthreads and 64G RAM. Is 2G/thread not enough for SLICEPROFILE? If not, how can I limit the number of threads in the regression tests? It doesn't seem to respect OMP_NUM_THREADS=16. I still see CPU spike to > 3000%.

I'll test your patches later.

Thanks...

outpaddling commented 3 years ago

With your patches and building with GCC + SSE3 only, all tests pass except for SLICEPROFILE:

SEARCH (Time: 16s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353

EASY_SEARCH (Time: 20s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353

EASY_SEARCH_INDEX_SPLIT (Time: 21s) TEST SUCCESS GOOD Expected: 0.118265 Actual: 0.118265

PROFILE (Time: 52s) TEST SUCCESS GOOD Expected: 0.367396 Actual: 0.367396

EASY_PROFILE (Time: 42s) TEST SUCCESS GOOD Expected: 0.33876 Actual: 0.338768

SLICEPROFILE (Time: 22s) TEST FAILED (NO REPORT)

DBPROFILE (Time: 17s) TEST SUCCESS GOOD Expected: 0.182017 Actual: 0.182017

EXPAND (Time: 52s) TEST SUCCESS GOOD Expected: 0.16614, 0.171723, 0.222982 Actual: 0.16614, 0.224627, 0.222982

NUCLPROT_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: 0.238076 Actual: 0.238076

NUCLNUCL_SEARCH (Time: 31s) TEST SUCCESS GOOD Expected: 0.192043 Actual: 0.192043

NUCLNUCL_TRANS_SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.11646 Actual: 0.11646

CLUSTER (Time: 17s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722

EASY_CLUSTER (Time: 19s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722

EASY_NUCL_CLUSTER (Time: 3s) TEST SUCCESS GOOD Expected: 106 Actual: 106

CLUSTER_REASSIGN (Time: 14s) TEST SUCCESS GOOD Expected: 17403 Actual: 17403

LINCLUST (Time: 7s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491

LINCLUST_SPLIT (Time: 10s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491

EASY_LINCLUST (Time: 12s) TEST SUCCESS GOOD Expected: 26493 Actual: 26493

CLUSTHASH (Time: 0s) TEST SUCCESS GOOD Expected: 5 Actual Prot: 5 Nucl: 5

PROTNUCL_SEARCH (Time: 28s) TEST SUCCESS GOOD Expected: 0.237504 Actual: 0.237504

NUCLPROTTAX_SEARCH (Time: 13s) TEST SUCCESS GOOD Expected: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265 Actual: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265

EASYNUCLPROTSEARCH_TAX (Time: 25s) TEST SUCCESS GOOD Expected: from filtertaxdb: 3626 680 1425; Actual: from filtertaxdb: 3626 680 1425;

DBPROFILE_INDEX (Time: 50s) TEST SUCCESS GOOD Expected: 0.197552 Actual: 0.197552

LINSEARCH_NUCLNUCL_TARNS_SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.0620599 Actual: 0.0620599

LINSEARCH_NUCLNUCL_SEARCH (Time: 35s) TEST SUCCESS GOOD Expected: 0.108522 Actual: 0.108522

EASY_LINSEARCH_NUCLNUCL_SEARCH_SPLIT (Time: 49s) TEST SUCCESS GOOD Expected: 0.108903 Actual: 0.108903

LINCLUST_UPDATE (Time: 19s) TEST SUCCESS GOOD Expected: 32132 24732 32132 Actual: 32132 24732 32132

EASYNUCLNUCLTAX_SEARCH (Time: 99s) TEST SUCCESS GOOD Expected: from taxonomyreport: 2607 243 2624 Actual: from taxonomyreport: 2607 243 2624

EXTRACTORFS (Time: 0s) TEST SUCCESS GOOD Expected: 0 Actual: 0

RBH (Time: 4s) TEST SUCCESS GOOD Expected: 10 Actual: 10

APPLY (Time: 2s) TEST SUCCESS GOOD Expected: 2570583 Actual: 2570583

INDEX_COMPATIBLE (Time: 6s) TEST SUCCESS GOOD Expected: 0 Actual: 0

FILTERDB (Time: 1s) TEST SUCCESS GOOD Expected: 0 Actual: 0

PREF_DB_LOAD_MODE (Time: 18s) TEST SUCCESS GOOD Expected: 0.0856974 Actual: 0.0856974

FILTERTAXSEQDB (Time: 0s) TEST SUCCESS GOOD Expected: 0,1,2 0,1,2,3,4,5 3,4,5 Actual: 0,1,2 0,1,2,3,4,5 3,4,5

NOMPI_TARGET_SPLIT (Time: 9s) TEST SUCCESS GOOD Expected: 500 Actual: 500

NOMPI_SLICE_TECH (Time: 14s) TEST SUCCESS GOOD Expected: 512,256,128,64,32 Actual: 512,256,128,64,32

milot-mirdita commented 3 years ago

GCC should not be necessary. Clang works fully in all kinds of configurations. We introduced our own env variable MMSEQS_NUM_THREADS to limit threads globally if --threads cannot be (conveniently) set. I limited my VM to 2GB and couldn't get it to crash in the same way during SLICEPROFILE (the OOM killer always killed it, it didn't crash in posix_memalign).

How exactly did you compile it when it hung with SSE?

outpaddling commented 3 years ago

All tests pass on my PowerEdge 32-ht system with MMSEQS_NUM_THREADS=16.

Also, I had forgotten that this machine has vmem limits, so I was actually running with about 1G/thread before.
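The memory-per-thread arithmetic discussed here can be written down as a small helper. This is a sketch under assumptions: the function name threads_for_ram is hypothetical, and the 2 GB per thread budget is only inferred from this thread (32 threads at about 1G/thread failed, 16 threads passed); it is not a documented MMseqs2 requirement.

```sh
#!/bin/sh
# Hypothetical helper: pick a thread count so that each thread gets at least
# per_thread_gb of memory, capped at max_threads and never below 1.
threads_for_ram() {
    total_gb="$1"
    per_thread_gb="$2"
    max_threads="$3"
    n=$((total_gb / per_thread_gb))
    if [ "$n" -gt "$max_threads" ]; then n="$max_threads"; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# About 32 GB usable, a 2 GB/thread budget, 32 hardware threads: prints 16.
threads_for_ram 32 2 32
```

The result could then be passed to the regression run, for example as `MMSEQS_NUM_THREADS="$(threads_for_ram 32 2 32)" ./run_regression.sh /usr/local/bin/mmseqs ./Temp`.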

Excerpt from build:

[ 3% 10/228] /usr/bin/c++ -I/usr/local/include -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/tinyexpr -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/microtar -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/simde -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/simd -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/gzstream -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/alp -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/cacode -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/ksw2 -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/xxhash -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/ips4o -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -fsigned-char -D_WITH_GETLINE -std=c++1y -stdlib=libc++ -MD -MT lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o -MF lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o.d -o lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o -c /usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/cacode/lambda_calculator.cpp

...

[100% 228/228] : && /usr/bin/c++ -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -lpthread -fstack-protector-strong -fsigned-char -D_WITH_GETLINE -std=c++1y -stdlib=libc++ -pedantic -Wall -Wextra -Wdisabled-optimization -fno-exceptions -fopenmp=libomp src/CMakeFiles/mmseqs.dir/mmseqs.cpp.o -o src/mmseqs src/libmmseqs-framework.a src/version/libversion.a lib/tinyexpr/libtinyexpr.a -lm /usr/local/lib/libzstd.a lib/microtar/libmicrotar.a -lz -lbz2 -lomp && :

outpaddling commented 3 years ago

The hangs are not totally consistent, but seem to usually happen here:

...
Create directory /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp
easy-search /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/results_aln.m8 /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp -e 10000 -s 4 --max-seqs 4000 --num-iterations 2 --compressed 1

MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3 Substitution matrix nucl:nucleotide.out,aa:blosum62.out Add backtrace false Alignment mode 3 Alignment mode 0 Allow wrapped scoring false E-value threshold 10000 Seq. id. threshold 0 Min alignment length 0 Seq. id. mode 0 Alternative alignments 0 Coverage threshold 0 Coverage mode 0 Max sequence length 65535 Compositional bias 1 Max reject 2147483647 Max accept 2147483647 Include identical seq. id. false Preload mode 0 Pseudo count a 1 Pseudo count b 1.5 Score bias 0 Realign hits false Realign score bias -0.2 Realign max seqs 2147483647 Gap open cost nucl:5,aa:11 Gap extension cost nucl:2,aa:1 Zdrop 40 Threads 16 Compressed 1 Verbosity 3 Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out Sensitivity 4 k-mer length 0 k-score 2147483647 Alphabet size nucl:5,aa:21 Max results per query 4000 Split database 0 Split mode 2 Split memory limit 0 Diagonal scoring true Exact k-mer matching 0 Mask residues 1 Mask lower case residues 0 Minimum diagonal score 15 Spaced k-mers 1 Spaced k-mer pattern
Local temporary path
Rescore mode 0 Remove hits by seq. id. and coverage false Sort results 0 Mask profile 1 Profile E-value threshold 0.001 Global sequence weighting false Allow deletions false Filter MSA 1 Maximum seq. id. threshold 0.9 Minimum seq. id. 0 Minimum score per column -20 Minimum coverage 0 Select N most diverse seqs 1000 Min codons in orf 30 Max codons in length 32734 Max orf gaps 2147483647 Contig start mode 2 Contig end mode 2 Orf start mode 1 Forward frames 1,2,3 Reverse frames 1,2,3 Translation table 1 Translate orf 0 Use all table starts false Offset of numeric ids 0 Create lookup 0 Add orf stop false Overlap between sequences 0 Sequence split mode 1 Header split mode 0 Chain overlapping alignments 0 Merge query 1 Search type 0 Search iterations 2 Start sensitivity 4 Search steps 1 Exhaustive search mode false Filter results during exhaustive search 0 Strand selection 1 LCA search mode false Disk space limit 0 MPI runner
Force restart with latest tmp false Remove temporary files true Alignment format 0 Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits Database output false Overlap threshold 0 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 0 Greedy best hits false

createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp/11635372687271654297/query --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 1 -v 3

Converting sequences [6364] 1s 293ms Time for merging to query_h: 0h 0m 0s 19ms Time for merging to query: 0h 0m 0s 23ms Database type: Aminoacid Time for processing: 0h 0m 1s 363ms createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp/11635372687271654297/target --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 1 -v 3

Converting sequences [35957] 9s 707ms Time for merging to target_h: 0h 0m 0s 21ms Time for merging to target: 0h 0m 0s 33ms Database type: Aminoacid

outpaddling commented 3 years ago

Next most likely hang point is here:

...
[ 50%] Building CXX object CMakeFiles/evaluate_results.dir/src/EvaluateResults.cpp.o
[100%] Linking CXX executable evaluate_results
[100%] Built target evaluate_results
createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/SEARCH/query

MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 1 Offset of numeric ids 0 Compressed 0 Verbosity 3

Converting sequences [6364] 0s 83ms Time for merging to query_h: 0h 0m 0s 12ms Time for merging to query: 0h 0m 0s 16ms Database type: Aminoacid Time for processing: 0h 0m 0s 136ms createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/SEARCH/targetannotation

MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 1 Offset of numeric ids 0 Compressed 0 Verbosity 3

Converting sequences [35957] 0s 127ms Time for merging to targetannotation_h: 0h 0m 0s 20ms Time for merging to targetannotation: 0h 0m 0s 43ms Database type: Aminoacid

I have seen 1 or 2 hangs where "Database type: Aminoacid" was not the final output.

outpaddling commented 3 years ago

Just got a hang here:

...
[===============================================================> ] 98.28% 35.40[===============================================================> ] 98.28% 35.41[================================================================>] 99.28% 35.77[=================================================================] 100.00% 36.03K 3s 736ms
Time for merging to aln_swapped: 0h 0m 0s 23ms
103611 alignments calculated
72095 sequence pairs passed the thresholds (0.695824 of overall calculated)
2.001194 hits per query sequence
Time for processing: 0h 0m 3s 787ms
swapresults /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/targetannotation_profile /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/query /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/tmp/7458066464536510288/aln_swapped /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/results_aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 10000 --split-memory-limit 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 16 --compressed 0 --db-load-mode 0 -v 3

outpaddling commented 3 years ago

Probably not important, but I noticed a mismatch between your cmake settings

set(CMAKE_XCODE_ATTRIBUTE_CLANG_CXX_LANGUAGE_STANDARD "c++11")

and the actual compiler options

-fsigned-char -D_WITH_GETLINE -std=c++1y -pedantic

outpaddling commented 3 years ago

I replaced -msse2 with -march=x86_64, so clang will bundle SSE with other common options for low-end AMD64 CPUs. Didn't change the results, though. Still hangs.

milot-mirdita commented 3 years ago

I tried to reproduce the problem in my FreeBSD 13 VM with your wip-ports repository and I can't get it to hang. I tried with both -msse2 and -march=x86_64 (and removed the USE_GCC line). Maybe the issue is that it's swapping at that moment a bit excessively and it would eventually continue? Could you attach gdb/lldb at the moment it's hanging and produce a stack trace? That's quite the odd issue that I've not encountered on any other systems :/
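For a suspected deadlock, a backtrace of every thread is usually more informative than one of the main thread alone. The following is a minimal sketch of a non-interactive way to collect that with lldb. The function name dump_backtraces is hypothetical; it assumes lldb is installed and that the caller is allowed to attach to the process.

```sh
#!/bin/sh
# Hypothetical helper: attach lldb to a running process, print the backtrace
# of every thread, then detach so the process keeps running.
dump_backtraces() {
    pid="$1"
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_backtraces PID" >&2
            return 2
            ;;
    esac
    lldb --batch -p "$pid" -o "thread backtrace all" -o "detach"
}
```

The PID of the hanging process can be looked up first, for example with `pgrep -x mmseqs`, and the output redirected to a file for attaching to the issue.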

I think the c++ standard was somewhat of a conscious choice, as we don't really want to use modern C++, but (iirc) gcc 4.8 would complain about one of the dependencies without increasing the c++ standard slightly.

outpaddling commented 3 years ago

First, thanks for your above-and-beyond efforts to diagnose this.

What were your compile flags? How many cores and how much RAM does your VM have?

Adding output below from builds with GCC disabled and WITH_DEBUG=yes (adds -g and prevents stripping binaries).

outpaddling commented 3 years ago

From Dell PowerEdge:

ps axw:

  579  0  I+         0:00.01 /bin/sh -e ./run_regression.sh /usr/local/bin/mmseqs ./Temp
 1206  0  I+         0:00.00 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/regression/run_nucl
 1217  0  I+         0:00.02 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/NUCLNUCL_TRANS
 1225  0  I+         0:07.71 /usr/local/bin/mmseqs offsetalignment /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1

lldb:

(lldb) process attach --pid 1225
Process 1225 stopped

Executable module set to "/usr/local/bin/mmseqs".
Architecture set to: x86_64--freebsd12.2.
(lldb) bt
* thread #1, name = 'mmseqs'
  * frame #0: 0x000000080086f68c libthr.so.3`___lldb_unnamed_symbol190$$libthr.so.3 + 92
    frame #1: 0x000000080086ccab libthr.so.3`___lldb_unnamed_symbol159$$libthr.so.3 + 491
    frame #2: 0x000000080092ea3e libomp.so`___lldb_unnamed_symbol30$$libomp.so + 302
    frame #3: 0x000000080096faaa libomp.so`___lldb_unnamed_symbol400$$libomp.so + 698
    frame #4: 0x000000080096dd5c libomp.so`___lldb_unnamed_symbol392$$libomp.so + 604
    frame #5: 0x000000080096aca7 libomp.so`___lldb_unnamed_symbol384$$libomp.so + 1095
    frame #6: 0x0000000800966434 libomp.so`__kmpc_barrier + 308
    frame #7: 0x0000000000408496 mmseqs`ips4o::OpenMPThreadPool::Sync::barrier(this=0x0000000802849038) const at thread_pool.hpp:63:1
    frame #8: 0x0000000000436525 mmseqs`std::__1::pair<int, bool> ips4o::detail::Sorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::partition<true>(this=0x00007fffffffb5a8, begin=0x0000000802a7fe80, end=0x0000000802ba0f70, bucket_start=0x0000000802842000, shared=0x0000000802842000, my_id=0, num_threads=32) at partitioning.hpp:109:36
    frame #9: 0x0000000000435f9b mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::parallelPrimary<ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&>(this=0x00007fffffffb5a8, begin=0x0000000802a7fe80, end=0x0000000802ba0f70, shared=0x0000000802842000, num_threads=32, task_sorter=0x00007fffffffbbb8) at parallel.hpp:114:26
    frame #10: 0x0000000000435e27 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffba98, my_id=0, num_threads=32)(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*)::'lambda'(int, int)::operator()(int, int) const at parallel.hpp:193:24
    frame #11: 0x00000000003edd3f mmseqs`::.omp_outlined._debug__.121(.global_tid.=0x00007fffffffb6a0, .bound_tid.=0x00007fffffffb698, func=0x00007fffffffba98) &) at thread_pool.hpp:95:13
    frame #12: 0x00000000003edd75 mmseqs`::.omp_outlined..122(.global_tid.=0x00007fffffffb6a0, .bound_tid.=0x00007fffffffb698, func=0x00007fffffffba98) at thread_pool.hpp:95:13
    frame #13: 0x0000000800984653 libomp.so`__kmp_invoke_microtask + 147
    frame #14: 0x0000000800963c82 libomp.so`___lldb_unnamed_symbol362$$libomp.so + 370
    frame #15: 0x000000080095f4af libomp.so`__kmp_fork_call + 7423
    frame #16: 0x0000000800965c96 libomp.so`__kmpc_fork_call + 310
    frame #17: 0x0000000000435d3d mmseqs`void ips4o::OpenMPThreadPool::operator(this=0x00007fffffffbb88, func=0x00007fffffffba98, num_threads=32)<ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator()(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*)::'lambda'(int, int)>(ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool>&&, int) at thread_pool.hpp:94:1
    frame #18: 0x000000000042cb21 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffbb88, begin=0x0000000802a7fe80, end=0x0000000802ba0f70)(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*) at parallel.hpp:189:9
    frame #19: 0x000000000042c776 mmseqs`void ips4o::parallel::sort<ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset>(begin=0x0000000802a7fe80, end=0x0000000802ba0f70, comp=comparePairByOffset @ 0x00007fffffffbbf0, num_threads=32) at ips4o.hpp:128:9
    frame #20: 0x00000000003ee74f mmseqs`void ips4o::parallel::sort<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset>(begin=0x0000000802a7fe80, end=0x0000000802ba0f70, comp=comparePairByOffset @ 0x00007fffffffbc40) at ips4o.hpp:137:5
    frame #21: 0x00000000003eaab7 mmseqs`DBReader<unsigned int>::sortIndex(this=0x00007fffffffd2b0, isSortedById=true) at DBReader.cpp:367:9
    frame #22: 0x00000000003efebe mmseqs`DBReader<unsigned int>::open(this=0x00007fffffffd2b0, accessType=2) at DBReader.cpp:185:9
    frame #23: 0x0000000000639482 mmseqs`offsetalignment(argc=20, argv=0x00007fffffffd8a8, command=0x0000000800f5a220) at offsetalignment.cpp:261:12
    frame #24: 0x000000000038731f mmseqs`runCommand(p=0x0000000800f5a220, argc=20, argv=0x00007fffffffd8a8) at Application.cpp:38:18
    frame #25: 0x0000000000388596 mmseqs`main(argc=22, argv=0x00007fffffffd898) at Application.cpp:196:9
    frame #26: 0x0000000000386400 mmseqs`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
(lldb) 
outpaddling commented 3 years ago

ThinkPad:

ps axw:

54752  0  I+       0:00.01 /bin/sh -e ./run_regression.sh /usr/local/bin/mmseqs ./Temp
57131  0  I+       0:00.00 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/regression/run_easy_c
57133  0  I+       0:00.01 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/CLUSTER_REASSIGN
57135  0  I+       0:00.01 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/CLUSTER_REASSIGN
57200  0  S+       0:00.62 /usr/local/bin/mmseqs swapdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Tem

lldb:

(lldb) process attach --pid 57200
Process 57200 stopped

Executable module set to "/usr/local/bin/mmseqs".
Architecture set to: x86_64--freebsd13.0.
(lldb) bt
* thread #1, name = 'mmseqs'
  * frame #0: 0x0000000800bea528 libc.so.7`__sys__umtx_op + 8
    frame #1: 0x0000000000803044 mmseqs`__atomic_fetch_sub_16 [inlined] lock(l=0x000000000080c2e0) at atomic.c:72:5
    frame #2: 0x000000000080301e mmseqs`__atomic_fetch_sub_16(ptr=0x00000008013723b0, val=1180591620717411303424, model=<unavailable>) at atomic.c:342
    frame #3: 0x0000000000449273 mmseqs`std::__1::pair<long, long> ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (this=0x00000008013723b0)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::BucketPointers::decRead<true>() at bucket_pointers.hpp:106:28
    frame #4: 0x0000000000449482 mmseqs`int ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::classifyAndReadBlock<false, true>(this=0x00007fffffffa8a8, read_bucket=29) at block_permutation.hpp:69:41
    frame #5: 0x0000000000448403 mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::permuteBlocks<false, true>(this=0x00007fffffffa8a8) at block_permutation.hpp:137:31
    frame #6: 0x0000000000447b4d mmseqs`std::__1::pair<int, bool> ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (this=0x00007fffffffa8a8, begin=0x00000008012ffc40, end=0x00000008013647d0, bucket_start=0x0000000801371000, shared=0x0000000801371000, my_id=0, num_threads=4)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::partition<true>(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*, long*, ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::SharedData*, int, int) at partitioning.hpp:104:9
    frame #7: 0x00000000004475fb mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::parallelPrimary<ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&>(this=0x00007fffffffa8a8, begin=0x00000008012ffc40, end=0x00000008013647d0, shared=0x0000000801371000, num_threads=4, task_sorter=0x00007fffffffaed8)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::SharedData&, int, ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&) at parallel.hpp:114:26
    frame #8: 0x0000000000447487 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffadd8, my_id=0, num_threads=4)(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*)::'lambda'(int, int)::operator()(int, int) const at parallel.hpp:193:24
    frame #9: 0x00000000003ef65f mmseqs`::.omp_outlined._debug__.54(.global_tid.=0x00007fffffffa9e0, .bound_tid.=0x00007fffffffa9d8, func=0x00007fffffffadd8) &) at thread_pool.hpp:95:13
    frame #10: 0x00000000003ef695 mmseqs`::.omp_outlined..55(.global_tid.=0x00007fffffffa9e0, .bound_tid.=0x00007fffffffa9d8, func=0x00007fffffffadd8) at thread_pool.hpp:94:1
    frame #11: 0x000000080098d523 libomp.so`__kmp_invoke_microtask + 147
    frame #12: 0x0000000800968332 libomp.so`___lldb_unnamed_symbol498$$libomp.so + 370
    frame #13: 0x0000000800963b3f libomp.so`__kmp_fork_call + 7551
    frame #14: 0x000000080093cfb6 libomp.so`__kmpc_fork_call + 310
    frame #15: 0x00000000004473a1 mmseqs`void ips4o::OpenMPThreadPool::operator(this=0x00007fffffffaea8, func=0x00007fffffffadd8, num_threads=4)<ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator()(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*)::'lambda'(int, int)>(ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool>&&, int) at thread_pool.hpp:94:1
    frame #16: 0x000000000043e399 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffaea8, begin=0x00000008012ffc40, end=0x00000008013647d0)(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*) at parallel.hpp:189:9
    frame #17: 0x000000000043dfd2 mmseqs`void ips4o::parallel::sort<ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)>(begin=0x00000008012ffc40, end=0x00000008013647d0, comp=(mmseqs`DBReader<unsigned int>::Index::compareByOffset(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&) at DBReader.h:84), num_threads=4)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), int) at ips4o.hpp:128:9
    frame #18: 0x000000000040d66a mmseqs`void ips4o::parallel::sort<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)>(begin=0x00000008012ffc40, end=0x00000008013647d0, comp=(mmseqs`DBReader<unsigned int>::Index::compareByOffset(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&) at DBReader.h:84))(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)) at ips4o.hpp:137:5
    frame #19: 0x00000000003ef124 mmseqs`DBReader<unsigned int>::sortIndex(this=0x00007fffffffc718, isSortedById=true) at DBReader.cpp:403:9
    frame #20: 0x000000000044adae mmseqs`DBReader<unsigned int>::open(this=0x00007fffffffc718, accessType=8) at DBReader.cpp:185:9
    frame #21: 0x000000000059f427 mmseqs`doswap(par=0x000000080121f1c0, isGeneralMode=true) at swapresults.cpp:49:22
    frame #22: 0x00000000005a1f2e mmseqs`swapdb(argc=8, argv=0x00007fffffffd2c8, command=0x0000000801269b00) at swapresults.cpp:353:12
    frame #23: 0x000000000038a19f mmseqs`runCommand(p=0x0000000801269b00, argc=8, argv=0x00007fffffffd2c8) at Application.cpp:38:18
    frame #24: 0x000000000038b416 mmseqs`main(argc=10, argv=0x00007fffffffd2b8) at Application.cpp:196:9
    frame #25: 0x0000000000389280 mmseqs`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
(lldb) 
milot-mirdita commented 3 years ago

Ah that's interesting. In the preset flags we have this:

elseif (HAVE_SSE2)
    set(MMSEQS_ARCH "${MMSEQS_ARCH} -msse2")
    set(DISABLE_IPS4O 1)

It seems I had a reason for the DISABLE_IPS4O here, beyond reducing requirements. It disables the fast IPS4o sorting library and falls back to a different, slightly slower one. You should pass -DDISABLE_IPS4O=1 to cmake.

IPS4o requires either 16-byte compare-exchange instructions (enabled by -mcx16) or a slower implementation from libatomic. For lowest-common-denominator compilation it would be a good idea to disable it anyway.

outpaddling commented 3 years ago

That seems to have done it. Nice work!

I'm still not clear on why it was working on your FreeBSD VM or why it works with GCC. From what I can tell, CMPXCHG16B was only lacking on VERY early AMD64 architectures. My hardware is old, but not that old.

milot-mirdita commented 3 years ago

I am not sure why. This sorting library is also a bit fickle on uncommon architectures (Power and Z, though MMseqs2 doesn't 100% work on Z yet anyway) and I've explicitly disabled it on those.

outpaddling commented 3 years ago

Can you post the output of ldd /usr/local/bin/mmseqs and make clean build in wip/mmseqs2? I wonder if my build is picking up some optional dependency that yours is not. I'm guessing you don't have many packages installed on the VM. Thanks...

milot-mirdita commented 3 years ago

# ldd /usr/local/bin/mmseqs
/usr/local/bin/mmseqs:
    libthr.so.3 => /lib/libthr.so.3 (0x80066c000)
    libm.so.5 => /lib/libm.so.5 (0x800699000)
    libz.so.6 => /lib/libz.so.6 (0x8006cc000)
    libbz2.so.4 => /usr/lib/libbz2.so.4 (0x8006e8000)
    libomp.so => /usr/lib/libomp.so (0x8006fe000)
    libc++.so.1 => /usr/lib/libc++.so.1 (0x8007c5000)
    libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x800897000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x8008ba000)
    libc.so.7 => /lib/libc.so.7 (0x8008d3000)

And zstd was picked up during cmake:

-- Found ZSTD: /usr/local/lib/libzstd.a

This looks pretty complete. I don't remember anything else that we might be missing.

outpaddling commented 3 years ago

The GCC10 build picks up libatomic, which may at least explain the GCC vs clang difference.

# ldd /usr/local/bin/mmseqs 
/usr/local/bin/mmseqs:
    libthr.so.3 => /lib/libthr.so.3 (0x800ace000)
    libatomic.so.1 => /usr/local/lib/gcc10/libatomic.so.1 (0x800afc000)
    libz.so.6 => /lib/libz.so.6 (0x800d03000)
    libbz2.so.4 => /usr/lib/libbz2.so.4 (0x800d1f000)
    libstdc++.so.6 => /usr/local/lib/gcc10/libstdc++.so.6 (0x800d35000)
    libm.so.5 => /lib/libm.so.5 (0x80111b000)
    libgomp.so.1 => /usr/local/lib/gcc10/libgomp.so.1 (0x80114e000)
    libgcc_s.so.1 => /usr/local/lib/gcc10/libgcc_s.so.1 (0x80138b000)
    libc.so.7 => /lib/libc.so.7 (0x8015a3000)
    libdl.so.1 => /usr/lib/libdl.so.1 (0x8019b4000)

I think we're set now. Thanks again!