Closed outpaddling closed 3 years ago
Thanks a lot for your work!
I looked through this repository and found a few things that might need to be slightly tweaked. In https://github.com/outpaddling/freebsd-ports-wip/blob/master/mmseqs2/Makefile:
I am not sure what to think of the arch patch. If you don't set any of the -DHAVE_* parameters, they are not used anyway, and the automatic detection can be disabled by setting -DMMSEQS_ARCH=" " or something like that. I would suggest dropping that patch.
Does FreeBSD not have any baseline requirements (e.g., Debian has SSE2 as its baseline)? I would be happy if, at the very least, SSE2 were enabled by default on x86_64.
-march also doesn't work very well on some non-x86 architectures; some require -mcpu to work correctly.
Are 32-bit builds disabled? MMseqs2 currently produces incorrect results on 32-bit systems (see #418; we will probably deal with this eventually in order to fully support WebAssembly).
Would it be possible to run the small subset of the test pipeline that is part of the release on GitHub (i.e., https://github.com/soedinglab/MMseqs2/releases/download/13-45111/MMseqs2-Regression.zip)? This would ensure that MMseqs2 on FreeBSD produces correct results. I was previously looking for a free CI service that supports *BSD, but couldn't find any.
Thanks for the quick and detailed feedback! Partial answer:
1) awk, zlib, bzip2, and omp are included in the FreeBSD base, so no package dependency is needed.
2) Generally, FreeBSD ports respect the user's env regarding build options, and of course the binary package has to be pessimistic about hardware. I'll check on the baseline assumptions and what clang -O2 emits, though. I was also thinking of adding a package message suggesting that it be built from source with more aggressive optimizations to get better performance. That's trivial to do with FreeBSD ports. I wanted to sneak the commit in before the quarterly branch coming next week so it's at least available in the next quarterly package set, so I haven't put much effort into perfecting it yet. With your feedback I should be able to make some good improvements by then.
3) Do you actually plan to continue support for 32-bit platforms? It's disabled for many bioinformatics ports already.
I'll look into the rest of your comments ASAP. BTW, this is the first time in my lengthy career I've ported a C++/cmake project to any platform and got a build with zero warnings on the first try. Somebody on your end is doing some good work. ;-)
Thanks for the kind words :) It helps that macOS is quite close to BSD and I also tried compiling on BSD a while ago (9e103bc79494e973cdcba47dd0099b9b9a565d66) and it luckily didn't break in the meantime again.
I see that CirrusCI seems to have BSD support, so we might start using that soonish.
Regarding 32-bit: I don't really want to support it, but if I want WebAssembly support, then 32-bit has to work. However, right now it's subtly broken and should not be used.
Without the CMakeLists patch, I get the following on aarch64:
cc: error: the clang compiler does not support '-march=native'
At any rate, when restricting an upstream build system, it's generally better to use a blunt-force patch. A more finessed approach like -DMMSEQS_ARCH can be fragile. Suppose you decide to change the name of the variable between now and the next release. The setting is then silently rendered inert. If I overlook the change, which is easy to do, the builds introduce aggressive optimizations into the binary package, causing illegal instruction dumps for users with lower-end hardware than the build cluster. The static patch, on the other hand, will break on a variable name change, so I'll be alerted that it needs attention.
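To make the failure mode concrete, here is a minimal sketch of a check that would turn a silent variable rename into a loud failure when an option like -DMMSEQS_ARCH is passed instead of a patch. The helper name cmake_var_present is hypothetical and is not part of the port or of MMseqs2. It only assumes that the option name appears literally in CMakeLists.txt.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named CMake variable still
# appears as a whole word in the given file.
cmake_var_present() {
    file=$1
    var=$2
    # -F: fixed string, -w: whole word, -q: quiet.
    # "--" guards against names that start with "-".
    grep -Fwq -- "$var" "$file"
}

# Example (not run here): stop the build if upstream renamed the option.
# cmake_var_present CMakeLists.txt MMSEQS_ARCH || {
#     echo "MMSEQS_ARCH no longer used upstream; re-check arch flags" >&2
#     exit 1
# }
```

A static patch gives the same alert for free, because it simply stops applying. That is the argument made above for preferring it.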
For wget and curl, are you referring to createtaxdb.sh and databases.sh? Are these supposed to be installed? The cmake build system only installs mmseqs and bash-completion.sh. Rather than add another dependency, I would add FreeBSD's native fetch command as the final fallback option as follows:
--- data/workflow/databases.sh.orig 2021-06-25 01:34:08 UTC
+++ data/workflow/databases.sh
@@ -27,6 +27,8 @@ STRATEGY=""
 if hasCommand aria2c; then STRATEGY="$STRATEGY ARIA"; fi
 if hasCommand curl; then STRATEGY="$STRATEGY CURL"; fi
 if hasCommand wget; then STRATEGY="$STRATEGY WGET"; fi
+# Part of FreeBSD base, need not be installed separately
+if hasCommand fetch; then STRATEGY="$STRATEGY FETCH"; fi
 if [ "$STRATEGY" = "" ]; then
     fail "No download tool found in PATH. Please install aria2c, curl or wget."
 fi
@@ -47,6 +49,9 @@ downloadFile() {
             ;;
         WGET)
             wget -O "$OUTPUT" "$URL" && return 0
+            ;;
+        FETCH)
+            fetch -o "$OUTPUT" "$URL" && return 0
             ;;
         esac
     done
Most bioinformaticians will have curl and/or wget installed anyway (both are included in the biostar-tools metaport), so it won't come into play, but we try to minimize unnecessary requirements.
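The selection logic from the patch can be tried in isolation with a sketch like the following. The body of hasCommand is an assumption, since the patch only shows its call sites. detectStrategy is a hypothetical wrapper name, and the error path returns instead of calling the script's fail helper.

```shell
#!/bin/sh
# Assumed definition of hasCommand: true if the tool is found in PATH.
hasCommand() {
    command -v "$1" >/dev/null 2>&1
}

# Print the available download strategies in the order they would be
# tried. FETCH comes last, so it only acts as a fallback.
detectStrategy() {
    STRATEGY=""
    if hasCommand aria2c; then STRATEGY="$STRATEGY ARIA"; fi
    if hasCommand curl; then STRATEGY="$STRATEGY CURL"; fi
    if hasCommand wget; then STRATEGY="$STRATEGY WGET"; fi
    # Part of FreeBSD base, need not be installed separately
    if hasCommand fetch; then STRATEGY="$STRATEGY FETCH"; fi
    if [ "$STRATEGY" = "" ]; then
        echo "No download tool found in PATH." >&2
        return 1
    fi
    # Drop the leading space left by the concatenation above.
    echo "${STRATEGY# }"
}
```

Because FETCH is appended after the other tools, systems that already have aria2c, curl, or wget keep their current behavior.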
I haven't been able to find a specific CPU feature requirement for FreeBSD, but I think SSE2 is safe to assume for amd64. I also added a pkg-message suggesting an optimized build from source. How much performance gain do you typically see from SSE4 or AVX?
Thanks...
You are right that it might be a bit fragile.
Exactly, these two workflows need to have something to download files. The workflows are automatically compiled into the binary and executed when the respective workflow is called (that's what either the xxd or perl build time dependency is for). Fetch sounds good, I'll try that out when I get a CirrusCI with FreeBSD going.
AVX2 is, IIRC, ~30% faster than SSE4.1, so it's not super important. The only problem is if no SIMD flags are specified at all. Then we fall back to the scalar intrinsic implementations of SIMDe, which are a lot slower (I don't have an exact number, but it was several times slower).
Got it. So what files should be present in a minimal installation? For now, I'm installing everything in data in addition to the two files your install target does. I haven't had time to play with mmseqs2 yet and probably won't for a while, but I want the FreeBSD port to work out-of-the-box for a typical user.
You don't really need the data directory. The .out files for the substitution matrices are the only somewhat useful files there. cmake deals with the shell scripts in data/{workflow,resources}; they are only needed at build time, not at runtime.
So the workflow scripts don't need to be accessible at runtime? I wasn't sure what you meant by "compiled in".
Thanks...
Exactly, they are embedded into the binary and written to disk and executed when needed. We tried to take care that the scripts don't have any BASHisms and only call standard POSIX tools, awk and one of wget/curl/aria2c.
When built with just sse2 or sse3, I'm seeing hangs. No cpu, disk, or network activity. Deadlock?
Built with -march=native, it seems to mostly work:
SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353
EASY_SEARCH (Time: 21s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353
EASY_SEARCH_INDEX_SPLIT (Time: 23s) TEST SUCCESS GOOD Expected: 0.118265 Actual: 0.118265
PROFILE (Time: 50s) TEST SUCCESS GOOD Expected: 0.367396 Actual: 0.367396
EASY_PROFILE (Time: 37s) TEST SUCCESS GOOD Expected: 0.33876 Actual: 0.338768
SLICEPROFILE (Time: 22s) TEST FAILED (NO REPORT)
DBPROFILE (Time: 18s) TEST SUCCESS GOOD Expected: 0.182017 Actual: 0.182017
EXPAND (Time: 50s) TEST SUCCESS GOOD Expected: 0.16614, 0.171723, 0.222982 Actual: 0.16614, 0.224627, 0.222982
NUCLPROT_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: 0.238076 Actual: 0.238076
NUCLNUCL_SEARCH (Time: 21s) TEST FAILED (NO REPORT)
NUCLNUCL_TRANS_SEARCH (Time: 14s) TEST FAILED (NO REPORT)
CLUSTER (Time: 14s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722
EASY_CLUSTER (Time: 14s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722
EASY_NUCL_CLUSTER (Time: 3s) TEST SUCCESS GOOD Expected: 106 Actual: 106
CLUSTER_REASSIGN (Time: 12s) TEST SUCCESS GOOD Expected: 17403 Actual: 17403
LINCLUST (Time: 4s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491
LINCLUST_SPLIT (Time: 8s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491
EASY_LINCLUST (Time: 4s) TEST SUCCESS GOOD Expected: 26493 Actual: 26493
CLUSTHASH (Time: 0s) TEST SUCCESS GOOD Expected: 5 Actual Prot: 5 Nucl: 5
PROTNUCL_SEARCH (Time: 27s) TEST SUCCESS GOOD Expected: 0.237504 Actual: 0.237504
NUCLPROTTAX_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265 Actual: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265
EASYNUCLPROTSEARCH_TAX (Time: 26s) TEST SUCCESS GOOD Expected: from filtertaxdb: 3626 680 1425; Actual: from filtertaxdb: 3626 680 1425;
DBPROFILE_INDEX (Time: 50s) TEST SUCCESS GOOD Expected: 0.197552 Actual: 0.197552
LINSEARCH_NUCLNUCL_TARNS_SEARCH (Time: 15s) TEST FAILED (NO REPORT)
LINSEARCH_NUCLNUCL_SEARCH (Time: 20s) TEST FAILED (NO REPORT)
EASY_LINSEARCH_NUCLNUCL_SEARCH_SPLIT (Time: 29s) TEST SUCCESS GOOD Expected: 0.108903 Actual: 0.108903
LINCLUST_UPDATE (Time: 19s) TEST SUCCESS GOOD Expected: 32132 24732 32132 Actual: 32132 24732 32132
EASYNUCLNUCLTAX_SEARCH (Time: 55s) TEST SUCCESS GOOD Expected: from taxonomyreport: 2607 243 2624 Actual: from taxonomyreport: 2607 243 2624
EXTRACTORFS (Time: 0s) TEST SUCCESS GOOD Expected: 0 Actual: 0
RBH (Time: 5s) TEST SUCCESS GOOD Expected: 10 Actual: 10
APPLY (Time: 1s) TEST SUCCESS GOOD Expected: 2570583 Actual: 2570583
INDEX_COMPATIBLE (Time: 7s) TEST SUCCESS GOOD Expected: 0 Actual: 0
FILTERDB (Time: 1s) TEST SUCCESS GOOD Expected: 0 Actual: 0
PREF_DB_LOAD_MODE (Time: 18s) TEST SUCCESS GOOD Expected: 0.0856974 Actual: 0.0856974
FILTERTAXSEQDB (Time: 1s) TEST SUCCESS GOOD Expected: 0,1,2 0,1,2,3,4,5 3,4,5 Actual: 0,1,2 0,1,2,3,4,5 3,4,5
NOMPI_TARGET_SPLIT (Time: 8s) TEST SUCCESS GOOD Expected: 500 Actual: 500
NOMPI_SLICE_TECH (Time: 13s) TEST SUCCESS GOOD Expected: 512,256,128,64,32 Actual: 512,256,128,64,32
In what little output didn't scroll away, I see a couple of messages like this:
posix_madvise returned an error /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/NOMPI_SLICE_TECH/DSL_17K_SPLIT_MODE_1/tmpFolder/4754201167969432722/pref
Ah, the nucl-nucl tests are very likely to fail, as they include statically compiled binaries for samtools (for Linux, Windows and macOS). I would need to add a statically compiled binary for FreeBSD too. SLICEPROFILE is odd, however.
The posix_madvise error is annoying but can be ignored. I don't see anything special in the FreeBSD man page that would indicate that this might fail or behave differently.
I am downloading a VM and I'll take a look at these issues.
Can it use an external samtools?
And FYI, SLICEPROFILE succeeded on my 4-core Phenom workstation. The failure occurred on a 16C/32T PowerEdge server. Full output from another run with |& tee log attached.
Replacing the util/regression/samtools/samtools.sh with the following works:
#!/bin/sh -e
SELF="$( cd "$(dirname "$0")" ; pwd -P )"
SUFFIX=""
case "$(uname -m)" in
arm*|aarch*) SUFFIX="-aarch64" ;;
ppc*) SUFFIX="-ppc64le" ;;
esac
case "$(echo "$OSTYPE" | tr '[:upper:]' '[:lower:]')" in
linux*) exec "$SELF/samtools-linux$SUFFIX" "$@" ;;
darwin*) exec "$SELF/samtools-darwin" "$@" ;;
msys*|cygwin*) exec "$SELF/samtools-windows" "$@" ;;
esac
samtools "$@"
I don't want to push that commit right now since we are in the process of some bigger refactoring.
SLICEPROFILE is a bit fickle about the RAM available per core used. That seems to have been the problem:
[=======mem_align could not allocate memory.
I also fixed the bogus error messages in https://github.com/soedinglab/MMseqs2/commit/15ace29a276be54fee6b9aedd7a1e814a3c7769b
If I compile with GCC 10 (just make USE_GCC=yes in FreeBSD ports), it does not hang with only SSE. I'd hate to add such a heavy dependency to the port, though, so it would be good to figure out what's going wrong with clang 10.
The PowerEdge has 32 hyperthreads and 64G RAM. Is 2G/thread not enough for SLICEPROFILE? If not, how can I limit the number of threads in the regression tests? It doesn't seem to respect OMP_NUM_THREADS=16. I still see CPU spike to > 3000%.
I'll test your patches later.
Thanks...
With your patches and building with GCC + SSE3 only, all tests pass except for SLICEPROFILE:
SEARCH (Time: 16s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353
EASY_SEARCH (Time: 20s) TEST SUCCESS GOOD Expected: 0.238353 Actual: 0.238353
EASY_SEARCH_INDEX_SPLIT (Time: 21s) TEST SUCCESS GOOD Expected: 0.118265 Actual: 0.118265
PROFILE (Time: 52s) TEST SUCCESS GOOD Expected: 0.367396 Actual: 0.367396
EASY_PROFILE (Time: 42s) TEST SUCCESS GOOD Expected: 0.33876 Actual: 0.338768
SLICEPROFILE (Time: 22s) TEST FAILED (NO REPORT)
DBPROFILE (Time: 17s) TEST SUCCESS GOOD Expected: 0.182017 Actual: 0.182017
EXPAND (Time: 52s) TEST SUCCESS GOOD Expected: 0.16614, 0.171723, 0.222982 Actual: 0.16614, 0.224627, 0.222982
NUCLPROT_SEARCH (Time: 12s) TEST SUCCESS GOOD Expected: 0.238076 Actual: 0.238076
NUCLNUCL_SEARCH (Time: 31s) TEST SUCCESS GOOD Expected: 0.192043 Actual: 0.192043
NUCLNUCL_TRANS_SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.11646 Actual: 0.11646
CLUSTER (Time: 17s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722
EASY_CLUSTER (Time: 19s) TEST SUCCESS GOOD Expected: 15722 Actual: 15722
EASY_NUCL_CLUSTER (Time: 3s) TEST SUCCESS GOOD Expected: 106 Actual: 106
CLUSTER_REASSIGN (Time: 14s) TEST SUCCESS GOOD Expected: 17403 Actual: 17403
LINCLUST (Time: 7s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491
LINCLUST_SPLIT (Time: 10s) TEST SUCCESS GOOD Expected: 26491 Actual: 26491
EASY_LINCLUST (Time: 12s) TEST SUCCESS GOOD Expected: 26493 Actual: 26493
CLUSTHASH (Time: 0s) TEST SUCCESS GOOD Expected: 5 Actual Prot: 5 Nucl: 5
PROTNUCL_SEARCH (Time: 28s) TEST SUCCESS GOOD Expected: 0.237504 Actual: 0.237504
NUCLPROTTAX_SEARCH (Time: 13s) TEST SUCCESS GOOD Expected: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265 Actual: from filtertaxdb: 1023 181 1265; from taxonomyreport: 1023 181 1265
EASYNUCLPROTSEARCH_TAX (Time: 25s) TEST SUCCESS GOOD Expected: from filtertaxdb: 3626 680 1425; Actual: from filtertaxdb: 3626 680 1425;
DBPROFILE_INDEX (Time: 50s) TEST SUCCESS GOOD Expected: 0.197552 Actual: 0.197552
LINSEARCH_NUCLNUCL_TARNS_SEARCH (Time: 18s) TEST SUCCESS GOOD Expected: 0.0620599 Actual: 0.0620599
LINSEARCH_NUCLNUCL_SEARCH (Time: 35s) TEST SUCCESS GOOD Expected: 0.108522 Actual: 0.108522
EASY_LINSEARCH_NUCLNUCL_SEARCH_SPLIT (Time: 49s) TEST SUCCESS GOOD Expected: 0.108903 Actual: 0.108903
LINCLUST_UPDATE (Time: 19s) TEST SUCCESS GOOD Expected: 32132 24732 32132 Actual: 32132 24732 32132
EASYNUCLNUCLTAX_SEARCH (Time: 99s) TEST SUCCESS GOOD Expected: from taxonomyreport: 2607 243 2624 Actual: from taxonomyreport: 2607 243 2624
EXTRACTORFS (Time: 0s) TEST SUCCESS GOOD Expected: 0 Actual: 0
RBH (Time: 4s) TEST SUCCESS GOOD Expected: 10 Actual: 10
APPLY (Time: 2s) TEST SUCCESS GOOD Expected: 2570583 Actual: 2570583
INDEX_COMPATIBLE (Time: 6s) TEST SUCCESS GOOD Expected: 0 Actual: 0
FILTERDB (Time: 1s) TEST SUCCESS GOOD Expected: 0 Actual: 0
PREF_DB_LOAD_MODE (Time: 18s) TEST SUCCESS GOOD Expected: 0.0856974 Actual: 0.0856974
FILTERTAXSEQDB (Time: 0s) TEST SUCCESS GOOD Expected: 0,1,2 0,1,2,3,4,5 3,4,5 Actual: 0,1,2 0,1,2,3,4,5 3,4,5
NOMPI_TARGET_SPLIT (Time: 9s) TEST SUCCESS GOOD Expected: 500 Actual: 500
NOMPI_SLICE_TECH (Time: 14s) TEST SUCCESS GOOD Expected: 512,256,128,64,32 Actual: 512,256,128,64,32
GCC should not be necessary. Clang works fully in all kinds of configurations.
We introduced our own env variable MMSEQS_NUM_THREADS to limit threads globally if --threads cannot be (conveniently) set. I limited my VM to 2GB and couldn't get it to crash in the same way during SLICEPROFILE (the OOM killer always killed it; it didn't crash in posix_memalign).
How exactly did you compile it when it hung with SSE?
All tests pass on my PowerEdge 32-ht system with MMSEQS_NUM_THREADS=16.
Also, I had forgotten that this machine has vmem limits, so I was actually running with about 1G/thread before.
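As a rough sketch of how the thread count could be derived from a memory budget before running the regression suite: the helper threads_for_budget is hypothetical, and the 2 GB per thread figure is an assumption chosen to match the observation above that 16 threads pass where 32 threads at about 1G/thread did not. Only MMSEQS_NUM_THREADS and the run_regression.sh invocation come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: pick a thread count so that each thread has at
# least gb_per_thread GB of memory, capped at the number of CPUs.
# Usage: threads_for_budget NCPU MEM_GB GB_PER_THREAD
threads_for_budget() {
    ncpu=$1
    mem_gb=$2
    gb_per_thread=$3
    by_mem=$((mem_gb / gb_per_thread))
    # Always allow at least one thread, even on tiny budgets.
    if [ "$by_mem" -lt 1 ]; then by_mem=1; fi
    if [ "$by_mem" -lt "$ncpu" ]; then
        echo "$by_mem"
    else
        echo "$ncpu"
    fi
}

# Example (not run here): 32 hardware threads, a 32 GB vmem limit, and
# an assumed 2 GB per thread give 16 threads.
# MMSEQS_NUM_THREADS=$(threads_for_budget 32 32 2) \
#     ./run_regression.sh /usr/local/bin/mmseqs ./Temp
```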
Excerpt from build:
[ 3% 10/228] /usr/bin/c++ -I/usr/local/include -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/tinyexpr -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/microtar -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/simde -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/simd -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/gzstream -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/alp -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/cacode -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/ksw2 -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/xxhash -I/usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/ips4o -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -fsigned-char -D_WITH_GETLINE -std=c++1y -stdlib=libc++ -MD -MT lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o -MF lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o.d -o lib/cacode/CMakeFiles/cacode.dir/lambda_calculator.cpp.o -c /usr/ports/wip/mmseqs2/work/MMseqs2-13-45111/lib/cacode/lambda_calculator.cpp
...
[100% 228/228] : && /usr/bin/c++ -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -O2 -pipe -msse2 -fstack-protector-strong -fno-strict-aliasing -lpthread -fstack-protector-strong -fsigned-char -D_WITH_GETLINE -std=c++1y -stdlib=libc++ -pedantic -Wall -Wextra -Wdisabled-optimization -fno-exceptions -fopenmp=libomp src/CMakeFiles/mmseqs.dir/mmseqs.cpp.o -o src/mmseqs src/libmmseqs-framework.a src/version/libversion.a lib/tinyexpr/libtinyexpr.a -lm /usr/local/lib/libzstd.a lib/microtar/libmicrotar.a -lz -lbz2 -lomp && :
The hangs are not totally consistent, but seem to usually happen here: ... Create directory /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp easy-search /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/results_aln.m8 /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp -e 10000 -s 4 --max-seqs 4000 --num-iterations 2 --compressed 1
MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 10000
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 16
Compressed 1
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 4000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 2
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp/11635372687271654297/query --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 1 -v 3
Converting sequences [6364] 1s 293ms Time for merging to query_h: 0h 0m 0s 19ms Time for merging to query: 0h 0m 0s 23ms Database type: Aminoacid Time for processing: 0h 0m 1s 363ms createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/EASY_PROFILE/tmp/11635372687271654297/target --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 1 -v 3
Converting sequences [35957] 9s 707ms Time for merging to target_h: 0h 0m 0s 21ms Time for merging to target: 0h 0m 0s 33ms Database type: Aminoacid
Next most likely hang point is here: ... [ 50%] Building CXX object CMakeFiles/evaluate_results.dir/src/EvaluateResults.cpp.o [100%] Linking CXX executable evaluate_results [100%] Built target evaluate_results createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/query.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/SEARCH/query
MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 1 Offset of numeric ids 0 Compressed 0 Verbosity 3
Converting sequences [6364] 0s 83ms Time for merging to query_h: 0h 0m 0s 12ms Time for merging to query: 0h 0m 0s 16ms Database type: Aminoacid Time for processing: 0h 0m 0s 136ms createdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/data/targetannotation.fasta /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/SEARCH/targetannotation
MMseqs Version: 0aab0f129537ab954340eb44d8e99e4d69a1dfd3 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 1 Offset of numeric ids 0 Compressed 0 Verbosity 3
Converting sequences [35957] 0s 127ms Time for merging to targetannotation_h: 0h 0m 0s 20ms Time for merging to targetannotation: 0h 0m 0s 43ms Database type: Aminoacid
I have seen 1 or 2 hangs where "Database type: Aminoacid" was not the final output.
Just got a hang here: ... [===============================================================> ] 98.28% 35.40[===============================================================> ] 98.28% 35.41[================================================================>] 99.28% 35.77[=================================================================] 100.00% 36.03K 3s 736ms Time for merging to aln_swapped: 0h 0m 0s 23ms 103611 alignments calculated 72095 sequence pairs passed the thresholds (0.695824 of overall calculated) 2.001194 hits per query sequence Time for processing: 0h 0m 3s 787ms swapresults /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/targetannotation_profile /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/query /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/tmp/7458066464536510288/aln_swapped /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/DBPROFILE/results_aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 10000 --split-memory-limit 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 16 --compressed 0 --db-load-mode 0 -v 3
Probably not important, but I noticed a mismatch between your cmake settings
set(CMAKE_XCODE_ATTRIBUTE_CLANG_CXX_LANGUAGE_STANDARD "c++11")
and the actual compiler options
-fsigned-char -D_WITH_GETLINE -std=c++1y -pedantic
I replaced -msse2 with -march=x86_64, so clang will bundle SSE with other common options for low-end AMD64 CPUs. Didn't change the results, though. Still hangs.
I tried to reproduce the problem in my FreeBSD 13 VM with your wip-ports repository and I can't get it to hang. I tried with both -msse2 and -march=x86_64 (and removed the USE_GCC line). Maybe the issue is that it's swapping a bit excessively at that moment and it would eventually continue? Could you attach gdb/lldb at the moment it's hanging and produce a stack trace? That's quite an odd issue that I've not encountered on any other system :/
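For a hang like this, a non-interactive dump of every thread's stack is often easier to attach to an issue than a single interactive bt. A minimal sketch, assuming lldb is installed and the caller is allowed to attach to the process. The helper name dump_backtraces and the LLDB override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: attach to a running process, print the stack of
# every thread, then detach so the process keeps running.
# LLDB can be overridden to point at a different debugger binary.
dump_backtraces() {
    pid=$1
    "${LLDB:-lldb}" --batch -p "$pid" \
        -o "thread backtrace all" \
        -o "detach"
}

# Example (not run here): find the PID with "ps axw", then
# dump_backtraces 1225 > mmseqs-hang.txt
```

Dumping all threads matters for a suspected deadlock, since the thread that never reaches the barrier is usually not the one shown first.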
I think the c++ standard was somewhat of a conscious choice, as we don't really want to use modern C++, but (iirc) gcc 4.8 would complain about one of the dependencies without increasing the c++ standard slightly.
First, thanks for your above-and-beyond efforts to diagnose this.
What were your compile flags? How many cores and how much RAM does your VM have?
Adding output below from builds with GCC disabled and WITH_DEBUG=yes (adds -g and prevents stripping binaries).
From Dell PowerEdge:
ps axw:
579 0 I+ 0:00.01 /bin/sh -e ./run_regression.sh /usr/local/bin/mmseqs ./Temp
1206 0 I+ 0:00.00 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/regression/run_nucl
1217 0 I+ 0:00.02 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/NUCLNUCL_TRANS
1225 0 I+ 0:07.71 /usr/local/bin/mmseqs offsetalignment /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1
lldb:
(lldb) process attach --pid 1225
Process 1225 stopped
Executable module set to "/usr/local/bin/mmseqs".
Architecture set to: x86_64--freebsd12.2.
(lldb) bt
* thread #1, name = 'mmseqs'
* frame #0: 0x000000080086f68c libthr.so.3`___lldb_unnamed_symbol190$$libthr.so.3 + 92
frame #1: 0x000000080086ccab libthr.so.3`___lldb_unnamed_symbol159$$libthr.so.3 + 491
frame #2: 0x000000080092ea3e libomp.so`___lldb_unnamed_symbol30$$libomp.so + 302
frame #3: 0x000000080096faaa libomp.so`___lldb_unnamed_symbol400$$libomp.so + 698
frame #4: 0x000000080096dd5c libomp.so`___lldb_unnamed_symbol392$$libomp.so + 604
frame #5: 0x000000080096aca7 libomp.so`___lldb_unnamed_symbol384$$libomp.so + 1095
frame #6: 0x0000000800966434 libomp.so`__kmpc_barrier + 308
frame #7: 0x0000000000408496 mmseqs`ips4o::OpenMPThreadPool::Sync::barrier(this=0x0000000802849038) const at thread_pool.hpp:63:1
frame #8: 0x0000000000436525 mmseqs`std::__1::pair<int, bool> ips4o::detail::Sorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::partition<true>(this=0x00007fffffffb5a8, begin=0x0000000802a7fe80, end=0x0000000802ba0f70, bucket_start=0x0000000802842000, shared=0x0000000802842000, my_id=0, num_threads=32) at partitioning.hpp:109:36
frame #9: 0x0000000000435f9b mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::parallelPrimary<ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&>(this=0x00007fffffffb5a8, begin=0x0000000802a7fe80, end=0x0000000802ba0f70, shared=0x0000000802842000, num_threads=32, task_sorter=0x00007fffffffbbb8) at parallel.hpp:114:26
frame #10: 0x0000000000435e27 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffba98, my_id=0, num_threads=32)(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*)::'lambda'(int, int)::operator()(int, int) const at parallel.hpp:193:24
frame #11: 0x00000000003edd3f mmseqs`::.omp_outlined._debug__.121(.global_tid.=0x00007fffffffb6a0, .bound_tid.=0x00007fffffffb698, func=0x00007fffffffba98) &) at thread_pool.hpp:95:13
frame #12: 0x00000000003edd75 mmseqs`::.omp_outlined..122(.global_tid.=0x00007fffffffb6a0, .bound_tid.=0x00007fffffffb698, func=0x00007fffffffba98) at thread_pool.hpp:95:13
frame #13: 0x0000000800984653 libomp.so`__kmp_invoke_microtask + 147
frame #14: 0x0000000800963c82 libomp.so`___lldb_unnamed_symbol362$$libomp.so + 370
frame #15: 0x000000080095f4af libomp.so`__kmp_fork_call + 7423
frame #16: 0x0000000800965c96 libomp.so`__kmpc_fork_call + 310
frame #17: 0x0000000000435d3d mmseqs`void ips4o::OpenMPThreadPool::operator(this=0x00007fffffffbb88, func=0x00007fffffffba98, num_threads=32)<ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator()(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*)::'lambda'(int, int)>(ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool>&&, int) at thread_pool.hpp:94:1
frame #18: 0x000000000042cb21 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffbb88, begin=0x0000000802a7fe80, end=0x0000000802ba0f70)(std::__1::pair<unsigned int, unsigned long>*, std::__1::pair<unsigned int, unsigned long>*) at parallel.hpp:189:9
frame #19: 0x000000000042c776 mmseqs`void ips4o::parallel::sort<ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset>(begin=0x0000000802a7fe80, end=0x0000000802ba0f70, comp=comparePairByOffset @ 0x00007fffffffbbf0, num_threads=32) at ips4o.hpp:128:9
frame #20: 0x00000000003ee74f mmseqs`void ips4o::parallel::sort<std::__1::pair<unsigned int, unsigned long>*, DBReader<unsigned int>::comparePairByOffset>(begin=0x0000000802a7fe80, end=0x0000000802ba0f70, comp=comparePairByOffset @ 0x00007fffffffbc40) at ips4o.hpp:137:5
frame #21: 0x00000000003eaab7 mmseqs`DBReader<unsigned int>::sortIndex(this=0x00007fffffffd2b0, isSortedById=true) at DBReader.cpp:367:9
frame #22: 0x00000000003efebe mmseqs`DBReader<unsigned int>::open(this=0x00007fffffffd2b0, accessType=2) at DBReader.cpp:185:9
frame #23: 0x0000000000639482 mmseqs`offsetalignment(argc=20, argv=0x00007fffffffd8a8, command=0x0000000800f5a220) at offsetalignment.cpp:261:12
frame #24: 0x000000000038731f mmseqs`runCommand(p=0x0000000800f5a220, argc=20, argv=0x00007fffffffd8a8) at Application.cpp:38:18
frame #25: 0x0000000000388596 mmseqs`main(argc=22, argv=0x00007fffffffd898) at Application.cpp:196:9
frame #26: 0x0000000000386400 mmseqs`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
(lldb)
ThinkPad:
ps axw:
54752 0 I+ 0:00.01 /bin/sh -e ./run_regression.sh /usr/local/bin/mmseqs ./Temp
57131 0 I+ 0:00.00 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/regression/run_easy_c
57133 0 I+ 0:00.01 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/CLUSTER_REASSIGN
57135 0 I+ 0:00.01 /bin/sh -e /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Temp/CLUSTER_REASSIGN
57200 0 S+ 0:00.62 /usr/local/bin/mmseqs swapdb /home/bacon/MMseqs2-Regression-2b20d2ba3533b6fd5343b78398b4df4d1c2e8f87/Tem
lldb:
(lldb) process attach --pid 57200
Process 57200 stopped
Executable module set to "/usr/local/bin/mmseqs".
Architecture set to: x86_64--freebsd13.0.
(lldb) bt
* thread #1, name = 'mmseqs'
* frame #0: 0x0000000800bea528 libc.so.7`__sys__umtx_op + 8
frame #1: 0x0000000000803044 mmseqs`__atomic_fetch_sub_16 [inlined] lock(l=0x000000000080c2e0) at atomic.c:72:5
frame #2: 0x000000000080301e mmseqs`__atomic_fetch_sub_16(ptr=0x00000008013723b0, val=1180591620717411303424, model=<unavailable>) at atomic.c:342
frame #3: 0x0000000000449273 mmseqs`std::__1::pair<long, long> ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (this=0x00000008013723b0)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::BucketPointers::decRead<true>() at bucket_pointers.hpp:106:28
frame #4: 0x0000000000449482 mmseqs`int ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::classifyAndReadBlock<false, true>(this=0x00007fffffffa8a8, read_bucket=29) at block_permutation.hpp:69:41
frame #5: 0x0000000000448403 mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::permuteBlocks<false, true>(this=0x00007fffffffa8a8) at block_permutation.hpp:137:31
frame #6: 0x0000000000447b4d mmseqs`std::__1::pair<int, bool> ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (this=0x00007fffffffa8a8, begin=0x00000008012ffc40, end=0x00000008013647d0, bucket_start=0x0000000801371000, shared=0x0000000801371000, my_id=0, num_threads=4)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::partition<true>(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*, long*, ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::SharedData*, int, int) at partitioning.hpp:104:9
frame #7: 0x00000000004475fb mmseqs`void ips4o::detail::Sorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::parallelPrimary<ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&>(this=0x00007fffffffa8a8, begin=0x00000008012ffc40, end=0x00000008013647d0, shared=0x0000000801371000, num_threads=4, task_sorter=0x00007fffffffaed8)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::SharedData&, int, ips4o::SequentialSorter<ips4o::ExtendedConfig<std::__1::__wrap_iter<ips4o::detail::ParallelTask*>, std::__1::greater<void>, ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >&) at parallel.hpp:114:26
frame #8: 0x0000000000447487 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffadd8, my_id=0, num_threads=4)(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*)::'lambda'(int, int)::operator()(int, int) const at parallel.hpp:193:24
frame #9: 0x00000000003ef65f mmseqs`::.omp_outlined._debug__.54(.global_tid.=0x00007fffffffa9e0, .bound_tid.=0x00007fffffffa9d8, func=0x00007fffffffadd8) &) at thread_pool.hpp:95:13
frame #10: 0x00000000003ef695 mmseqs`::.omp_outlined..55(.global_tid.=0x00007fffffffa9e0, .bound_tid.=0x00007fffffffa9d8, func=0x00007fffffffadd8) at thread_pool.hpp:94:1
frame #11: 0x000000080098d523 libomp.so`__kmp_invoke_microtask + 147
frame #12: 0x0000000800968332 libomp.so`___lldb_unnamed_symbol498$$libomp.so + 370
frame #13: 0x0000000800963b3f libomp.so`__kmp_fork_call + 7551
frame #14: 0x000000080093cfb6 libomp.so`__kmpc_fork_call + 310
frame #15: 0x00000000004473a1 mmseqs`void ips4o::OpenMPThreadPool::operator(this=0x00007fffffffaea8, func=0x00007fffffffadd8, num_threads=4)<ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator()(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*)::'lambda'(int, int)>(ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool>&&, int) at thread_pool.hpp:94:1
frame #16: 0x000000000043e399 mmseqs`ips4o::ParallelSorter<ips4o::ExtendedConfig<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, ips4o::OpenMPThreadPool> >::operator(this=0x00007fffffffaea8, begin=0x00000008012ffc40, end=0x00000008013647d0)(DBReader<unsigned int>::Index*, DBReader<unsigned int>::Index*) at parallel.hpp:189:9
frame #17: 0x000000000043dfd2 mmseqs`void ips4o::parallel::sort<ips4o::Config<true, 16l, 16l, 2048l, long, 4096ul, 5l, 8, 4l, 20, 7>, DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)>(begin=0x00000008012ffc40, end=0x00000008013647d0, comp=(mmseqs`DBReader<unsigned int>::Index::compareByOffset(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&) at DBReader.h:84), num_threads=4)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&), int) at ips4o.hpp:128:9
frame #18: 0x000000000040d66a mmseqs`void ips4o::parallel::sort<DBReader<unsigned int>::Index*, bool (*)(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)>(begin=0x00000008012ffc40, end=0x00000008013647d0, comp=(mmseqs`DBReader<unsigned int>::Index::compareByOffset(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&) at DBReader.h:84))(DBReader<unsigned int>::Index const&, DBReader<unsigned int>::Index const&)) at ips4o.hpp:137:5
frame #19: 0x00000000003ef124 mmseqs`DBReader<unsigned int>::sortIndex(this=0x00007fffffffc718, isSortedById=true) at DBReader.cpp:403:9
frame #20: 0x000000000044adae mmseqs`DBReader<unsigned int>::open(this=0x00007fffffffc718, accessType=8) at DBReader.cpp:185:9
frame #21: 0x000000000059f427 mmseqs`doswap(par=0x000000080121f1c0, isGeneralMode=true) at swapresults.cpp:49:22
frame #22: 0x00000000005a1f2e mmseqs`swapdb(argc=8, argv=0x00007fffffffd2c8, command=0x0000000801269b00) at swapresults.cpp:353:12
frame #23: 0x000000000038a19f mmseqs`runCommand(p=0x0000000801269b00, argc=8, argv=0x00007fffffffd2c8) at Application.cpp:38:18
frame #24: 0x000000000038b416 mmseqs`main(argc=10, argv=0x00007fffffffd2b8) at Application.cpp:196:9
frame #25: 0x0000000000389280 mmseqs`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
(lldb)
Ah that's interesting. In the preset flags we have this:
elseif (HAVE_SSE2)
set(MMSEQS_ARCH "${MMSEQS_ARCH} -msse2")
set(DISABLE_IPS4O 1)
It seems I had a reason for the DISABLE_IPS4O here, beyond reducing requirements. It disables the fast IPS4o sorting library and falls back to a different, slightly slower one. You should pass -DDISABLE_IPS4O=1 to cmake.
IPS4o requires either 16-byte compare-exchange instructions (enabled by -mcx16) or a slower implementation from libatomic. For lowest-common-denominator compilation it would be a good idea to disable it anyway.
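The swapdb backtrace above is waiting inside __atomic_fetch_sub_16, reached through a lock() call in atomic.c, which suggests the lock-based fallback rather than a native 16-byte compare-exchange. Below is a minimal Python sketch for spotting such frames in a saved lldb backtrace. The function name and the regular expression are illustrative assumptions based on the frames quoted in this thread; they are not part of MMseqs2 or lldb.

```python
import re

# Matches lldb frames such as:
#   frame #2: 0x... mmseqs`__atomic_fetch_sub_16(ptr=..., ...) at atomic.c:342
# The pattern is an assumption derived from the frames quoted in this thread.
FRAME_RE = re.compile(r"frame #(\d+): \S+ (\S+?)`(__atomic_\w+_16)\b")


def find_16_byte_atomic_calls(backtrace):
    """Return (frame number, module, symbol) for each 16-byte atomic helper frame."""
    hits = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2), match.group(3)))
    return hits


if __name__ == "__main__":
    # First three frames of the swapdb backtrace from this thread.
    sample = """\
* frame #0: 0x0000000800bea528 libc.so.7`__sys__umtx_op + 8
frame #1: 0x0000000000803044 mmseqs`__atomic_fetch_sub_16 [inlined] lock(l=0x000000000080c2e0) at atomic.c:72:5
frame #2: 0x000000000080301e mmseqs`__atomic_fetch_sub_16(ptr=0x00000008013723b0, val=1180591620717411303424, model=<unavailable>) at atomic.c:342
"""
    for number, module, symbol in find_16_byte_atomic_calls(sample):
        print(f"frame #{number}: {module} calls {symbol}")
```

Any hit in the mmseqs module means the sorter is going through a library helper for its 16-byte atomics, which is the situation that -DDISABLE_IPS4O=1 avoids.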
That seems to have done it. Nice work!
I'm still not clear on why it was working on your FreeBSD VM or why it works with GCC. From what I can tell, CMPXCHG16B was only lacking on VERY early AMD64 architectures. My hardware is old, but not that old.
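One way to confirm that the CPU itself supports CMPXCHG16B is to look for the CX16 feature flag. The Python sketch below assumes the boot messages are saved in /var/run/dmesg.boot, as on a typical FreeBSD system, and that the flag is spelled CX16 in the Features2 line (Linux lists it as cx16 in /proc/cpuinfo). The helper name is hypothetical.

```python
import os
import re


def has_cx16(cpu_text):
    """Return True if a CX16/cx16 feature flag appears as a whole token in the text."""
    pattern = r"(?<![A-Za-z0-9_])cx16(?![A-Za-z0-9_])"
    return re.search(pattern, cpu_text, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Candidate sources are assumptions: FreeBSD boot log first, then Linux cpuinfo.
    for path in ("/var/run/dmesg.boot", "/proc/cpuinfo"):
        if os.access(path, os.R_OK):
            with open(path, errors="replace") as handle:
                print(path, "reports CX16:", has_cx16(handle.read()))
            break
    else:
        print("no readable CPU feature source found")
```

CPU support on its own does not change what the compiler emits. As noted above, the 16-byte compare-exchange instructions are only enabled by -mcx16, so a binary built without that flag can still end up on the slower path on capable hardware.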
I am not sure why. This sorting library is also a bit fickle on uncommon architectures (Power and Z, though MMseqs2 doesn't 100% work on Z yet anyway) and I've explicitly disabled it on those.
Can you post the output of ldd /usr/local/bin/mmseqs and make clean build in wip/mmseqs2? I wonder if my build is picking up some optional dependency that yours is not. I'm guessing you don't have many packages installed on the VM. Thanks...
# ldd /usr/local/bin/mmseqs
/usr/local/bin/mmseqs:
libthr.so.3 => /lib/libthr.so.3 (0x80066c000)
libm.so.5 => /lib/libm.so.5 (0x800699000)
libz.so.6 => /lib/libz.so.6 (0x8006cc000)
libbz2.so.4 => /usr/lib/libbz2.so.4 (0x8006e8000)
libomp.so => /usr/lib/libomp.so (0x8006fe000)
libc++.so.1 => /usr/lib/libc++.so.1 (0x8007c5000)
libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x800897000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x8008ba000)
libc.so.7 => /lib/libc.so.7 (0x8008d3000)
And zstd was picked up during cmake:
-- Found ZSTD: /usr/local/lib/libzstd.a
This looks pretty complete. I don't remember anything else that we might be missing.
The GCC10 build picks up libatomic, which may at least explain the GCC vs clang difference.
# ldd /usr/local/bin/mmseqs
/usr/local/bin/mmseqs:
libthr.so.3 => /lib/libthr.so.3 (0x800ace000)
libatomic.so.1 => /usr/local/lib/gcc10/libatomic.so.1 (0x800afc000)
libz.so.6 => /lib/libz.so.6 (0x800d03000)
libbz2.so.4 => /usr/lib/libbz2.so.4 (0x800d1f000)
libstdc++.so.6 => /usr/local/lib/gcc10/libstdc++.so.6 (0x800d35000)
libm.so.5 => /lib/libm.so.5 (0x80111b000)
libgomp.so.1 => /usr/local/lib/gcc10/libgomp.so.1 (0x80114e000)
libgcc_s.so.1 => /usr/local/lib/gcc10/libgcc_s.so.1 (0x80138b000)
libc.so.7 => /lib/libc.so.7 (0x8015a3000)
libdl.so.1 => /usr/lib/libdl.so.1 (0x8019b4000)
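The two ldd listings can also be compared mechanically. The small Python sketch below assumes the `name => path (address)` output format shown above and uses hypothetical function names. It extracts the library names from each listing and reports what only the second build links against.

```python
def linked_libraries(ldd_output):
    """Return the set of library names from ldd lines of the form 'name => path (addr)'."""
    libs = set()
    for line in ldd_output.splitlines():
        if "=>" in line:
            libs.add(line.split("=>", 1)[0].strip())
    return libs


def only_in_second(first, second):
    """Libraries linked by the second binary but not the first, sorted by name."""
    return sorted(linked_libraries(second) - linked_libraries(first))


if __name__ == "__main__":
    # Abbreviated from the two listings in this thread.
    clang_build = """\
libthr.so.3 => /lib/libthr.so.3 (0x80066c000)
libomp.so => /usr/lib/libomp.so (0x8006fe000)
libc++.so.1 => /usr/lib/libc++.so.1 (0x8007c5000)
libc.so.7 => /lib/libc.so.7 (0x8008d3000)
"""
    gcc_build = """\
libthr.so.3 => /lib/libthr.so.3 (0x800ace000)
libatomic.so.1 => /usr/local/lib/gcc10/libatomic.so.1 (0x800afc000)
libgomp.so.1 => /usr/local/lib/gcc10/libgomp.so.1 (0x80114e000)
libstdc++.so.6 => /usr/local/lib/gcc10/libstdc++.so.6 (0x800d35000)
libc.so.7 => /lib/libc.so.7 (0x8015a3000)
"""
    print(only_in_second(clang_build, gcc_build))
```

With the listings above, libatomic.so.1 shows up among the libraries that only the GCC build links against, alongside GCC's own OpenMP and C++ runtimes.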
I think we're set now. Thanks again!
FYI:
MMseqs2 has been committed to the FreeBSD ports collection. It might be helpful to users if you could post a message like the following on your website:
Thanks!
MMseqs2 can be installed on FreeBSD via the FreeBSD ports system.
To install via the binary package, simply run:
This will very quickly install a prebuilt binary using only highly-portable optimizations, much like apt, yum, etc.
FreeBSD ports can just as easily be built and installed from source, although it will take longer (for the computer, not for you):
Building from source allows installing to a different prefix, compiling with native optimizations, and in some cases, building with non-default options such as different compilers or dependencies. For example, adding
to /etc/make.conf will cause ports built from source to use all native optimizations known to the compiler for the local CPU, resulting in faster but less portable binaries.
To report issues with a FreeBSD port, please submit a PR at:
For more information, visit https://www.freebsd.org/ports/index.html.