rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Update workflow for portable macOS binary (remove need to install gcc@11), fixes #21 #47

Closed blake-riley closed 1 week ago

blake-riley commented 1 year ago

Here are (finally) some changes that alter the GitHub workflow to generate a portable macOS binary. By portable, I mean that it dynamically links only against libc++.1.dylib and libSystem.B.dylib. I believe these libraries are bundled with all macOS distributions since 10.9 (when Apple switched from gcc to clang).
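One way to check that portability claim is to list the binary's dynamic dependencies with `otool -L` and flag anything outside the system locations. This is a minimal sketch, not part of the PR: the helper name `non_system_libs` is hypothetical, and the assumption that "system" means `/usr/lib/` or `/System/Library/` is mine.

```shell
# Hypothetical helper (not part of this PR): reads `otool -L` output on
# stdin and prints every dependency that lives outside /usr/lib/ and
# /System/Library/. Empty output suggests only system libraries are linked.
non_system_libs() {
    # Line 1 of `otool -L` output is the binary's own name, so skip it.
    # The first field of each remaining line is the library path.
    tail -n +2 | awk '{print $1}' | grep -Ev '^(/usr/lib/|/System/Library/)' || true
}

# Only runs where otool and the built binary exist (path assumed from the
# Makefile's $(OS)/muscle target on macOS).
if command -v otool >/dev/null 2>&1 && [ -f Darwin/muscle ]; then
    otool -L Darwin/muscle | non_system_libs
fi
```

If this prints a path such as a Homebrew `libomp.dylib`, the binary will need that library to be present on the target machine.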

The workflow.yml files for macOS and Linux now also print some more data as the runner builds the artifact; this helped me work out what was going on in the runner as I was making changes. IMO, this should aid in future debugging of runners.

This commit history does include a bunch of (failed) attempts to continue using the GNU compilers on macOS. I'm unsure if including the dead-ends is useful & illustrative for anyone; it's mostly cathartic for me.

I've tested the compiled binary on a macOS 12.3 machine of mine that doesn't have g++-11 installed. An alignment with the default settings looks fine, and shows an expected multithreading speedup vs -threads 1.

Feel free to request changes / comment on things you'd like more explanation for, Robert! (Commit messages also contain more info).

shbrainard commented 1 year ago

I am running macOS 12.6.3 and get the following error when I run make from within src:

c++ -DNDEBUG -pthread -Xpreprocessor -fopenmp -O3 -ffast-math -std=c++11 -c -o Darwin/usorter.o usorter.cpp
Warning: please specify an OpenMP library in LIBS.
c++ -O3 -pthread -lpthread  Darwin/addconfseq.o Darwin/align.o Darwin/alignpairflat.o Darwin/allocflat.o Darwin/alnalnsflat.o Darwin/alnmsasflat.o Darwin/alnmsasflat3.o Darwin/alpha.o Darwin/alpha3.o Darwin/assertsameseqs.o Darwin/buildposterior3flat.o Darwin/buildpostflat.o Darwin/bwdflat3.o Darwin/calcalnflat.o Darwin/calcalnscoreflat.o Darwin/calcalnscoresparse.o Darwin/calcposteriorflat.o Darwin/colscoreefa.o Darwin/consflat.o Darwin/conspairflat.o Darwin/defaulthmmparams.o Darwin/derep.o Darwin/diagbox.o Darwin/disperse.o Darwin/dividetree.o Darwin/eacluster.o Darwin/eadistmx.o Darwin/eadistmxmsas.o Darwin/eesort.o Darwin/efabestcols.o Darwin/efabestconf.o Darwin/efaexplode.o Darwin/efastats.o Darwin/ensemble.o Darwin/fa2efa.o Darwin/fasta.o Darwin/fasta2.o Darwin/fwdflat3.o Darwin/getconsseq.o Darwin/getpairs.o Darwin/getpostpairsalignedflat.o Darwin/globalinputms.o Darwin/guidetreejoinorder.o Darwin/heatmapcolors.o Darwin/help.o Darwin/hmmdump.o Darwin/hmmparams.o Darwin/jalview.o Darwin/jointrees.o Darwin/letterconf.o Darwin/letterconfhtml.o Darwin/logaln.o Darwin/logdistmx.o Darwin/logmx.o Darwin/main.o Darwin/make_a2m.o Darwin/maxcc.o Darwin/mpcflat.o Darwin/msa.o Darwin/msa2.o Darwin/msastats.o Darwin/multisequence.o Darwin/mysparsemx.o Darwin/myutils.o Darwin/pairhmm.o Darwin/permutetree.o Darwin/perturbhmm.o Darwin/pprog.o Darwin/pprog2.o Darwin/pprogt.o Darwin/probcons.o Darwin/progalnflat.o Darwin/project.o Darwin/qscore.o Darwin/qscore2.o Darwin/qscoreefa.o Darwin/qscorer.o Darwin/quarts.o Darwin/randomchaintree.o Darwin/refineflat.o Darwin/relabel.o Darwin/relaxflat.o Darwin/resample.o Darwin/seb8.o Darwin/seq.o Darwin/sequence.o Darwin/setprobconsparams.o Darwin/stripgappycols.o Darwin/stripgappyrows.o Darwin/super4.o Darwin/super5.o Darwin/testfb.o Darwin/testlog.o Darwin/testscoretype.o Darwin/textfile.o Darwin/totalprobflat.o Darwin/tracebackflat.o Darwin/transaln.o Darwin/transq.o Darwin/tree.o Darwin/tree2.o Darwin/tree4.o 
Darwin/treefromfile.o Darwin/treeperm.o Darwin/treesplitter.o Darwin/treesubsetnodes.o Darwin/treetofile.o Darwin/trimtoref.o Darwin/trimtorefefa.o Darwin/uclust.o Darwin/upgma5.o Darwin/usage.o Darwin/usorter.o -o Darwin/muscle
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      Die_(char const*, ...) in myutils.o
  "___kmpc_end_critical", referenced from:
      Die_(char const*, ...) in myutils.o
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined. in consflat.o
      _.omp_outlined. in eacluster.o
      _.omp_outlined. in eadistmx.o
      _.omp_outlined. in eesort.o
      _.omp_outlined. in getpostpairsalignedflat.o
      _.omp_outlined. in mpcflat.o
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in consflat.o
      _.omp_outlined. in eacluster.o
      _.omp_outlined. in eadistmx.o
      _.omp_outlined. in eesort.o
      _.omp_outlined. in getpostpairsalignedflat.o
      _.omp_outlined. in mpcflat.o
  "___kmpc_fork_call", referenced from:
      MPCFlat::ConsIter(unsigned int) in consflat.o
      EACluster::GetBestCentroid(unsigned int, float, float&) in eacluster.o
      CalcEADistMx(__sFILE*, MultiSequence*, std::__1::vector<std::__1::vector<float, std::__1::allocator<float> >, std::__1::allocator<std::__1::vector<float, std::__1::allocator<float> > > >&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >*) in eadistmx.o
      cmd_eesort() in eesort.o
      GetPostPairsAlignedFlat(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MultiSequence const&, MultiSequence const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >&) in getpostpairsalignedflat.o
      MPCFlat::CalcPosteriors() in mpcflat.o
      MPCFlat::Run_Super4(MultiSequence*) in mpcflat.o
      ...
  "___kmpc_global_thread_num", referenced from:
      MPCFlat::ConsIter(unsigned int) in consflat.o
      EACluster::GetBestCentroid(unsigned int, float, float&) in eacluster.o
      CalcEADistMx(__sFILE*, MultiSequence*, std::__1::vector<std::__1::vector<float, std::__1::allocator<float> >, std::__1::allocator<std::__1::vector<float, std::__1::allocator<float> > > >&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >*) in eadistmx.o
      cmd_eesort() in eesort.o
      GetPostPairsAlignedFlat(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MultiSequence const&, MultiSequence const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >&) in getpostpairsalignedflat.o
      MPCFlat::CalcPosteriors() in mpcflat.o
      MPCFlat::Run_Super4(MultiSequence*) in mpcflat.o
      ...
  "___kmpc_push_num_threads", referenced from:
      MPCFlat::ConsIter(unsigned int) in consflat.o
      EACluster::GetBestCentroid(unsigned int, float, float&) in eacluster.o
      CalcEADistMx(__sFILE*, MultiSequence*, std::__1::vector<std::__1::vector<float, std::__1::allocator<float> >, std::__1::allocator<std::__1::vector<float, std::__1::allocator<float> > > >&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >*) in eadistmx.o
      cmd_eesort() in eesort.o
      GetPostPairsAlignedFlat(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MultiSequence const&, MultiSequence const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<MySparseMx*, std::__1::allocator<MySparseMx*> >&) in getpostpairsalignedflat.o
      MPCFlat::CalcPosteriors() in mpcflat.o
      MPCFlat::Run_Super4(MultiSequence*) in mpcflat.o
      ...
  "_omp_get_max_threads", referenced from:
      GetRequestedThreadCount() in myutils.o
  "_omp_get_thread_num", referenced from:
      myvstrprintf(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, char const*, __va_list_tag*) in myutils.o
      GetThreadIndex() in myutils.o
  "_omp_init_lock", referenced from:
      __GLOBAL__sub_I_alnmsasflat.cpp in alnmsasflat.o
      __GLOBAL__sub_I_consflat.cpp in consflat.o
      __GLOBAL__sub_I_eacluster.cpp in eacluster.o
      __GLOBAL__sub_I_eadistmx.cpp in eadistmx.o
      __GLOBAL__sub_I_eesort.cpp in eesort.o
      __GLOBAL__sub_I_getpostpairsalignedflat.cpp in getpostpairsalignedflat.o
      __GLOBAL__sub_I_mpcflat.cpp in mpcflat.o
      ...
  "_omp_set_lock", referenced from:
      _.omp_outlined. in consflat.o
      _.omp_outlined. in eacluster.o
      _.omp_outlined. in eadistmx.o
      _.omp_outlined. in eesort.o
      _.omp_outlined. in getpostpairsalignedflat.o
      _.omp_outlined. in mpcflat.o
  "_omp_unset_lock", referenced from:
      _.omp_outlined. in consflat.o
      _.omp_outlined. in eacluster.o
      _.omp_outlined. in eadistmx.o
      _.omp_outlined. in eesort.o
      _.omp_outlined. in getpostpairsalignedflat.o
      _.omp_outlined. in mpcflat.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Darwin/muscle] Error 1

I am on the dev branch of @blake-riley 's fork of this repo, the Makefile looks like this:

(base) ➜  src git:(dev) ✗ cat Makefile
# The $(OS) variable is the o/s name returned by uname, which is
# used as the sub-directory name under src/ where object files
# and the executable are stored. This allows several target
# operating systems in the same directory structure.
# Typical values are:

#   Platform    Value of $(OS)
#   --------    --------------
#   Linux       linux
#   Mac OSX     Darwin
#   Cygwin      CYGWIN_NT-10.0

# Building on Mac OSX is challenging because Apple does not support gcc or
# the OMP threading library. Hacks to install gcc and OMP vary by OSX release.
# This Makefile works with the AWS Catalina v10.15.7 AMI. With this AMI,
# running 'brew install gcc' currently installs gcc v11.

OS := $(shell uname)

CPPFLAGS += -DNDEBUG -pthread

# Detect if CXX is Apple clang, use custom preprocessor call
COMPILER_VERSION := $(shell $(CXX) --version)
ifneq '' '$(findstring Apple clang, $(COMPILER_VERSION))'
    CPPFLAGS += -Xpreprocessor -fopenmp
else
    CPPFLAGS += -fopenmp
endif

# If Darwin, then -fopenmp won't work in linker stage appropriately.
# (Apple-ld is picky.)
# As such, the user must specify an OpenMP implementation in LIBS.
# We check this (quick-and-dirty) to help them out.
ifneq '' '$(findstring Darwin, $(OS))'
    ifeq '' '$(findstring omp, $(LIBS))'
        maybe_missing_omp=1
    endif
else
    LDFLAGS += -fopenmp
endif

# Add default flags
CXXFLAGS += -O3 -ffast-math -std=c++11
LDFLAGS += -O3 -pthread -lpthread

HDRS := $(shell echo *.h)
OBJS := $(shell echo *.cpp | sed "-es/^/$(OS)\//" | sed "-es/ / $(OS)\//g" | sed "-es/\.cpp/.o/g")
SRCS := $(shell ls *.cpp *.h)

.PHONY: clean

$(OS)/muscle : gitver.txt $(OS)/ $(OBJS)
    @if [ -n "${maybe_missing_omp}" ]; then \
        echo "Warning: please specify an OpenMP library in LIBS."; \
    fi

    $(CXX) $(LDFLAGS) $(LIBS) $(OBJS) -o $@

    @# Warning: do not add -d option to strip, this is not portable
    strip $(OS)/muscle

gitver.txt : $(SRCS)
    bash ./gitver.bash

$(OS)/ :
    mkdir -p $(OS)/

$(OS)/%.o : %.cpp $(HDRS)
    $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<

clean:
    rm -rf gitver.txt $(OS)/

Thoughts? Sorry if I'm doing something stupid.

blake-riley commented 1 year ago

Hi Scott: Looks like you've skipped over a key message:

Warning: please specify an OpenMP library in LIBS.

The Makefile is trying to be a bit generic (so it can work with both GNU and LLVM compilers, on both Linux and macOS). As such, you've gotta help it with where to pull the libraries from (particularly on macOS).

The solution to this (for a static build) is in the last commit in this PR, in the build_osx.yml runner script (how GitHub builds the binaries for this project).

https://github.com/rcedgar/muscle/blob/d28a33ca7c1a532483732556d1aabeff56ac5822/.github/workflows/build_osx.yml#L37-L40

This runner script is trying to build muscle that's statically linked against libomp. Since you're building on your personal macOS system (and you've asked about a brew recipe before, so I assume you're running homebrew), I recommend just using a dynamic library here.

You should be able to use the above 4 lines. Replace libomp.a with libomp.dylib to dynamically link against libomp (which is what the brew recipe will do).
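As a rough sketch of that choice between the static and dynamic library, the helper below prints the path to hand to the Makefile's LIBS variable. The function name `find_libomp` and the `lib/libomp.a` / `lib/libomp.dylib` layout under the install prefix are assumptions for illustration; it does not reproduce the lines in build_osx.yml.

```shell
# Hypothetical helper: print the OpenMP library to pass in LIBS, given an
# install prefix (for Homebrew, the output of `brew --prefix libomp`).
# Pass "static" as the second argument to select libomp.a instead of
# libomp.dylib. The lib/ layout under the prefix is an assumption.
find_libomp() {
    prefix=$1
    kind=${2:-dynamic}
    if [ "$kind" = static ]; then
        lib="$prefix/lib/libomp.a"
    else
        lib="$prefix/lib/libomp.dylib"
    fi
    if [ -f "$lib" ]; then
        printf '%s\n' "$lib"
    else
        echo "libomp not found at $lib" >&2
        return 1
    fi
}

# Example use from src/ (assumes Homebrew and its libomp formula):
#   prefix="$(brew --prefix libomp)"
#   env CXXFLAGS="-I$prefix/include" LIBS="$(find_libomp "$prefix")" make
```

Failing early when the file is missing avoids reaching the link step with an empty LIBS, which is what produces the undefined `___kmpc_*` symbols in the log above.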

shbrainard commented 1 year ago

Awesome, yes, I thought it was something stupid :-)

I do have libomp installed with brew, and running:

(base) ➜  src git:(dev) ✗ env CXXFLAGS="-I$(brew --prefix libomp)/include" LIBS="$(brew --prefix libomp)/lib/libomp.dylib" make
c++ -O3 -pthread -lpthread /usr/local/opt/libomp/lib/libomp.dylib Darwin/addconfseq.o Darwin/align.o Darwin/alignpairflat.o Darwin/allocflat.o Darwin/alnalnsflat.o Darwin/alnmsasflat.o Darwin/alnmsasflat3.o Darwin/alpha.o Darwin/alpha3.o Darwin/assertsameseqs.o Darwin/buildposterior3flat.o Darwin/buildpostflat.o Darwin/bwdflat3.o Darwin/calcalnflat.o Darwin/calcalnscoreflat.o Darwin/calcalnscoresparse.o Darwin/calcposteriorflat.o Darwin/colscoreefa.o Darwin/consflat.o Darwin/conspairflat.o Darwin/defaulthmmparams.o Darwin/derep.o Darwin/diagbox.o Darwin/disperse.o Darwin/dividetree.o Darwin/eacluster.o Darwin/eadistmx.o Darwin/eadistmxmsas.o Darwin/eesort.o Darwin/efabestcols.o Darwin/efabestconf.o Darwin/efaexplode.o Darwin/efastats.o Darwin/ensemble.o Darwin/fa2efa.o Darwin/fasta.o Darwin/fasta2.o Darwin/fwdflat3.o Darwin/getconsseq.o Darwin/getpairs.o Darwin/getpostpairsalignedflat.o Darwin/globalinputms.o Darwin/guidetreejoinorder.o Darwin/heatmapcolors.o Darwin/help.o Darwin/hmmdump.o Darwin/hmmparams.o Darwin/jalview.o Darwin/jointrees.o Darwin/letterconf.o Darwin/letterconfhtml.o Darwin/logaln.o Darwin/logdistmx.o Darwin/logmx.o Darwin/main.o Darwin/make_a2m.o Darwin/maxcc.o Darwin/mpcflat.o Darwin/msa.o Darwin/msa2.o Darwin/msastats.o Darwin/multisequence.o Darwin/mysparsemx.o Darwin/myutils.o Darwin/pairhmm.o Darwin/permutetree.o Darwin/perturbhmm.o Darwin/pprog.o Darwin/pprog2.o Darwin/pprogt.o Darwin/probcons.o Darwin/progalnflat.o Darwin/project.o Darwin/qscore.o Darwin/qscore2.o Darwin/qscoreefa.o Darwin/qscorer.o Darwin/quarts.o Darwin/randomchaintree.o Darwin/refineflat.o Darwin/relabel.o Darwin/relaxflat.o Darwin/resample.o Darwin/seb8.o Darwin/seq.o Darwin/sequence.o Darwin/setprobconsparams.o Darwin/stripgappycols.o Darwin/stripgappyrows.o Darwin/super4.o Darwin/super5.o Darwin/testfb.o Darwin/testlog.o Darwin/testscoretype.o Darwin/textfile.o Darwin/totalprobflat.o Darwin/tracebackflat.o Darwin/transaln.o Darwin/transq.o Darwin/tree.o 
Darwin/tree2.o Darwin/tree4.o Darwin/treefromfile.o Darwin/treeperm.o Darwin/treesplitter.o Darwin/treesubsetnodes.o Darwin/treetofile.o Darwin/trimtoref.o Darwin/trimtorefefa.o Darwin/uclust.o Darwin/upgma5.o Darwin/usage.o Darwin/usorter.o -o Darwin/muscle

Seems like it succeeded. Running muscle returns:

(base) ➜  Darwin git:(dev) ✗ ./muscle

muscle 5.2.osx64 [d28a33]  17.2Gb RAM, 4 cores
Built Jan 29 2023 13:28:56
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Thanks again for all your help with this!

rcedgar commented 1 week ago

Closing this PR; we are implementing a new workflow in the dev branch.