rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Failing to compile 5.1 on Linux with GCC 11 #22

Closed: V-Z closed this issue 1 year ago

V-Z commented 2 years ago

Hello, I'm sorry, but I can't compile Muscle 5.1 on Linux with GCC 11.2.1:

This is for the Git snapshot:

$ make
bash ./gitver.bash
"7630cd"
mkdir -p Linux/
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/addconfseq.o addconfseq.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/align.o align.cpp
...
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/usage.o usage.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/usorter.o usorter.cpp
g++  -O3 -fopenmp -pthread -lpthread -static Linux/addconfseq.o Linux/align.o Linux/alignpairflat.o Linux/allocflat.o Linux/alnalnsflat.o Linux/alnmsasflat.o Linux/alnmsasflat3.o Linux/alpha.o Linux/alpha3.o Linux/assertsameseqs.o Linux/buildposterior3flat.o Linux/buildpostflat.o Linux/bwdflat3.o Linux/calcalnflat.o Linux/calcalnscoreflat.o Linux/calcalnscoresparse.o Linux/calcposteriorflat.o Linux/colscoreefa.o Linux/consflat.o Linux/conspairflat.o Linux/defaulthmmparams.o Linux/derep.o Linux/diagbox.o Linux/disperse.o Linux/dividetree.o Linux/eacluster.o Linux/eadistmx.o Linux/eadistmxmsas.o Linux/eesort.o Linux/efabestcols.o Linux/efabestconf.o Linux/efaexplode.o Linux/efastats.o Linux/ensemble.o Linux/fasta.o Linux/fasta2.o Linux/fa2efa.o Linux/fwdflat3.o Linux/getconsseq.o Linux/getpairs.o Linux/getpostpairsalignedflat.o Linux/globalinputms.o Linux/guidetreejoinorder.o Linux/heatmapcolors.o Linux/help.o Linux/hmmdump.o Linux/hmmparams.o Linux/jalview.o Linux/jointrees.o Linux/letterconf.o Linux/letterconfhtml.o Linux/logaln.o Linux/logdistmx.o Linux/logmx.o Linux/main.o Linux/make_a2m.o Linux/maxcc.o Linux/mpcflat.o Linux/msa.o Linux/msastats.o Linux/msa2.o Linux/multisequence.o Linux/mysparsemx.o Linux/myutils.o Linux/pairhmm.o Linux/permutetree.o Linux/perturbhmm.o Linux/pprog.o Linux/pprogt.o Linux/pprog2.o Linux/probcons.o Linux/progalnflat.o Linux/project.o Linux/qscore.o Linux/qscoreefa.o Linux/qscorer.o Linux/qscore2.o Linux/quarts.o Linux/randomchaintree.o Linux/refineflat.o Linux/relabel.o Linux/relaxflat.o Linux/resample.o Linux/seb8.o Linux/seq.o Linux/sequence.o Linux/setprobconsparams.o Linux/stripgappycols.o Linux/stripgappyrows.o Linux/super4.o Linux/super5.o Linux/testfb.o Linux/testlog.o Linux/testscoretype.o Linux/textfile.o Linux/totalprobflat.o Linux/tracebackflat.o Linux/transaln.o Linux/transq.o Linux/tree.o Linux/treefromfile.o Linux/treeperm.o Linux/treesplitter.o Linux/treesubsetnodes.o Linux/treetofile.o Linux/tree2.o Linux/tree4.o 
Linux/trimtoref.o Linux/trimtorefefa.o Linux/uclust.o Linux/upgma5.o Linux/usage.o Linux/usorter.o -o Linux/muscle
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: cannot find -lm
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
make: *** [Makefile:41: Linux/muscle] Error 1

Compiling the released version ends with the same error; only the beginning is different:

$ make
bash ./gitver.bash
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
""
mkdir -p Linux/
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/addconfseq.o addconfseq.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/align.o align.cpp
...

It looks similar to an error I got before (discussed in #4), and it looks a bit low-level, but I'm sure I have all the compilation tools correctly installed. I might be missing some dependency (although I think I checked everything), but I can't figure out which one it could be... Once I succeed, I'll create an RPM package for openSUSE Linux.
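The `cannot find -lm` / `cannot find -lc` errors appear only at the final link step, which passes `-static`, so they suggest that the static archives `libm.a` and `libc.a` are not installed, rather than that a compiler tool is missing. As far as I know, Debian/Ubuntu ship these archives in `libc6-dev` (pulled in by `build-essential`), while openSUSE puts them in a separate package (`glibc-devel-static`, if I recall correctly). The sketch below is a hypothetical helper (the function names are mine) that asks the compiler whether it can locate each archive, using GCC's `-print-file-name` option, which prints a full path when the file is found and just echoes the bare name otherwise.

```shell
#!/bin/sh
# Hypothetical helpers: check whether the static archives needed by
# `g++ ... -static` can be located by the compiler.

# Succeeds if the compiler ($CXX, default g++) resolves $1 (e.g. libc.a)
# to an existing file. GCC's -print-file-name prints an absolute path
# when the file is found and only the bare name when it is not.
has_static_lib() {
    local lib="$1" cxx="${CXX:-g++}" found
    found="$("$cxx" -print-file-name="$lib" 2>/dev/null)" || return 1
    case "$found" in
        /*) [ -e "$found" ] ;;
        *) return 1 ;;
    esac
}

# Reports each archive requested by the failing link line (-lm, -lc);
# returns non-zero if any of them is missing.
check_static_libs() {
    local lib missing=0
    for lib in libm.a libc.a; do
        if has_static_lib "$lib"; then
            echo "found:   $lib"
        else
            echo "missing: $lib"
            missing=1
        fi
    done
    return "$missing"
}
```

If `check_static_libs` reports a missing archive, the options are to install the distribution's static C library package or to link without `-static`.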

rcedgar commented 2 years ago

It builds OK for me on a clean Ubuntu 22.04 install in this Docker container:

FROM ubuntu:22.04
RUN apt-get update -y

# Avoid interactive prompts for geographic area / timezone
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

RUN apt-get install -y vim
RUN apt-get install -y git

RUN apt-get update -y
RUN apt-get install -y build-essential
RUN apt-get install -y gcc-11 g++-11
rcedgar commented 2 years ago

fatal: not a git repository

You would avoid this error if you got the source using git clone. The build should succeed from a copy of the source without the repository (.git subdirectory), but you should clone the repo so that the git hash is embedded in the binary, where it is reported by muscle --version.
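Since the build is meant to succeed without a .git directory, the version step can fall back quietly instead of printing git's `fatal:` message. Below is a hypothetical sketch (not the actual gitver.bash, whose contents aren't shown in this thread) that prints the short hash in quotes, like the `"7630cd"` line in the build log, or a placeholder `"-"` when the source is not a git clone or git is unavailable.

```shell
#!/bin/sh
# Hypothetical sketch, not the real gitver.bash: print the short commit
# hash of the source tree in directory $1 (default: current directory),
# quoted for embedding in a generated source file, or "-" when the tree
# is an unpacked tarball/zip or git is not installed.
git_version_string() {
    local dir="${1:-.}" hash
    if hash="$(git -C "$dir" rev-parse --short HEAD 2>/dev/null)"; then
        printf '"%s"\n' "$hash"
    else
        printf '"%s"\n' "-"
    fi
}
```

One caveat of this approach: an unpacked tarball that happens to sit inside some other git checkout would pick up that checkout's hash.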

V-Z commented 2 years ago

Sorry for the late reply. Compilation worked on Debian 10, but it complained about static libraries, so I removed lines 30-32 of the Makefile and then it worked (see e.g. https://en.opensuse.org/openSUSE:Packaging_guidelines#Static_Libraries ). I also created an openSUSE RPM package (later I'll submit it to the scientific repository). So far the package is built without Git (for packages it's better to avoid cloning Git and to use a released version instead). BTW, is the output of muscle --help intentional? :-)

--- src/Makefile.orig   2022-01-20 12:52:43.258710577 +0100
+++ src/Makefile    2022-01-20 14:05:05.059594138 +0100
@@ -27,9 +27,6 @@
 CXXFLAGS := $(CXXFLAGS) -O3 -fopenmp -ffast-math

 LDFLAGS := $(LDFLAGS) -O3 -fopenmp -pthread -lpthread
-ifeq ($(OS),Linux)
-    LDFLAGS += -static
-endif

 HDRS := $(shell echo *.h)
 OBJS := $(shell echo *.cpp | sed "-es/^/$(OS)\//" | sed "-es/ / $(OS)\//g" | sed "-es/\.cpp/.o/g")
rcedgar commented 2 years ago

Thanks for the feedback! For pre-built binaries, I've found in practice that static linking is much more robust against variations in the details of the user's OS. For my education, can you please explain how the package ensures that the required dynamic libraries are present and compatible with the binary?

Maybe I should do static linking for pre-built binaries but dynamic linking in the default Makefile; do you think that would be better? This makes the CI integration more complicated: I was thinking that the binaries posted in the release should be built using GitHub Actions, but those should be static to avoid dependency issues.

On macOS, static linking seems to be impossible, but then my pre-built binary fails unless the user installs GCC 11. Do you have a suggestion for how to do this better?

V-Z commented 2 years ago

I don't use macOS (only Linux), so I can't comment on that, nor on GitHub CI (I don't develop software). I "just" package for Linux, so I'll try to explain that side. In Linux packaging, we avoid static libraries as much as possible, so that each library is present only once and is managed centrally by the package manager. This has several advantages: when there is an update, only that single library is updated and the packages depending on it are recompiled. This is the responsibility of the packaging infrastructure, like OBS, which hosts most of the packages for openSUSE. Packages declare their dependencies (what is required for their compilation), and when something changes, all dependent packages are recompiled and users get the updates. This also has security implications: if there is a security update to some library, you automatically get everything updated.

On the technical side, Linux distributions try to automate everything as much as possible. During compilation, the package is linked against the library present on the build system, and the package's dependencies then describe what it requires at run time. This ensures the "presence" part. Of course, it is possible to have multiple versions of a library (a package can then require, e.g., a particular minimum version). It also means that the package must be built for every distribution to ensure that the correct libraries and standards are used. Again, the packaging infrastructure helps a lot with this, and packaging instructions tend to be simple; see e.g. your Muscle package.

Regarding the compatibility part, released Linux distributions (e.g. Debian 10, openSUSE 15, ...) usually have policies requiring that library updates don't break the API (or ABI), so you should experience no problems there. Of course, when there is a big upgrade (e.g. from Debian 9 to 10) and the distribution starts using, e.g., a new GCC (GCC 12 was released recently), everything must be recompiled, and any software that fails must be fixed.
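For the "presence" part, my understanding is that rpmbuild generates a package's library requirements automatically from the shared libraries recorded in the binary's ELF dynamic section (its DT_NEEDED entries). You can inspect the same list with `readelf -d`. The filter below is a small sketch (the function name is mine) that extracts those entries; a fully static binary has no dynamic section, so it prints nothing, which is also why a static build carries no library requirements for the package manager to track.

```shell
#!/bin/sh
# Sketch: print the shared libraries (DT_NEEDED entries) found in
# `readelf -d` output on stdin, one per line. A NEEDED line looks like:
#   0x0000000000000001 (NEEDED)  Shared library: [libgomp.so.1]
needed_from_readelf() {
    awk '/\(NEEDED\)/ { lib = $NF; gsub(/\[|\]/, "", lib); print lib }'
}

# Usage, after building muscle without -static:
#   readelf -d Linux/muscle | needed_from_readelf
```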

rcedgar commented 2 years ago

Good info, thanks. Is the manual edit to the Makefile acceptable to you as the packager, or do I need to provide a package-compatible Makefile variant in the repo?

V-Z commented 2 years ago

Small edits like this to a Makefile are very common. Of course, it'd be nice if anyone could compile/package without any edits. But... among the 35 packages in my own repository, I have 7 patches... :-) If there were an option to put a condition into the Makefile so that static linking is used only on GitHub CI, that would be best, but I don't know if it's feasible. A variant Makefile is more convenient for users, but, IMHO, a description in the README would be sufficient.
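One possible way to get "static only on CI" without a second Makefile is sketched below. It assumes the `ifeq ($(OS),Linux) ... LDFLAGS += -static ... endif` block is removed, as in the patch above, and relies on the existing `LDFLAGS := $(LDFLAGS) -O3 ...` line, which starts from whatever LDFLAGS the caller passes in the environment. GitHub Actions sets `CI=true` in its jobs; the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: choose extra linker flags for building muscle.
# On CI (GitHub Actions exports CI=true) request a fully static binary for
# the release downloads; elsewhere print nothing, giving the usual
# dynamically linked build that distribution packagers prefer.
muscle_extra_ldflags() {
    if [ "${CI:-}" = "true" ]; then
        printf '%s\n' "-static"
    fi
    return 0
}

# Usage from src/, assuming the Makefile no longer hard-codes -static:
#   LDFLAGS="$(muscle_extra_ldflags)" make
```

With this split, packagers run a plain `make` and need no patch, while the CI workflow gets static binaries from the same Makefile.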

V-Z commented 2 years ago

If I may ask a side question... Is it supposed to compile on 64-bit ARM too? I wonder whether it's crashing because it isn't intended for 64-bit ARM, or whether there is an issue elsewhere... :-)

rcedgar commented 2 years ago

Thanks for the link to the log; I see what the problem is, and I can commit an update which I think will fix it. I try to write portable code that will compile on as many systems as possible, but there's a limit to how many I test in practice. I guess I could use the AWS free tier to test Linux ARM; nobody has asked for that combination before now. This raises another packaging question -- if I post a patch so that it compiles on ARM, do you care whether or not I increment the version number / post a new release tag to GitHub, or do you associate it with a git hash, in which case my release numbering is irrelevant?

V-Z commented 2 years ago

Thank you for patching. OBS is very handy, as it can at least verify compilation on various platforms and for various Linux distributions. It's slightly easier if the version number is incremented (especially for readability by human users), but referencing a Git commit is also doable. Unless it's a critical error, people try to package released (tagged) versions, and "ordinary" Git commits are considered unstable development versions whose stability you can't rely on. So tagged releases are easier for orientation, but of course not every commit is worth a new version. :-)

rcedgar commented 2 years ago

OK, thanks, I'll try OBS and post an update to this issue.

rcedgar commented 2 years ago

Hi @V-Z can you share the spec file you use so I can try it on OBS? Thanks!

V-Z commented 2 years ago

Sure: https://build.opensuse.org/package/show/home:vojtaeus/muscle (you can branch it into your home project if you like).

rcedgar commented 2 years ago

Hi @V-Z, I was able to branch your spec and get it working in my own local project, but I'm struggling to update it to build the current main branch from GitHub. I downloaded the current main branch into main.zip and changed "Source" like this:

Source: https://github.com/rcedgar/muscle/archive/refs/heads/main.zip

But I get this error from build --local-package:

+ /usr/bin/unzip -qq /home/abuild/rpmbuild/SOURCES/main.zip
[    2s] /var/tmp/rpm-tmp.hgDf3a: line 30: /usr/bin/unzip: No such file or directory

Help appreciated!

V-Z commented 2 years ago

@rcedgar I branched your package, edited it, and created a submit request. The edits are:

1) The Source must point to the actual source, so I changed it to point to the Git main-branch ZIP download.
2) I replaced the released tar.gz file with the downloaded main-branch zip file. You can replace the zip file with a newly downloaded copy at any time. OBS uses the attached archive, not the remote source directly, so you must always upload the current source archive to OBS.
3) The build environment is as minimal as possible, so if you have a ZIP archive, you must explicitly add BuildRequires: unzip.
4) I had to restore the patch (the build fails without it).
5) In %setup I added -n muscle-main, which gives the custom name of the directory in the unpacked source archive.
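Point 5 depends on knowing the name of the top-level directory inside the archive. For GitHub branch archives this is normally `<repo>-<branch>` (here `muscle-main`), but it can be checked instead of guessed. A small sketch (the helper name is mine) that reads the member list printed by `unzip -Z -1` and prints the single top-level directory:

```shell
#!/bin/sh
# Sketch: given archive member names on stdin (one per line, e.g. from
# `unzip -Z -1 main.zip`), print the single top-level directory, i.e. the
# name to pass to `%setup -n`. Fails unless there is exactly one.
topdir_from_listing() {
    local dirs
    dirs="$(cut -d/ -f1 | sort -u)"
    [ -n "$dirs" ] || return 1
    # Exactly one distinct top-level name is expected.
    printf '%s\n' "$dirs" | awk 'END { exit NR != 1 }' || return 1
    printf '%s\n' "$dirs"
}

# Usage (needs unzip, hence BuildRequires: unzip inside OBS):
#   unzip -Z -1 main.zip | topdir_from_listing
```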

rcedgar commented 2 years ago

@V-Z I updated the rce-muscle-branch package, and now, as best I can tell with my limited understanding, it builds packages successfully for all architectures starting from muscle-main with no patches. Please let me know if this looks OK to you; if so, I will make GitHub release 5.2. Muscle v5 needs a new manpage because the options are quite different from earlier versions; what is the process for this? Thanks again for all your help and patience!

V-Z commented 2 years ago

Yes, @rcedgar, it looks all right now. :-) It'd be nice if a new manpage were part of the planned 5.2 release. Otherwise, maybe in a minor release like 5.2.1? Thank you for all your work!

rcedgar commented 2 years ago

@V-Z Sorry, I wasn't clear: I don't know how to format a manpage or where to put it in the GitHub/SUSE repo. The manpage for muscle v3 was written by someone else; I have no idea when or how that happened.

V-Z commented 2 years ago

Ah, OK. I'm not a C/C++ programmer, so I don't know how helpful I can be, but if you mean a manpage shown by running man muscle (which currently does nothing, as compilation produces only a single binary), it is usually installed by the make install section into /usr/share/man/man1/muscle.1.gz (or another directory according to MANPATH). It's a relatively simple text-based format, on UNIX commonly processed with groff (troff). It should definitely be part of this repository.
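For reference, a man page source is a plain text file of roff macros (the man(7) macro package). The function below writes a minimal, hypothetical skeleton from the shell; its option list only repeats the `-align`, `-output` and `--version` options mentioned in this thread and would need to be completed from muscle's own help text.

```shell
#!/bin/sh
# Sketch: write a minimal man(7) page for muscle to the path in $1.
# Placeholder content; fill in the real option list from `muscle -h`.
write_muscle_manpage() {
    cat > "$1" <<'EOF'
.TH MUSCLE 1 "" "muscle 5" "User Commands"
.SH NAME
muscle \- multiple sequence and structure alignment
.SH SYNOPSIS
.B muscle
.B \-align
.I input.fa
.B \-output
.I aligned.afa
.SH DESCRIPTION
Aligns the sequences in
.I input.fa
and writes the alignment to
.IR aligned.afa .
.SH OPTIONS
.TP
.BI \-align " file"
Input sequences in FASTA format.
.TP
.BI \-output " file"
Output file for the alignment.
.TP
.B \-\-version
Print version information and exit.
.SH SEE ALSO
https://drive5.com/muscle
EOF
}

# Preview without installing (man-db):  man -l muscle.1
```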

rcedgar commented 2 years ago

That was very helpful, already more than I knew, but it leads to another question -- I don't have an "install" section in the Makefile. I know almost nothing about make, just the minimum to make it work; how do I do this? Please note I don't understand %build and %install in the spec file; I couldn't find an explanation of what they do in the documentation. Also, I now can't browse files in the openSUSE web interface; it gives an error "Files could not be expanded: conflict in file muscle.spec" :-(

V-Z commented 2 years ago

I'm afraid your knowledge of Makefiles is greater than mine. :-) In the install section of a Makefile you "just" copy files into the desired system locations, like /usr/bin/ for binaries, /etc/ for configuration files, and subdirectories of /usr/ for libraries, manpages, etc. As this tends to differ among UNIX systems (greetings to macOS ;-), it is usually defined by variables, or the user can set it as a parameter when running make install. So I'd guess writing this section should be easy. The first DuckDuckGo hit points to StackOverflow, so it indeed seems feasible. :-)

Regarding OBS, I haven't seen the error you see, but I guess it could be related to the fact that you submitted the package back to my repository and I edited it a little, so OBS might be confused. The diff error looks like the same lines were edited on both sides, preventing a usable diff. It depends on whether you have some new edits there (I don't see any), but the simplest fix could be to delete the package and branch mine again. Or just create your package independently and copy my spec file.

Spec files (and similar files for other Linux packaging systems) use macros rather than "plain" commands. %build and %install in the spec file mark sections of the build process. In the %build section the software is compiled (do everything needed to successfully run make), and in the %install section all produced files (here only a single binary) are copied into the desired places (defined by variables). There are a lot of macros... Although it's not strictly needed, it's highly recommended to use the %make_build macro instead of the "traditional" make command, as the macro can preset variables, provide some parameters, etc. according to the OBS settings for a particular distribution, so the user doesn't have to deal with it manually. There are various such macros for different programming languages. I started openSUSE packaging with the OBS tutorial (and I mainly use the osc command). Yeah, I agree the wiki could be better structured... :-/

rcedgar commented 2 years ago

I was able to resolve the file conflicts, but after looking at the documentation links you sent, I still have no clue how to put the pieces together. Let me ask the question this way -- if I include the source for a manpage in groff format at a known location in my GitHub repo, do you know how to write the spec to ensure that the manpage is installed correctly so that man muscle will show it?

GarryGippert commented 2 years ago

There is an apparent conceptual error, either in the Makefile bundled with the tar.gz file or in my understanding of how to make and install the code (macOS; arch reports arm64).

After downloading and unpacking with gunzip and tar xvf, I found that muscle-5.1/src/Makefile references gitver.bash, which runs a git command. But because this was a downloaded (not git-cloned) copy, the git command fails.

$ bash ./gitver.bash
fatal: not a git repository (or any of the parent directories): .git
sed: 1: "gitver.tmp": extra characters at the end of g command
""

The solution is to git clone git@github.com:rcedgar/muscle.git.
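The second line of that output (`extra characters at the end of g command`) looks like the classic symptom of `sed -i` on macOS: GNU sed accepts a bare `-i`, while BSD sed treats the next argument as a backup-file suffix and then parses the file name (`gitver.tmp`) as the sed script. Besides cloning, a portable way to edit a file in place is to write to a temporary file and copy it back. A sketch (the function name is mine; I haven't seen how the actual gitver.bash is written):

```shell
#!/bin/sh
# Sketch: portable stand-in for `sed -i 'SCRIPT' FILE` that behaves the
# same with GNU sed (Linux) and BSD sed (macOS).
sed_in_place() {
    local script="$1" file="$2" tmp
    tmp="$(mktemp "${file}.XXXXXX")" || return 1
    # Copy back with cat (not mv) so FILE keeps its original permissions.
    if sed -e "$script" "$file" > "$tmp" && cat "$tmp" > "$file"; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```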

rcedgar commented 2 years ago

@GarryGippert please see #27, I believe I found and fixed this yesterday.

rcedgar commented 2 years ago

@V-Z please see above for open build question, thanks!

V-Z commented 2 years ago

@rcedgar Regarding manpages, they must be installed correctly via make install (so there needs to be a proper install section in the Makefile) into a directory on $MANPATH. AFAIK it doesn't matter where the file is in your repository, but, AFAIK, it must be compressed with gzip (during compilation). Then, in the %install section of your spec file you have the macro %make_install, and in the %files section you have %{_mandir}/*/* or %{_mandir}/man1/*. The commands in these sections tell RPM what to do with the compiled files: which ones to include in the final package and what they are.
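To make this concrete, here is a sketch of the steps an `install` target would typically perform, written as a shell function with a hypothetical name. `DESTDIR` is the conventional staging-root variable (the %make_install macro passes %{buildroot} as DESTDIR), and `PREFIX` defaults to /usr. The binary path `Linux/muscle` matches the build log at the top of this issue, while a groff source named `muscle.1` is an assumption.

```shell
#!/bin/sh
# Sketch of what a Makefile "install" target (or a spec %install section)
# would do for muscle. Each step runs only if the previous one succeeded.
#   $1: built binary   (default Linux/muscle)
#   $2: groff man page (default muscle.1, hypothetical location)
install_muscle() {
    local bin="${1:-Linux/muscle}" man="${2:-muscle.1}"
    local root="${DESTDIR:-}${PREFIX:-/usr}"
    local mandir="$root/share/man/man1"
    install -d "$root/bin" "$mandir" &&
    install -m 755 "$bin" "$root/bin/muscle" &&
    # The man page is installed gzip-compressed as man1/muscle.1.gz.
    gzip -9 -c "$man" > "$mandir/muscle.1.gz" &&
    chmod 644 "$mandir/muscle.1.gz"
}
```

If this logic lived in a Makefile install target, the spec's %install section could stay at %make_install, and %files would list the binary plus the %{_mandir}/man1/* entry described above.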

rcedgar commented 2 years ago

@V-Z Thanks for this, but I still don't understand how to implement it; e.g. I don't have an install section in my Makefile and don't know how to write one that works with OBS/RPM. Does the package you built install the binary correctly on a user machine? Can you post a link to an example OBS spec file that installs a manpage from a GitHub repo?

V-Z commented 2 years ago

@rcedgar I don't recall a case where man someapp worked without an install section in the Makefile, but it's certainly possible to install a manpage manually in the spec file. I just wonder how practical this is for compiling in other environments...? If you want man muscle (IMHO it's not critically needed), why not do it the way most people do? Let's say muscle.1 (in groff format) is in the repo root. I think (I haven't tried it, but I'm sure it's easy) it'd be enough to compress it in the %build section (gzip muscle.1), install it in the %install section (something like install -m 644 muscle.1.gz %{buildroot}%{_mandir}/man1/), and then list it in the %files section (%{_mandir}/*/* or %{_mandir}/man1/* or so). I can certainly do this for you if you need. I don't remember any spec file that installs a manpage directly from GitHub. Yes, my package works correctly:

muscle -h

muscle 5.2.linux64 [-]  24.5Gb RAM, 4 cores
Built Feb  6 2022 00:00:00
(C) Copyright 2004-2021 Robert C. Edgar.
...
muscle -align gunnera.fasta -output res.fsa

muscle 5.2.linux64 [-]  24.5Gb RAM, 4 cores
Built Feb  6 2022 00:00:00
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 22 seqs, avg length 741, max 770

00:00 8.1Mb  CPU has 4 cores, running 4 threads
00:08 237Mb   100.0% Calc posteriors
00:08 239Mb   100.0% Consistency (1/2)
00:08 239Mb   100.0% Consistency (2/2)
00:08 239Mb   100.0% UPGMA5
00:09 242Mb   100.0% Refining
rcedgar commented 2 years ago

"Why not do it the way most people do?" Because I don't know what that means; actually, I don't think there are established conventions in my world. I've asked other scientific software developers: they all do things in different ways, and none of them are familiar with issues related to making packages. From their perspective, someone else did the packaging without needing help from them. It seems strange that it's so hard for you and me to understand each other; at a minimum, this shows that the OBS documentation could be improved. I've been working extensively in software development on *nix since the mid-1980s and I'm struggling here.

V-Z commented 2 years ago

Let's try it differently...

rcedgar commented 2 years ago

Good discussion :-) Some more background from my perspective. Muscle v3.8 was developed in 2004. I posted the source code on my personal web site as "public domain" rather than under a typical open source license. Since then, it has become one of the most widely used software programs in science and has been cited by more than 44,000 peer-reviewed papers so far. There are *nix packages for muscle; simply typing muscle at a shell prompt typically offers the option of installing muscle if it's not already there. Once it's installed via the apt command, man muscle shows a manpage, at least on Ubuntu, which is what I normally use. This integration was totally opaque to me; nobody asked me, it just happened because someone (some people?) took the initiative. I know almost nothing about packages, package managers such as apt, or package databases; this stuff is more or less magic from a black box for me.

From 2004 to 2021, I didn't work on muscle at all: no bugfixes (not needed) or new features. Last year, I developed a major rewrite of muscle with many substantial improvements; this is muscle v5. Because muscle is so widely used, I felt it deserves my best effort to make sure that end-users are aware of the new version and to make it as easy as possible to install and use. My specific goal here is to ensure that the muscle v3 packages out there are replaced by muscle v5. A manpage must be included because the command line is different, so the v3 manpage does not apply. I was trying to make your life easier, but so far it looks like I have failed :-) I think I should ask more colleagues to find someone who has the knowledge to bridge the gap between you and me.

One follow-up question for now: how do I make sure that all package repositories with muscle are updated? Will your package cover all of them, or are there others I should be aware of? Thanks once again!
Edit: One point I forgot to mention -- OBS is useful for me because it provides a build service for architectures such as ARM that I don't usually use, which at least lets me verify that the binary builds. Even better would be the ability to run a small test case showing that minimal functionality works. At a guess, there is a way to hack this on OBS, but that is obviously beyond my skills right now :-)
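RPM spec files have a %check section that runs after %install, which would be the usual place for such a test. Below is a hypothetical smoke test (the function name and test data are mine) using only the `-align`/`-output` options shown in this thread: it aligns three short sequences and checks that the output contains three FASTA records.

```shell
#!/bin/sh
# Hypothetical smoke test for a freshly built muscle binary: align three
# tiny sequences and check that the output has three FASTA records.
#   $1: path to the muscle binary (default ./muscle)
smoke_test_muscle() {
    local muscle="${1:-./muscle}" dir rc=0
    dir="$(mktemp -d)" || return 1
    printf '>a\nACGTACGTAA\n>b\nACGTTCGTAA\n>c\nACGAACGTA\n' > "$dir/in.fa"
    if "$muscle" -align "$dir/in.fa" -output "$dir/out.afa" >/dev/null 2>&1 &&
       [ "$(grep -c '^>' "$dir/out.afa" 2>/dev/null)" = "3" ]; then
        echo "smoke test passed"
    else
        echo "smoke test FAILED" >&2
        rc=1
    fi
    rm -rf "$dir"
    return "$rc"
}
```

A non-zero return from a %check section should make the OBS build fail, so a broken binary on any architecture would show up as a failed build rather than only as a successful compile.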

V-Z commented 2 years ago

Ah, OK, thanks for the explanation. On openSUSE, for v3 I get...

man muscle
No manual entry for muscle

...so someone created a separate manpage for Debian/Ubuntu, but it wasn't sent upstream, nor did it spread to other distributions... So some have it and some don't. If it's not done centrally, someone's "local" work obviously isn't propagated... And yes, if muscle weren't installed, I'd get a hint about how to search for it. I see the Debian package (imported into Ubuntu and derivatives). Instead of an RPM spec file (Red Hat, Fedora, CentOS, Scientific Linux, openSUSE, ...), they use a debian directory with files like control and rules. I see some modifications in the debian folder, like patches and the added manual...

Edit to above: they deal with the manpage basically the way I suggested earlier: create a separate file and handle it while building the package. This works well for Linux packagers, but I'm not sure how convenient it would be for macOS (and Windows?) users... So I'd suggest either doing this the standard way via make install, or skipping the manpage altogether. :-)

I'm not sure exactly what you mean by "how do I make sure that all package repositories with muscle are updated? Will your package cover all of them, or are there others I should be aware of?". Well... Usually a humble packager like me just grabs the source, and if there is no problem, the package is created using standard, well-known tools. No need to bother the developer. In the best case, if any change is needed (the packager improves something, or so), the changes are sent back upstream (to the developer), so they then get into all packages without extra work for the developer (packagers just fetch the new version). This applies not only to the various Linux distributions, but also to macOS Homebrew, conda, and projects like that.

As packagers simply copy your source, you can't really verify that everything is correct and up to date, not without manually checking the various packaging projects. I'll certainly cover openSUSE. Other RPM distributions can copy "my" spec file. I see the Debian packagers are already doing something with v5, and I guess they'll either fix issues themselves (if any) or discuss them with you. Is this the answer you meant...? I'm not sure whether you can directly run something from/within OBS (I haven't tried anything like that), but it's possible to connect it into various pipelines, so I guess there would be a way. But that's a discussion for someone more knowledgeable than me. :-)

rcedgar commented 2 years ago

Here is the problem, I think -- there are several muscle packages out there, but the people who maintain them don't know about muscle v5. Presumably, they would typically be notified via a GitHub release tag, but muscle v3 was not on GitHub. It seems it is my job to find the packages and contact their maintainers somehow. Is there a list of repository databases like Debian's (I'm not sure of the correct words here)?

V-Z commented 2 years ago

As soon as it's open source, its spread is necessarily out of your control... Debian already knows about v5, and from there it will propagate into Ubuntu and all Linux distributions using DEB packages. I don't think you need to notify packagers; they'll find out. Either they follow the news, or some user notices and asks the packagers to update, or so. And no, there is no exhaustive list of Linux repositories; just look at the number of distributions... Debian is the main one, and they have already noticed v5 on GitHub, so this is covered. It's an open system: repositories can be anywhere, although there are several large main distributions. Simply put, I wouldn't worry about it. It might take some time, but it'll happen.

V-Z commented 2 years ago

BTW, since this, #24, and #27 should all be fixed, what about publishing a new release and closing everything? :-)

V-Z commented 1 year ago

https://github.com/rcedgar/muscle/releases/tag/5.1.0 seems to fix everything, so I'm closing this, if you don't mind. :-)