Port shapeit5 to MacOS on Apple Silicon using SIMDE

pettyalex commented 8 months ago

There's demand for users to be able to run shapeit on Mac OS, with users who want that creating Github issues or asking on other discussion forums:

https://github.com/odelaneau/shapeit4/issues/15 https://github.com/odelaneau/shapeit4/issues/36 https://github.com/odelaneau/shapeit5/issues/22

Shapeit5 cannot run on modern (2020+) Macs even in a VM, because it uses AVX2. Apple's x86 support supports SSE1/2/3/4 but not AVX or AVX2: https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment.

This PR contains a minimal set of changes that will allow a native ARM build on modern Macs. It does this using https://github.com/simd-everywhere/simde, a header-only library that provides high-performance vector intrinsic translation between x86 and ARM. This would also allow Linux on ARM support to be added very easily. Because of AWS's very low pricing for ARM compute instances, ARM is already the cheapest way to run large computational loads on the cloud so Linux on ARM support would be helpful for anyone operating in AWS.

This PR also has some very small changes to support building on Apple clang. std::random_shuffle was removed in C++17, and will be removed in a future version of GCC. It's already gone in clang 15, which I used to develop and test this PR. As recommended, I've replaced it with std::shuffle: https://en.cppreference.com/w/cpp/algorithm/random_shuffle

srubinacci commented 3 months ago

Hi Alex, Thanks for your this, it's actually really great and we are definitively interested in merging it. I've tried a previous version myself and it works very well. Just waiting @odelaneau for merging with new updates. Also we need to be sure it works with the new make system @rwk-unil introduced recently.

If we approve this, we should add mac compilation in the github workflow.

pettyalex commented 3 months ago

I'd be glad to add a mac build workflow to github actions, and clean up the make target a bit. I was mostly just validating that this will work until I heard you were interested in it.

Right now, as is, it suffixes the binaries with _mac which is pretty silly.

pettyalex commented 3 months ago

OK, I went ahead and simplified the makefile for Mac build, and also tested it on aarch64-linux-gnu. I noticed a few things:

It doesn't look like libboost_serialization is actually used anywhere? I don't see it linked in any dynamic libs, and there's no includes into it.
Any objections to targeting x86-64-v3 instead of -mavx2 -mfma? The baseline requirement is the same, but the compiler is able to generate many more instructions where they may be useful: https://en.wikipedia.org/wiki/X86-64
Is there a reason why when linking it dynamically we need to explicitly link to all the dependencies of our dependencies, like -lbz2 -llzma -lcurl -lcrypto -ldeflate? Can't we only link -lboost_iostreams -lboost_program_options -lhts? If so, then the conditional for mac build support can be entirely eliminated.

Oh and for x86 this updates the -mtune to Skylake server, which should be a good baseline these days for older machines.

rwk-unil commented 3 months ago

@pettyalex thank you for your pull request and your work, @srubinacci @odelaneau I have reviewed the changes, here are my comments : 1) I like integrating SIMDE which will help port to other architectures 2) Thank you for replacing random_shuffle as it is removed in C++17 and this will break builds depending on compiler. I fully agree with this change, we have been discussing this with @srubinacci already. (also the big problem with random_shuffle is that the source of randomness is implementation defined, often with std::rand used (see ref), making reproducibility an issue, even with same seed) 3) As for the Makefile :

OK, I went ahead and simplified the makefile for Mac build, and also tested it on aarch64-linux-gnu. I noticed a few things:

It doesn't look like libboost_serialization is actually used anywhere? I don't see it linked in any dynamic libs, and there's no includes into it.

It may be an artefact left over from experimentation, these were introduced when the Makefiles were updated (see git blame e.g., makefile) and may not be necessary anymore.

Any objections to targeting x86-64-v3 instead of -mavx2 -mfma? The baseline requirement is the same, but the compiler is able to generate many more instructions where they may be useful: https://en.wikipedia.org/wiki/X86-64

I agree, AVX2/FMA requires Haswell (2013) or newer CPU anyways, so targeting x86-64-v3 would enable extra CPU features that are available anyways while simplifying the build command at the same time. I totally support this suggestion.

Is there a reason why when linking it dynamically we need to explicitly link to all the dependencies of our dependencies, like -lbz2 -llzma -lcurl -lcrypto -ldeflate? Can't we only link -lboost_iostreams -lboost_program_options -lhts? If so, then the conditional for mac build support can be entirely eliminated.

When I simplified the Makefile, I did the minimal amount of changes possible that would get SHAPEIT5 to build in an Ubuntu 20.04 GitHub action, IIRQ without the flags it would fail the linking the executable. But I agree the whole Makefile could benefit from further cleanup. If your fork allows to build without the flags on the following action https://github.com/odelaneau/shapeit5/blob/main/.github/workflows/build.yml (and you make the necessary changes so that the static build also still works) we can think of removing the flags (which would be nice because as you said we can drop the conditional entirely).

Oh and for x86 this updates the -mtune to Skylake server, which should be a good baseline these days for older machines.

Not sure we would want to include this, it is very specific, people wanting to really optimise their build can change this on their own, but I wouldn't add it as default.

Cheers. Rick

pettyalex commented 3 months ago

I have:

Reverted the -mtune change entirely.
Tested on both an M1 Mac and ubuntu 20.04 environment
Added a Github Actions workflow to test on Mac OS, but I haven't tested it. Could we enable workflow runs for this PR? The github action to configure boost doesn't support Mac OS, and I feel like compiling boost on every action run feels a little wasteful... so I used boost & htslib from brew. If you don't like this, I can just compile both of those.

I've also cleaned up the makefile a smidge to make it easier to build inside of places like conda that expect to be able to set CXXFLAGS to change the include path.

pettyalex commented 3 months ago

I believe this xcftools PR that fixes up its build on Apple Silicon would have to go through before that action will pass, though: https://github.com/odelaneau/xcftools/pull/7

Also, while I was testing packaging this in conda, I noticed that your build rules do not include LDFLAGS or CXXFLAGS. I changed CXXFLAG to CXXFLAGS in the makefile, so now it will pick up the standard environment variables and build more easily in more places. I can revert this if you'd like.

Finally, I enabled lto, link-time optimization. If you don't want this, I can get that out of this PR.

rwk-unil commented 3 months ago

I have:

Reverted the -mtune change entirely.

Tested on both an M1 Mac and ubuntu 20.04 environment

Added a Github Actions workflow to test on Mac OS, but I haven't tested it. Could we enable workflow runs for this PR? The github action to configure boost doesn't support Mac OS, and I feel like compiling boost on every action run feels a little wasteful... so I used boost & htslib from brew. If you don't like this, I can just compile both of those.

I've also cleaned up the makefile a smidge to make it easier to build inside of places like conda that expect to be able to set CXXFLAGS to change the include path.

This seems great, I think the most important is that the Linux (Ubuntu) build succeeds with the new Makefiles, for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

I believe this xcftools PR that fixes up its build on Apple Silicon would have to go through before that action will pass, though: odelaneau/xcftools#7

Also, while I was testing packaging this in conda, I noticed that your build rules do not include LDFLAGS or CXXFLAGS. I changed CXXFLAG to CXXFLAGS in the makefile, so now it will pick up the standard environment variables and build more easily in more places. I can revert this if you'd like.

Finally, I enabled lto, link-time optimization. If you don't want this, I can get that out of this PR.

This has been bothering me too for quite some time, I totally support this change 👍 .

One other thing, we should ditch -lpthread in the libraries and use -pthread (see stackoverflow)

Also, for linking we should use the standard variables LDLIBS or LOADLIBES.

If these standard variables are used, we could remove the rules for building the .o object files and the binaries by relying on the Make built-in rules, this would further simplify the Makefile (but in no way a necessity, and might make the Makefile less clear).

The default rule for .o from .cc/.cpp/.C is : $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c
The default rule for linking is : $(CC) $(LDFLAGS) $^ $(LOADLIBES) $(LDLIBS)

Reference : Make catalog of rules

Cheers.

pettyalex commented 3 months ago

The build is working on Linux in its current state, I've been testing it. The real blocker at this point to getting this merge ready is the xcftools PR: https://github.com/odelaneau/xcftools/pull/7

Once that's merged I can update the xcftools submodule to point at it.

I updated to -pthread instead.

for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

Given that they're free, github provided runners, is spending the extra resources to run CI on Mac a problem? If it takes longer to find/run on Mac it could be disabled in the future, but my observation has been that Github has plenty of mac runners.

rwk-unil commented 3 months ago

for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

Given that they're free, github provided runners, is spending the extra resources to run CI on Mac a problem? If it takes longer to find/run on Mac it could be disabled in the future, but my observation has been that Github has plenty of mac runners.

They are free up to a point, free tier GitHub provides up to 2,000 minutes a month (all hosted projects combined) and Mac OS X runners have a 10x multiplier on running minutes. That's why I suggest running it only on releases. (Also it is more environmentally friendly). At the end the decision for this should go to @odelaneau as the project is hosted on his account.

Reference : https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions

odelaneau / shapeit5

Port shapeit5 to MacOS on Apple Silicon using SIMDE #68