iqbal-lab-org / pandora

Pan-genome inference and genotyping with long noisy or short accurate reads
MIT License
109 stars 14 forks source link

Linking against musl statically #289

Open leoisl opened 2 years ago

leoisl commented 2 years ago

Hello,

I thought I was linking statically against musl when building the precompiled binary. Brice and me found out I was actually not. I was statically linking glibc, and this is not good for several reasons.

In this commit, where the main change is changing the base image of the container to alpine, we necessarily link against musl, but some dependencies are not straightforward to get working in alpine. In this case, the backward lib was a bit complicated to get compiled in alpine, and this just provides meaningful stacktraces in case of errors, which can help a little bit debugging. Usually debugging is a bit more involved, having to debug the execution step-by-step in an IDE with a MWE. What I mean is by producing a binary that is totally portable, linked statically against musl, we will lose the ability to produce meaning stacktraces in case of errors, but I don't think such stacktraces are very useful anyway. It is a cost that is ok to pay for me. I am just checking with you if it is also. If yes, I will create a PR and release a minor version with this small fix.

iqbal-lab commented 2 years ago

ok with me. If we want a stacktrace for a reproducible bug we can recompile, and not use the static binary, right?

leoisl commented 2 years ago

No... I am actually removing the code to produce the stacktrace in this commit. We would not have a stacktrace in case of errors for pandora anymore. If we have a bug and a MWE to reproduce the bug, the stacktrace is not useful anyway, as we can get much more information than just the stacktrace in a debug environment, in a step-by-step execution. This stacktrace is useful if we don't have a MWE, and the bug can just be reproduced by running pandora on a large, real-world input data, which could be too slow to run in debug mode. In this case, we would run in release mode and look at the stacktrace for clues to solve the bug.

So the stacktrace mostly just helps in this case. With a conditional compilation, we could indeed just remove this feature when compiling the static binary. When compiling in any other way, we could enable it back. This will require a little bit more work, should we do it?

leoisl commented 2 years ago

Ok, so getting back to this, I might need some of your experience with musl: @mbhall88 , @jeff-k .

A static binary linked against musl is the most portable one that we could get. Unfortunately, that is coming with an unacceptable slow down, and I don't understand why. pandora index is usually a command that takes a few minutes to run, but with a static binary linked against musl, I got to 55% of the PRGs indexed in ~14h, when I just killed the job, as sth is clearly broken. When linking against glibc, the same progress was achieved in just 12 mins, and the whole indexing was done in 23 mins.

So, definitely, a musl-linked binary is out of question, the performance cost is too heavy, even though statically linking to glibc is not the best. I saw some posts from the Rust community showing that linking against musl can makes things a lot slower:

... but also some posts saying that musl binaries are faster, e.g. https://users.rust-lang.org/t/optimizing-rust-binaries-observation-of-musl-versus-glibc-and-jemalloc-versus-system-alloc/8499

I think the best is to decide based on our own benchmarks, which shows musl binary with an unacceptable slowdown, so we will just provide a glibc binary. In case of incompatibility issues, we will still have container, conda and compile-from-source options... unless you know how to solve this musl binary slowdown or a third alternative?

mbhall88 commented 2 years ago

I've gotta say, my only experience with musl builds is just building portable rust binaries using cross. So I don't think that is at all useful here.

Have you looked into whether changing the allocator to jemalloc is feasible for pandora? This came up a few times when I was searching around.

If not, I think having a container, conda etc. should be sufficient. If people start requesting static builds, we can deal with it then.