odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

Docker image - Static binaries shouldn't need host libraries #67

Open pettyalex opened 8 months ago

pettyalex commented 8 months ago

I noticed that the Docker image seems to include the libraries necessary to compile shapeit5, but shapeit5 is not actually compiled inside the container. Instead, static binaries built outside the container get copied in.

Because you are copying in already-compiled static binaries, I believe the Docker image could be dramatically simplified and made much smaller by including only those binaries. I could send a PR if you're interested and test it myself.

It looks like you also want bcftools in the Docker container, because you use it at the same step of your process as the shapeit5 tools? If you want to include it, the generally accepted practice would be a multi-stage build, which I'd also be glad to help with. That way the Docker image you ship does not include compilers or development libraries, which makes it smaller and more secure.

You could also use a distro-provided bcftools, but it seems like you want to pin to htslib and bcftools version 1.15 right now, so the newer versions in Debian or Ubuntu may not be appropriate.
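
For illustration, a multi-stage build along these lines might look like the sketch below. The base image tag, the package lists, the `/opt/bcftools` prefix, and the `static_bins/` directory are assumptions, not the project's actual layout. The tarball URL follows the standard samtools release naming for bcftools 1.15, which bundles the matching htslib.

```dockerfile
# Stage 1: build the pinned bcftools 1.15 with compilers and dev libraries.
# Nothing from this stage ships except what is explicitly copied out below.
FROM ubuntu:22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential ca-certificates wget bzip2 \
        zlib1g-dev libbz2-dev liblzma-dev libcurl4-openssl-dev \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /build
RUN wget https://github.com/samtools/bcftools/releases/download/1.15/bcftools-1.15.tar.bz2 \
    && tar -xjf bcftools-1.15.tar.bz2 \
    && cd bcftools-1.15 \
    && ./configure --prefix=/opt/bcftools \
    && make -j"$(nproc)" \
    && make install

# Stage 2: runtime image with only shared runtime libraries and binaries.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        zlib1g libbz2-1.0 liblzma5 libcurl4 ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/bcftools /opt/bcftools
# Assumed location: prebuilt static shapeit5 binaries in the build context.
COPY static_bins/ /usr/local/bin/
ENV PATH="/opt/bcftools/bin:${PATH}"
```

Only the second stage ends up in the shipped image, so the compiler toolchain and `-dev` packages from the first stage add nothing to its size.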

npdungca commented 7 months ago

Hi, @pettyalex.

Were you able to run an analysis using the Docker image? May I please know the command used? I tried this:

sudo docker run lindonkambule/shapeit5_2023-05-05_d6ce1e2:latest --help

and got this:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "--help": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container:

Which executable file is needed? Sorry, I'm new to this and still have so much to learn. Thank you in advance for your help.

kscott-1 commented 7 months ago

+1 to this.

The static binaries should be compiled inside a controlled Docker environment rather than copied in from a local environment. The final images can be stripped of the build-time dependencies and packaged with only the necessary binaries. Ideally, this comes in the form of one binary per image, where the entrypoint is set to that binary.

The compiled binaries are very small, so the resulting images can be kept under 100 MB.
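
As a rough sketch of the one-binary-per-image idea, the final stage could look something like the following. The `TOOL` build argument, the `<tool>_static` file naming, and the base image are assumptions for illustration. The compile stage is left out because the build commands are not given in this thread.

```dockerfile
FROM debian:bookworm-slim
# Hypothetical build argument selecting which tool this image wraps,
# e.g. docker build --build-arg TOOL=phase_common -t shapeit5-phase_common .
ARG TOOL=phase_common
# Assumes a statically linked binary named <TOOL>_static in the build context.
COPY ${TOOL}_static /usr/local/bin/tool
# Exec-form ENTRYPOINT does not expand variables, hence the fixed path.
ENTRYPOINT ["/usr/local/bin/tool"]
```

With an entrypoint set, anything after the image name in `docker run` is passed to the binary as arguments. Without one, Docker treats the first argument as the executable to run, which is what the `exec: "--help": executable file not found in $PATH` error earlier in this thread indicates.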

I have a build set up in the UM org and I'd be happy to share that as well.

neurotensin commented 3 months ago

@kscott-1 that would be extremely helpful! I fail to see how the "Dockerfile" could possibly build the application cleanly; it required a great deal of modification to get a build mostly working. In addition, for downstream development we want to make sure we don't introduce regressions, so a full container build is essential.