nextstrain / docker-base

Docker image build for nextstrain/base
https://hub.docker.com/r/nextstrain/base/

Build image based on conda-base? #222

Open victorlin opened 3 months ago

victorlin commented 3 months ago

Initially proposed by @corneliusroemer on Slack.

Now that both Bioconda and conda-forge support not just osx-arm64 but also linux-aarch64, we could stop maintaining docker-base and simply build Docker images based on the micromamba Docker base image and install the conda-base environment into them. One source of truth, less maintenance!

victorlin commented 3 months ago

Copying over my response from Slack:

I would consider this. The docker-base image still requires emulation for many programs, mostly because the process to cross-compile successfully is kinda painful to figure out and varies for each program. If these programs are already precompiled for multiple platforms in conda, it would be nice to leverage that work.

joverlee521 commented 3 months ago

Seems reasonable as long as we continue to support tools that are not readily available via conda (mainly thinking of fauna, additional context in conda-base)

corneliusroemer commented 3 months ago

Seems reasonable as long as we continue to support tools that are not readily available via conda (mainly thinking of fauna, https://github.com/nextstrain/conda-base/issues/3)

@joverlee521 the simple solution is to just add everything to conda - like fauna. Any reason this is not possible?

Packaging things into bioconda/conda-forge has a clear advantage of making things also more easily available to the whole community.

huddlej commented 3 months ago

+1 for one source of truth, but after trying unsuccessfully to get a TreeKnit Bioconda package built, I'm skeptical that everything we need in the future will be Conda-able.

If we do decide to prioritize a single source of truth and the ability to always have a Conda version of every package we need, then we need to enforce stricter guidelines about the tools we can support.

The Julia/TreeKnit issue is an obvious one. Another issue would be how Bioconda didn't support ARM64 for several years while we were able to create Docker images with ARM64 support through custom builds quite quickly.

victorlin commented 3 months ago

Actually, we had considered this a bit in https://github.com/nextstrain/docker-base/issues/127.

@huddlej said:

If we are considering installation from prebuilt binaries, we might also consider installing these tools with Conda. We already rely on Conda binaries in our workflow-specific environment files and our nextstrain-base environment. We could have micromamba installed in our first pass of the Docker build and use that to install the third-party binaries we want.

and @tsibley said:

Conda packages bring along other issues. For example, they expect to bring along everything but libc, so things like openssl and other common shared libs will get duplicated (increasing image size, increasing complexity of library interactions at runtime, and more). I'm reluctant to mix Conda packages with non-Conda packages for these reasons.

That said, we might take a step back and consider building the container image entirely from a static Conda environment. We've (or at least I've) considered this before, but decided it wasn't worth it then. Maybe that's changed, particularly in light of our new Conda runtime defined by a locked package? There are downsides though, like a tighter coupling between runtimes and what they can support (e.g. architectures). Tighter is good in some ways but worse in others. Also, other considerations aside, we may not want to put all our eggs in Conda's basket.
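To make the shared-library duplication concern quoted above concrete, here is a minimal sketch of how one might check an image that mixes system packages with a Conda environment. The function names and the example paths are hypothetical and are not taken from docker-base or conda-base. Matching on library name alone is a rough heuristic that ignores versions.

```python
import re
from pathlib import Path

# Matches shared-library file names such as libssl.so, libssl.so.3, libz.so.1.2.13.
# Group 1 is the unversioned name (e.g. "libssl.so").
_SHARED_LIB = re.compile(r"^(lib[^/]+?\.so)(\.\d+)*$")


def shared_lib_names(root):
    """Return the set of unversioned shared-library names found under root."""
    names = set()
    for path in Path(root).rglob("*.so*"):
        match = _SHARED_LIB.match(path.name)
        if match:
            names.add(match.group(1))
    return names


def duplicated_shared_libs(system_root, conda_prefix):
    """List library names present both in the system tree and in the Conda prefix."""
    return sorted(shared_lib_names(system_root) & shared_lib_names(conda_prefix))
```

Inside an image, this could be called as `duplicated_shared_libs("/usr/lib", "/opt/conda/lib")`, where both paths are assumptions about the image layout. A non-empty result would point at libraries such as openssl that are shipped twice.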

The new development is that most(?) tools we provide in the runtimes are now available as linux-aarch64 on Bioconda.

corneliusroemer commented 3 months ago

I didn't realize how out of date our pins are compared to conda-base (which is usually using latest versions).

There's still this open PR from 15 months ago: https://github.com/nextstrain/docker-base/pull/145

The main argument I see for not updating is that there's no need to, and that there's a risk associated with it.

The downside is that one can't just use the latest features of the tools we package; one needs to look at old versions of their docs. And if one wants to use newer features, e.g. cmaple iqtree, one needs to make explicit PRs for it, as in #226.

Maybe we could make a conda-base based docker image to allow test-driving in a few workflows to see what our experience is.

victorlin commented 3 months ago

@corneliusroemer re: pins, this is a good point for discussion which I've started a separate issue for: #227

Maybe we could make a conda-base based docker image to allow test-driving in a few workflows to see what our experience is.

For test-driving latest versions of tools, it might be easier to remove the pins in the existing Dockerfile rather than rewriting it to use conda-base.