Closed victorlin closed 2 months ago
This is presumably the same for our conda runtime, given that it installs augur?
Since these dependencies are only used in a small part of Augur, I'd like to think of them as optional dependencies that can be installed independently.
Yes and no. Augur needs a tree builder for most workflows 😉 But there are many tools that are workflow specific, and this list will grow. At some point we need to move towards workflows being able to specify these rather than including them within augur (or conda-base). (Something to keep in mind, not necessarily solve right now.)
This seems reasonable to me given our current directions with runtimes and project packaging. Note that it will be a breaking change for anyone used to installing Augur with Conda and finding that they now need to also install some other stuff they never had to before.
Note also that we didn't originally define these deps in the Conda packaging of Augur: they were in the original third-party packaging.
Note that it will be a breaking change for anyone used to installing Augur with Conda and finding that they now need to also install some other stuff they never had to before.
This can be done in tandem with an Augur release and mentioned in the changelog. My thinking is that it would be a "note" to add after the usual changelog entries, without effect on semver. Example:
Note: Starting with this version, installing Augur from Conda will not automatically install fasttree, iqtree, mafft, raxml, and vcftools. Install them separately if you would like to use subcommands that are dependent on them. This increases flexibility for installation of Augur and the other software tools. (nextstrain/bioconda-recipes#3)
I'd also do this after https://github.com/nextstrain/docs.nextstrain.org/pull/157 is merged and create another PR to update the FAQ section there.
This is presumably the same for our conda runtime, given that it installs augur?
If you're talking about the inability to use ARM64, yes. An inherent problem with Conda is that, in my limited trials, it doesn't play well when you try to install packages from different architectures into the same environment. This is because most packages (such as Augur currently) define dependencies, and dependency resolution is limited to one architecture. This contrasts with the Docker image that we build, where we have the option to build/install packages individually for the native architecture.
Because of this, the Conda runtime uses an Intel-based Miniconda installation for all Macs regardless of processor architecture (src).
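As a rough illustration of that choice (not the runtime's actual code), the Python sketch below maps an OS/CPU pair to a Conda subdirectory name and then forces the Intel subdirectory on Apple silicon. The helper names `native_subdir` and `runtime_subdir` are hypothetical, and the table only covers the platforms mentioned in this thread.

```python
import platform

# Conda subdirectory names for the platforms discussed in this thread.
_SUBDIRS = {
    ("Darwin", "arm64"): "osx-arm64",
    ("Darwin", "x86_64"): "osx-64",
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
}


def native_subdir(system=None, machine=None):
    """Return the Conda subdir matching the given (or current) OS and CPU.

    Note: a Python process running under Rosetta reports x86_64, so the
    defaults describe the running process, not necessarily the hardware.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return _SUBDIRS.get((system, machine))


def runtime_subdir(system=None, machine=None):
    """Hypothetical policy: use the Intel subdir on every Mac, native elsewhere."""
    native = native_subdir(system, machine)
    return "osx-64" if native == "osx-arm64" else native
```

For example, `runtime_subdir("Darwin", "arm64")` gives `"osx-64"`, which is the value one would export as `CONDA_SUBDIR` to get an emulated environment on an Apple silicon Mac.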
Anaconda is rife with issues for many Python projects. We don't recommend using Anaconda. This is in our documentation.
@sgoggins I think there's some confusion here – we're maintainers of nextstrain/augur and it looks like you're working on a tool of the same name.
at least one dependency is not available for ARM64
Which dependencies do not have ARM64 support? Is that information published somewhere explicitly? I can't tell from the recipe meta YAML files for these packages alone...
@huddlej you can tell from each package's Bioconda page. For example, MAFFT:
I just looked and none of fasttree, iqtree, mafft, raxml, or vcftools are Conda-installable for ARM64. This might be due to https://github.com/bioconda/bioconda-recipes/issues/23454.
Thanks, @victorlin! I noticed the badges but never saw one with ARM64 specified. That Bioconda issue explains why! :D
Since Bioconda as a whole doesn't support ARM64, that makes this outcome less desirable for users:
In other words, one would install them separately, not necessarily through Conda, if they want to use those features.
As @jameshadfield pointed out, the features users would be missing include an aligner and tree builder which are pretty key to most Augur workflows. I would really prefer to not have to install/upgrade IQ-TREE, mafft, etc. manually and outside of my Conda environments.
Because of this, the Conda runtime uses an Intel-based Miniconda installation for all Macs regardless of processor architecture.
Instead of removing the architecture-specific dependencies from the Bioconda recipe, could we instead recommend the same approach as above of using the Intel-based Miniconda on Mac regardless of actual architecture? This is how I have Augur installed "ambiently" now. Even though this approach slows down workflows because of emulation, the other non-Docker alternative is the managed Conda environment which will also be equally slow because of emulation.
It also seems like Bioconda will eventually support ARM64, so at that point we could recommend using the ARM64-based Miniconda instead...
@huddlej I see your point. I was thinking of conda install as serving similar purposes to pip install when the average Conda user probably thinks otherwise.
In other words, the average user might expect pip install to manage Python dependencies (as it does), while they might expect conda install to manage ~all dependencies (which Augur's Bioconda recipe currently does). With this perspective, I'd opt to keep things as-is.
That said, I think the deeper problem is that the average Apple silicon user is unaware of the emulation subtleties (because macOS hides them very well), meaning they aren't aware of potentially faster options. I think the best we can do here is:
Hmm. I'm confused. Maybe I'm missing something?
I would really prefer to not have to install/upgrade IQ-TREE, mafft, etc. manually and outside of my Conda environments.
Why would this change force you to install them outside of your Conda environments? It wouldn't prevent you from doing what you do now (namely, using "Intel-based Miniconda on Mac regardless of actual architecture").
It would force you to manually install/upgrade them separately from Augur itself.
Yeah, I see how that was confusing; I'm currently a Linux, Mac OS X Intel, and ARM64 user of Augur, so here's how I'm thinking about the experience:
If I'm a new/current Linux or Mac OS X Intel user, I would have to manually install/upgrade Augur and its compiled dependencies separately.
If I'm a new/current Mac OS X ARM64 user, I would have to manually install/upgrade Augur in a Conda environment and then manually install/upgrade the dependencies outside of the Conda environment with Homebrew, etc.
For all architectures, this setup means I need to know that the additional compiled dependencies have to be installed separately and I need to know what those dependencies are and which versions to install. When I install Augur from Conda, I don't necessarily know that it can't/won't install the required compiled dependencies for me unless I'm following the docs or I try to run augur align or augur tree and it crashes.
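One way to surface the missing tools before a subcommand crashes is a small PATH check. This is a minimal sketch, not something Augur ships: the `missing_tools` helper is hypothetical, and the executable names in `CANDIDATES` are assumptions for illustration (packages may install binaries under different names).

```python
import shutil

# Tools named in this thread, each with candidate executable names.
# ASSUMPTION: these executable names are illustrative, not taken from Augur.
CANDIDATES = {
    "mafft": ["mafft"],
    "iqtree": ["iqtree2", "iqtree"],
    "fasttree": ["FastTree", "fasttree"],
    "raxml": ["raxmlHPC"],
    "vcftools": ["vcftools"],
}


def missing_tools(candidates=CANDIDATES, which=shutil.which):
    """Return sorted tool names for which no candidate executable is on PATH.

    `which` is injectable so the check can be exercised without touching
    the real PATH.
    """
    return sorted(
        tool
        for tool, names in candidates.items()
        if not any(which(name) for name in names)
    )
```

Calling `missing_tools()` in a fresh environment would list whichever of these tools still need to be installed separately.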
For the Linux and Mac OS X Intel users, the compiled dependencies would at least still be in a Conda environment. The new setup would be a downgrade of the current user experience where I can just install Augur on both of those systems and all dependencies are managed for me.
For the ARM64 users, their dependency management is much more complicated and loses the benefits of self-managed Conda environments. If we propose in the docs that ARM64 users install Augur with ARM64 Conda, they would not think they have an option to avoid this complexity.
The experience for the ARM64 user right now is not good at all, since they can't install Augur with Conda at all and don't get any explanation from Conda about why not. The proposed solution of removing the compiled dependencies from the Bioconda Augur recipe marginally improves the situation by allowing them to install an incomplete installation of Augur. This comes at the cost of a poorer user experience for Linux and Mac OS X Intel users (everyone gets an incomplete installation of Augur).
It seems like there are two other alternate installation paths for ARM64 users that don't degrade the experience for non-ARM64 users:
When Bioconda finally supports ARM64 builds, the problem for ARM64 users goes away and Linux and OS X Intel users don't notice the difference.
Ok, I think I see your point now: Removing MAFFT, IQ-TREE, etc. deps from the Conda packaging results in a degraded experience for anyone not in an arm64 Conda env, and that isn't made worth it by the gained ability to get an incomplete (and thus not super useful) Augur install in an arm64 Conda env.
That makes sense to me.
It does make me think that instead of making the Conda package like the Python package and removing these deps, we could actually make the Python package more like the Conda package and bundle fasttree, iqtree, mafft, raxml, and vcftools into the wheels.
Theoretically, the conda/mamba CLIs are fully capable of solving with mixed channel subdirectories (e.g., architectures). Whether macOS will seamlessly handle the interoperability in the resulting environment is another question. Unfortunately, my M1 is packed for moving, so I can't test this right now, but maybe one of you might want to try something like:
mamba create -n foo --override-channels -c conda-forge -c conda-forge/osx-64 -c bioconda/osx-64 -c bioconda/noarch augur
This assumes:
channel_priority: flexible
If the system base is osx-64 (but still on Apple Silicon), the meaning of straight conda-forge is different, and so would need something like:
mamba create -n foo --override-channels -c conda-forge/osx-arm64 -c conda-forge -c bioconda augur
Note that conda-forge without a subdirectory always implicitly pulls in the noarch subdirectory.
Probably better would be to simply spell out everything explicitly in a YAML:
augur-arm64.yaml
channels:
- conda-forge/osx-arm64 # prefer native
- conda-forge/noarch # or noarch
- conda-forge/osx-64 # otherwise, emulate
- bioconda/noarch
- bioconda/osx-64
- nodefaults # equivalent of "--override-channels"
dependencies:
- augur
mamba env create -n foo -f augur-arm64.yaml
I've never really played with this, so suggest it only for experimentation.
@mfansler interesting idea! I just tried it on my M1.
The environment creation was successful, and I can see that packages were pulled from a mix of the arm64 and osx-64 channels.
However, upon running augur --help, there was an unsurprising architecture compatibility error:
Original error was: dlopen(/opt/homebrew/Caskroom/miniconda/base/envs/tmp/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so, 0x0002): Library not loaded: '@rpath/libgfortran.5.dylib'
Referenced from: '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/lib/libopenblasp-r0.3.23.dylib'
Reason: tried: '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/lib/libgfortran.5.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))), '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/lib/python3.10/site-packages/numpy/core/../../../../libgfortran.5.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))), '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/lib/python3.10/site-packages/numpy/core/../../../../libgfortran.5.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))), '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/bin/../lib/libgfortran.5.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))), '/opt/homebrew/Caskroom/miniconda/base/envs/tmp/bin/../lib/libgfortran.5.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))), '/usr/local/lib/libgfortran.5.dylib' (no such file), '/usr/lib/libgfortran.5.dylib' (no such file)
From the detailed logs above, the obvious reason is that numpy was installed from conda-forge/osx-64 while Augur is native. In an attempt to install numpy from conda-forge/osx-arm64 instead since it is available, I tried (1) setting the channel_priority config to strict and (2) explicitly specifying conda-forge/osx-arm64::numpy in the YAML, but neither of those seemed to work. 😕
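To see which packages ended up on which architecture in a mixed environment like this, one could group the output of conda list --json by subdirectory. The sketch below is hypothetical: it assumes each record carries "name" and "platform" fields (the latter holding values such as osx-64, osx-arm64, or noarch), and the `packages_by_subdir` helper is not part of any tool mentioned here.

```python
import json
from collections import defaultdict


def packages_by_subdir(conda_list_json):
    """Group package names by subdir from `conda list --json` text.

    ASSUMPTION: each record has a "name" field and a "platform" field;
    records without "platform" are grouped under "unknown".
    """
    groups = defaultdict(list)
    for record in json.loads(conda_list_json):
        groups[record.get("platform", "unknown")].append(record["name"])
    return {subdir: sorted(names) for subdir, names in groups.items()}
```

The input would come from something like `conda list -n tmp --json`, captured as a string. Any compiled package listed under osx-64 next to compiled packages under osx-arm64 is a candidate for the kind of dlopen failure shown above.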
Even if this were to work, it might be unnecessary hassle for users who are just trying to mamba install augur.
Thank you for the suggestion though, conda's ability to solve with mixed archs is good to know!
Thanks for reporting back the results of your testing! Yes, I wasn't too hopeful and suspected that while Rosetta could handle launching new processes in emulation, it might not cover dynamic library callouts. Particularly problematic would be when two packages of different architectures both link against something like BLAS. One would have references to the osx-arm64 BLAS and the other to the osx-64 BLAS. Since Conda can't install both versions, I'd guess this could result in missing symbol references. Anyway, I suppose it was worth the try!
I do find the numpy result odd: the solver should have prioritized osx-arm64 and fallen back to osx-64 only when a package was otherwise unavailable. Channel priority of "flexible" should be sufficient.
ARM builds are coming, first linux aarch64, but maybe also soon M1: https://github.com/bioconda/bioconda-docs/pull/16
What I do is a little hacky but it works well for me:
bioconda-64, e.g. iqtree
PATH manually in my .zshrc, as a last option/fallback. So these binaries are always available anywhere, but still conda environments. This is not how one is supposed to use conda environments (i.e. with activate) but it works very well. In fact, I had entirely forgotten I had set things up like this half a year ago.
Augur is now natively installable on osx-arm64, largely thanks to @corneliusroemer's work in https://github.com/nextstrain/conda-base/issues/77. I just tested locally:
Note that mafft is pulled from conda-forge because the package on Bioconda is not available for osx-arm64.
Nice!
Just a minor note: I see the environment installs OpenBLAS by default. From what I've benchmarked, it significantly underperforms Accelerate on osx-arm64, so I would encourage trying to include 'blas=*=accelerate'.
@mfansler thanks for sharing, that is useful info. We don't use BLAS directly in Augur – it is the numpy dependency that could potentially use it. I think the proper solution is for numpy to declare that dependency on osx-arm64. There are a few relevant issues:
but it seems that maintainers are reluctant to use Accelerate as the default due to compatibility issues with SciPy.
Background
Augur's Bioconda recipe page shows noarch, presumably because it is a pure-Python package that can be run natively on any architecture. In practice, this can be useful for running computationally expensive pure-Python Augur subcommands such as augur filter.
However, installing from Bioconda in an ARM64 context is currently impossible (I thought there might be a chance with --no-deps but it didn't work for me). The reason is that the recipe defines the dependencies explicitly, at least one dependency is not available for ARM64, and a single conda install command can only resolve packages under a single architecture. This forces all Augur subcommands to use emulation when only certain dependencies require emulation.
Possible solutions
1. Remove dependencies from Augur's Bioconda recipe
Since these dependencies are only used in a small part of Augur, I'd like to think of them as optional dependencies that can be installed independently. In other words, one would install them separately, not necessarily through Conda, if they want to use those features. This seems to be what the current Augur-specific installation page implies.
This is also how it's done for building the Docker image (1, 2) and for anyone that is using Augur via pip install.
Proposed changes:
2. Bundle required dependencies into Augur's wheels
See https://github.com/nextstrain/bioconda-recipes/issues/3#issuecomment-1546143203.
3. Wait for all dependencies to be available for ARM64
See https://github.com/nextstrain/bioconda-recipes/issues/3#issuecomment-1544768125.