rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License

medaka on macOS #228

Open jmtsuji opened 4 days ago

jmtsuji commented 4 days ago

Problem description

In addition to Linux, it would be great if we could get rotary working on macOS. So far, installs that seem to be failing are:

(Check if install is addressed:)

medaka install issues

Rotary currently needs to be installed on Mac using the osx-64 architecture subdir of conda (i.e., for x86_64 tools), because many of the bioinformatics tools we are using only have x86_64 installs. This is a global setting across an entire rotary run. When run on a Mac with an M-series processor (e.g., M1, M2, M3, which are ARM-based), the tools are translated using Rosetta so they can run on arm64.

Unfortunately, medaka v1 no longer installs cleanly this way on macOS. I think that medaka 1.8.0 (which is currently specified in our env file for medaka) used to work on macOS, but it seems that the tensorflow dependency is not translating properly: I get an "Illegal instruction: 4" error when I try to load this package.

Medaka v2 was recently released, but support for osx-64 was dropped in favour of osx-arm64. This makes sense given that Macs are no longer made with x86_64 architecture chips, but it means that medaka v2 is currently unavailable for use in the pipeline unless we hack it in somehow.

Other tools with install issues

I'll add these as replies to this post as I have the chance to keep testing rotary on macOS.

UPDATE (edit on Oct 8th, 2024): simplified this issue to just focus on the medaka install -- changed the issue title. (The rest of the issue description is unedited.)

Proposed solution

medaka

A few different solutions might be possible, ordered here by preference:

Possible caveats etc.

I suppose one other solution would be to ask the snakemake devs if there is a way to specify the conda subdir variable for each conda env that is created by the pipeline. We could then hard-code that the conda subdir for medaka on macOS must be osx-arm64. This is also a bit hacky.
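For reference, here is a rough sketch of what a per-environment subdir looks like when an env is created by hand, outside of snakemake. It relies on conda's CONDA_SUBDIR environment variable; the helper name conda_create_command and the envs/medaka2 prefix are made up for illustration, and whether snakemake can be made to do the equivalent per env is exactly the open question above.

```python
import os


def conda_create_command(env_file, prefix, subdir):
    """Build the command and environment for creating one conda env
    under a specific subdir (hypothetical helper, not part of rotary)."""
    env = dict(os.environ)
    # CONDA_SUBDIR overrides the platform conda resolves packages for,
    # but only for the process that receives this environment.
    env["CONDA_SUBDIR"] = subdir
    argv = ["conda", "env", "create", "--file", env_file, "--prefix", prefix]
    return argv, env


# Assumed file name and prefix, for illustration only.
argv, env = conda_create_command("medaka2.yaml", "envs/medaka2", "osx-arm64")
print("CONDA_SUBDIR=" + env["CONDA_SUBDIR"], " ".join(argv))

# To actually create the env (requires conda on PATH):
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Because the variable is only set for that one command, the other envs would keep using osx-64.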

LeeBergstrand commented 4 days ago

@jmtsuji You could also create separate Conda environments for Medaka 1 and 2 (medaka.yaml and medaka2.yaml) and use Python to change which Conda environment the Medaka snakemake rules use, depending on the hardware specs.

rule polish_contig_medaka:
    input:
        calls_to_draft_bam='{sample}/{step}/medaka/calls_to_draft.bam',
        calls_to_draft_bam_index='{sample}/{step}/medaka/calls_to_draft.bam.bai'
    output:
        contig_polished=temp('{sample}/{step}/medaka/results/{sample}_{contig}.hd5')
    conda:
        some_function_that_selects_what_medaka() # Returns medaka.yaml or medaka2.yaml, depending on the CPU architecture.

https://stackoverflow.com/questions/7491391/is-there-a-reliable-way-to-determine-the-system-cpu-architecture-using-python

In medaka.yaml, you would pin tensorflow to a version that still works on x86_64 macOS.
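A minimal sketch of what such a selector function might look like, with some assumptions: the name select_medaka_env is made up, non-macOS systems are assumed to stay on the existing medaka.yaml, and the Rosetta check relies on the macOS sysctl key sysctl.proc_translated. The check matters because an x86_64 Python running under Rosetta reports x86_64 from platform.machine() even on an M-series Mac. This only picks the env file; it does not solve the conda subdir problem for installing the osx-arm64 env.

```python
import platform
import subprocess


def is_rosetta_translated():
    """Best-effort check for an x86_64 process translated by Rosetta.
    Returns False if sysctl is missing or the key does not exist."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False
    return result.stdout.strip() == "1"


def select_medaka_env(system=None, machine=None, translated=None):
    """Return medaka2.yaml on Apple Silicon Macs, medaka.yaml otherwise.
    Arguments default to values detected from the running system."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if system != "Darwin":
        # Assumption: Linux and other systems keep the current env.
        return "medaka.yaml"
    if machine == "arm64":
        return "medaka2.yaml"
    if translated is None:
        translated = is_rosetta_translated()
    # x86_64 Python on an M-series Mac (Rosetta) still counts as ARM hardware.
    return "medaka2.yaml" if translated else "medaka.yaml"


# Explicit arguments make the three macOS/Linux cases easy to see.
print(select_medaka_env("Darwin", "arm64", False))
print(select_medaka_env("Darwin", "x86_64", True))
print(select_medaka_env("Linux", "x86_64"))
```

In the rule above, the conda directive would then call select_medaka_env() with no arguments, and it would be evaluated once when the Snakefile is parsed.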

jmtsuji commented 20 hours ago

@LeeBergstrand Based on a conversation with the medaka maintainers, there is no plan to make an osx-64 install of medaka2. Here are some updated possible courses of action for us (with additional thoughts from the medaka issue):

@LeeBergstrand Any thoughts? I don't think macOS support is a huge priority at the moment (aside from being able to easily test rotary on laptops), but I think it is worth considering for the finished tool.

jmtsuji commented 19 hours ago

As shown in #229, I think that the first option above is not feasible in our development timeline:

- Transition from supporting osx-64 to osx-arm64 with rotary. This would require updating all of rotary's conda envs to be installable natively on osx-arm64. I have not evaluated each rotary environment yet, but based on a quick glance, it looks like several of the annotation tools (e.g., eggNOG-mapper, newer versions of DFAST) lack an osx-arm64 install, so adding osx-arm64 support for the whole pipeline would not be trivial.

LeeBergstrand commented 5 hours ago

Drop macOS support entirely. This would simplify maintenance and avoid issues with Mac architecture down the road, but it would also make rotary inaccessible to many individual users.

@jmtsuji Why is macOS support important for Rotary, other than for development? I don't see the entire pipeline being able to feasibly run on a Mac given the RAM requirements. Most of the baseline Macs people have (MacBook Pros and iMacs) top out at 24 GB of RAM, which has to be special ordered (baseline is 8 or 16 GB). With some models you can get 32 GB (Mac mini and 15-inch MacBook Pro). Even the top-end modern Mac Studios ($5500) top out at 64 GB of RAM.

I would support waiting until more of these tools have ARM support.