scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

Support backup_url kwarg #15

Closed. Zethson closed this issue 2 months ago.

Zethson commented 2 years ago

Is your feature request related to a problem? Please describe.

From what I could see, it is not possible to supply a backup_url for MuData objects via muon.read, the way scanpy's read function allows. (Since I/O lives primarily here, I am opening the issue in this repository.)

Describe the solution you'd like

Support for it :) I guess the download code already exists here: https://github.com/PMBio/mudatasets/blob/main/mudatasets/core.py#L28
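For reference, scanpy's read(filename, backup_url=...) downloads the file from backup_url when filename is not present on disk. Below is a minimal sketch of that behaviour as a standalone helper, assuming a plain urllib download is sufficient. read_with_backup_url is a hypothetical name, not part of muon or mudata. It takes the actual reader function as an argument, so it does not depend on any particular mudata API:

```python
import os
import urllib.request
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_backup_url(
    path: str,
    reader: Callable[[str], T],
    backup_url: Optional[str] = None,
) -> T:
    """Hypothetical helper: call reader(path), first downloading
    path from backup_url if it does not exist locally."""
    target = Path(path)
    if not target.exists():
        if backup_url is None:
            raise FileNotFoundError(f"{path} does not exist and no backup_url was given")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary ".part" name first, so an interrupted
        # download never leaves a truncated file at the final path.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(backup_url, partial)
        os.replace(partial, target)
    # The file is now on disk; delegate parsing to the supplied reader.
    return reader(str(target))
```

With mudata, the call might look like read_with_backup_url("data/file.h5mu", mudata.read_h5mu, backup_url="https://example.org/file.h5mu"), where the path and URL are placeholders. Because of the rename step, a failed download is simply retried on the next call instead of leaving a partial file that would later be read as if complete.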

ivirshup commented 2 years ago

An alternative solution I've been thinking about would be to add fsspec support for input paths (https://github.com/theislab/anndata/issues/657).

This would look like:

data = muon.read_h5mu("filecache::https://ebi.ac.uk/...")

Here the cache location would be a configured cache directory, or controlled with kwargs passed to fsspec, e.g. filecache={'cache_storage': '/tmp/files'}.
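At the fsspec level this chaining already exists: protocols are joined with "::", and keyword arguments for one layer are passed as a dict named after that protocol. The sketch below assumes fsspec is installed. To run offline, it uses fsspec's in-memory filesystem (memory://) in place of the remote https URL. read_via_filecache is a hypothetical helper, and the raw bytes it returns would still have to be parsed as HDF5 by mudata:

```python
import os
import tempfile

import fsspec


def read_via_filecache(url: str, cache_dir: str) -> bytes:
    """Hypothetical helper: open url through fsspec's 'filecache' layer,
    which copies the whole file into cache_dir before reading it."""
    # Kwargs for the filecache layer go in a dict keyed by its protocol name.
    with fsspec.open(
        f"filecache::{url}",
        mode="rb",
        filecache={"cache_storage": cache_dir},
    ) as f:
        return f.read()


# Stand-in for a remote file: write some bytes to the in-memory filesystem.
with fsspec.open("memory://example.h5mu", mode="wb") as f:
    f.write(b"not really HDF5")

cache_dir = tempfile.mkdtemp()
data = read_via_filecache("memory://example.h5mu", cache_dir)
print(data)
print(os.listdir(cache_dir))  # the locally cached copy (hashed file name)
```

With a real URL, "memory://example.h5mu" would be replaced by the https or s3 address. By default, later opens of the same URL are served from the local copy in cache_dir rather than fetched again.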

Zethson commented 1 year ago

@gtca what would your preferred, simple solution be? Having the download code in MuData instead of mudatasets? I might be able to submit a PR, but it would be good to know first what you'd want.

gtca commented 2 months ago

v0.3 supports this via fsspec; for example, this should work:

import fsspec
from mudata import read

# OpenFile and BufferedReader from fsspec are supported for remote storage, e.g.:
mdata = read(fsspec.open("s3://bucket/file.h5mu"))

# or
with fsspec.open("s3://bucket/file.h5mu") as f:
    mdata = read(f)

# or
with fsspec.open("https://server/file.h5ad") as f:
    adata = read(f)

For backup_url, if you think it's still needed, I guess it's worth a new issue on https://github.com/scverse/muon.