higlass / clodius

Clodius is a tool for breaking up large data sets into smaller tiles that can subsequently be displayed using an appropriate viewer.

Ability to support file-like objects in `clodius.tiles` #142

Open manzt opened 2 years ago

manzt commented 2 years ago

Tracking whether file-like objects could be supported by the different tiles implementations.

🟩 = yes, 🟨 = unknown, 🟥 = no

🟨 bam.py

Depends on pysam.AlignmentFile, which accepts a file-like object for the alignment file but not for the index.

import fsspec
import pysam

# htslib has internal support for remote files: http://www.htslib.org/doc/samtools.html#DESCRIPTION
# We could use this, but would need to cache instances (like tiles/clodius) to avoid
# re-initializing for each call to `tiles()`
pysam.AlignmentFile('http://localhost:8080/data.bam##idx##s3://path/to/data.bai')

# BAM and BAI files are binary, so open them in 'rb' mode
with open('./data.bam', 'rb') as f:
  pysam.AlignmentFile(f) # works

with open('./data.bam', 'rb') as f, open('./data.bai', 'rb') as idx:
  pysam.AlignmentFile(f, index_filename=idx) # error, can't pass file-like object for index!

with fsspec.open('http://localhost:8080/data.bam') as f:
  pysam.AlignmentFile(f) # not recognized as a file-like object
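
The instance caching mentioned in the comment above could look roughly like the following. This is only a sketch: `make_cached_opener` and its `opener` parameter are hypothetical names, not part of clodius or pysam. In practice the opener would be something like `pysam.AlignmentFile`.

```python
from functools import lru_cache


def make_cached_opener(opener, maxsize=32):
    """Wrap `opener` so repeated calls with the same URL reuse one handle.

    `opener` is any callable taking a URL/path string (hypothetical;
    e.g. pysam.AlignmentFile).
    """

    @lru_cache(maxsize=maxsize)
    def cached(url):
        return opener(url)

    return cached
```

Note that `lru_cache` does not close handles when it evicts them, so a real implementation would probably need its own eviction hook.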

🟥 bed2ddb.py

Depends on sqlite3.connect, which only accepts a path to the database, not a file-like object.
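
One possible workaround for the sqlite3-backed tilesets, which clodius does not do today, is to copy the file-like object to a temporary file and connect to that. A minimal sketch, assuming the object is opened in binary mode and the database is small enough to copy; `connect_from_fileobj` is a hypothetical helper:

```python
import shutil
import sqlite3
import tempfile


def connect_from_fileobj(fileobj):
    """Copy a binary file-like object to a temp file and open it with sqlite3.

    Returns (connection, path). The caller is responsible for closing the
    connection and deleting the file at `path` afterwards.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
    try:
        shutil.copyfileobj(fileobj, tmp)
    finally:
        tmp.close()
    return sqlite3.connect(tmp.name), tmp.name
```

This trades away the benefit of lazy remote reads, since the whole database is downloaded before the first query.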

🟥 bedarcsdb.py

Depends on sqlite3.connect, which only accepts a path to the database, not a file-like object.

🟥 beddb.py

Depends on sqlite3.connect, which only accepts a path to the database, not a file-like object.

🟩 bedfile.py

Currently a no-op.

🟨 bigbed.py

Depends on tiles/bigwig.py.

🟨 bigwig.py

Depends on whether pybbi could accept file-like objects.

🟩 chromsizes.py

Just reads a CSV using the builtin open.
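
Supporting both paths and file-like objects here could be as simple as the sketch below. `read_chromsizes` is a hypothetical function, not the current clodius API, and the tab-delimited two-column layout (name, size) is an assumption about the input.

```python
import csv


def _parse(f):
    # Assumes tab-delimited rows of (name, size); blank lines are skipped.
    return [(row[0], int(row[1])) for row in csv.reader(f, delimiter="\t") if row]


def read_chromsizes(source):
    """Read (name, size) rows from a path or an open text file-like object."""
    if hasattr(source, "read"):
        # Already a file-like object: read from it directly, leave it open.
        return _parse(source)
    with open(source, newline="") as f:
        return _parse(f)
```

For example, `read_chromsizes(io.StringIO("chr1\t100\n"))` and `read_chromsizes("./chromsizes.tsv")` would go through the same parsing code.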

🟩 cooler.py

Depends on h5py, which allows file-like objects.

🟩 density.py

Depends on h5py, which allows file-like objects.

🟩/🟨 fasta.py

Depends on pyfaidx.Fasta, which supports only filenames. There has been some effort here that could be abstracted further.

🟥 geo.py

Depends on sqlite3.connect, which only accepts a path to the database, not a file-like object.

🟩 hitile.py

Depends on h5py, which allows file-like objects.

🟥 imtiles.py

Depends on sqlite3.connect, which only accepts a path to the database, not a file-like object.

🟩 mrmatrix.py

Depends on h5py, which allows file-like objects.

🟩 multivec.py

Depends on h5py, which allows file-like objects.

🟩 tabix.py

Uses builtin open and gzip.open, which can be replaced with f.open and gzip.open(f) if file handles are provided.
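
For the gzip part, the standard library's `gzip.open` already accepts either a filename or an existing binary file object, so one code path can cover both cases. A minimal sketch; `read_gzipped_lines` is a hypothetical helper, not a clodius function:

```python
import gzip


def read_gzipped_lines(source):
    """Return decoded lines from a gzip file.

    `source` may be a path or an open binary file-like object.
    """
    with gzip.open(source, "rt") as f:
        return [line.rstrip("\n") for line in f]
```

When a file object is passed, closing the gzip wrapper leaves the underlying object open, so the caller keeps ownership of it.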

🟩 time_interval.py

Uses builtin open, which can be replaced with fh.open if a file-like object is provided.

manzt commented 2 years ago

context: https://github.com/manzt/hg/pull/6