mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx
Other
459 stars 75 forks source link

Add fsspec support #190

Closed ap-- closed 2 years ago

ap-- commented 2 years ago

Hi @mdshw5,

I saw that there already was an attempt at implementing fsspec support and that the PR is basically stale.

So here's my attempt at implementing fsspec support: This closes #165 and replaces #168

I explicitly tried to keep the required changes to a minimum, to simplify review. This follows @manzt recommendations and accepts fsspec.core.OpenFile objects. This can still be done cleaner by refactoring more of Faidx.__init__

Tests all pass and I added a single fsspec test. Let me know what the best tests would be for covering the base functionality of this library to test fsspec support.

Cheers, Andreas :smiley:

mdshw5 commented 2 years ago

Hey @ap-- thanks for the thoughtful contribution! I do have a few small questions listed in the review above. This does seem like a straightforward implementation and I appreciate your consideration of the existing code structure. @manzt can you comment on whether this PR addresses your requirements?

ap-- commented 2 years ago

Hi @mdshw5,

I currently can't see your review comments. Did you submit it?

Cheers, Andreas

mdshw5 commented 2 years ago

Hi @mdshw5,

I currently can't see your review comments. Did you submit it?

Cheers, Andreas

Ah yeah - I forgot to submit 😅

mdshw5 commented 2 years ago

@ap-- I've merged these changes and plan to release a new v0.6.5 package asap. One thing I forgot to request: can you please provide an example in the Readme.rst demonstrating usage of fsspec with this package?

manzt commented 2 years ago

Oh oops, didn't see this got merged while I looked it over. Nice work, and thanks!

mdshw5 commented 2 years ago

@manzt I do like the idea of adding an index_loc kwarg to both Faidx and Fasta. By default it would be set to None, and this can signify that the index should be located adjacent to the FASTA file URI. Otherwise the value will be a file-like object. This would solve other use cases, like shared read-only FASTA files on university HPC systems, or even passing a temporary file or in-memory file like object.

ap-- commented 2 years ago

Thanks for merging ❤️ Should I open another PR with the README changes, and is there still interest in adding more fsspec tests? Or should I wait for #192 to be implemented?

Cheers, Andreas

mdshw5 commented 2 years ago

Don't hold your breath on #192 😄. It's more complicated than I thought. If you want to take a stab at it be my guest. Otherwise, some documentation for this PR in the README will suffice, and then I'll tag a new release.