Hi,

I would like to use pyfaidx on Google Cloud Storage, i.e. via `gcsfs`. I'm just working with uncompressed `.fa` for now; I think compressed may be a bit more complex.
The simplest fix would be to allow the user to specify a custom file-opening function, in this case passing something like:

```python
g = gcsfs.GCSFileSystem()
fa_fsmap = g.get_mapper(fa_path)
g.open(fa_fsmap)

pyfaidx.Fasta(fa_path, fasta_opener=...)
```
This is a bit awkward though, as the opener couldn't operate on the filepath argument alone: it needs the mapper from `gcsfs`.
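For concreteness, the hook could look like the sketch below. `fasta_opener` is not a real pyfaidx parameter, just the hypothetical one proposed above; pyfaidx would call the supplied opener wherever it currently calls `open()`, so the default keeps local files working unchanged while a gcsfs user could pass e.g. `fasta_opener=g.open`.

```python
import io
import os
import tempfile

def open_fasta(path, fasta_opener=io.open):
    # Hypothetical hook: delegate opening to a user-supplied callable.
    # Default io.open preserves today's local-file behaviour.
    return fasta_opener(path, "r")

# Demo with a local file standing in for a remote object.
tmp = tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False)
tmp.write(">chr1\nACGTACGT\n")
tmp.close()

with open_fasta(tmp.name) as handle:
    header = handle.readline().strip()
os.unlink(tmp.name)
print(header)  # -> >chr1
```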
I wondered if it's better to allow the user to pass in an open file handle directly, but I guess this makes working out the accompanying index file impossible, unless that is also provided.
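A handle-based variant would have to take the `.fai` index as a second open file-like object, since there is no path to derive it from. A minimal sketch of the index side (hypothetical helper, not pyfaidx's actual parser):

```python
import io

def read_index(index_handle):
    # .fai columns: name, length, offset, linebases, linewidth.
    # The handle could come from g.open(fa_path + ".fai") just as
    # easily as from a local file.
    index = {}
    for line in index_handle:
        name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")
        index[name] = int(length)
    return index

fai = io.StringIO("chr1\t8\t6\t9\t10\n")
print(read_index(fai))  # -> {'chr1': 8}
```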
Given that, would you consider delegating `open` to `fsspec.open` via https://github.com/intake/filesystem_spec? This would also have the advantage of supporting bgzf opening in a better way than checking the file extension.
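To illustrate what that delegation buys: `fsspec.open` dispatches on the URL scheme, so one call path covers local files, `gs://`, `s3://`, and friends. The sketch below uses fsspec's in-memory filesystem in place of a real `gs://` bucket so it runs without credentials (assumes `fsspec` is installed).

```python
import fsspec

# Stage a tiny FASTA in fsspec's in-memory filesystem; a gs:// URL
# would work identically, just with gcsfs behind it.
with fsspec.open("memory://demo.fa", "w") as f:
    f.write(">chr1\nACGTACGT\n")

# The same fsspec.open call pyfaidx could delegate to:
with fsspec.open("memory://demo.fa", "r") as f:
    lines = f.read().splitlines()

print(lines[0])  # -> >chr1
```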
Happy to submit a PR with either solution.