Closed by martindurant 2 years ago.
Thanks for reaching out @martindurant! Does fsspec also handle movement of data? If so, it might be interesting to explore an integration with the Skyplane Python API (not yet merged into main, but hopefully coming soon).
Also, we check our Slack more frequently, so it'd be great to have you join there!
fsspec does have methods to copy files within a given backend (e.g., s3->s3) and between backends (e.g., s3->gcs), but the latter pipes everything through the client process. By far the more common use of fsspec, though, is to read data for processing and write different data back.
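To make "pipes everything through the client process" concrete, here is a minimal sketch of what a cross-backend copy amounts to when the client does the transfer. It is not fsspec's internal implementation; the helper name `pipe_through_client` and its chunk size are hypothetical, and `io.BytesIO` objects stand in for an object on one backend and a new object on another.

```python
import io


def pipe_through_client(src, dst, chunk_size=1 << 20):
    """Copy src to dst by streaming every chunk through this process.

    src and dst can be any binary file-like objects (hypothetically, files
    opened on two different storage backends). Returns the number of bytes
    that passed through the client.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)  # "download": bytes land in client memory
        if not chunk:
            break
        dst.write(chunk)  # "upload": the same bytes leave the client again
        total += len(chunk)
    return total


# Stand-ins for a source object on one backend and a destination on another.
source = io.BytesIO(b"x" * 3_000_000)
destination = io.BytesIO()
moved = pipe_through_client(source, destination)
print(moved)  # 3000000
```

With fsspec, `src` and `dst` could be files obtained from `fsspec.open(...)` on, say, an `s3://` and a `gs://` URL (bucket names and installed backend packages are assumptions here). Either way, the client's own network link carries the data twice, which is why a dedicated transfer tool can be much faster for bulk moves.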
As you may know, fsspec is a pure-Python package for accessing various file storage backends in a unified, filesystem-like manner, so that downstream users and libraries can seamlessly work with data on any of those backends. Its focus is on providing filesystem-related utilities like `ls` and `glob`, and on creating duck-typed file-like objects. Your package here has a more focussed concern, but with overlap. I have not tried to time the bulk copy (`cp(recursive=True)`) methods of s3, gcs and azure, but I assume that the throughput via fsspec is similar to, or worse than, that of the official CLI tools. It is not clear to me that there is an obvious way for the two projects to cooperate, but I wanted to say "hi". I have not yet tried your tool, but the stated benchmarks are very promising.
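For readers unfamiliar with fsspec, a rough sketch of the unified interface described above, using fsspec's in-memory backend so that no credentials are needed. The s3, gcs and azure implementations expose the same method names, but the `/demo/...` paths and file contents here are purely illustrative assumptions.

```python
import fsspec

# In-memory filesystem: same API surface as the cloud backends.
fs = fsspec.filesystem("memory")

# The memory store is shared per process, so start from a clean demo prefix.
if fs.exists("/demo"):
    fs.rm("/demo", recursive=True)

# Create a few small objects.
fs.pipe("/demo/src/a.csv", b"a,1\n")
fs.pipe("/demo/src/b.csv", b"b,2\n")
fs.pipe("/demo/src/notes.txt", b"hello\n")

# Filesystem-style utilities: listing and globbing.
print(sorted(fs.ls("/demo/src", detail=False)))
csv_files = sorted(fs.glob("/demo/src/*.csv"))
print(csv_files)

# Duck-typed file-like object.
with fs.open("/demo/src/a.csv", "rb") as f:
    print(f.read())  # b'a,1\n'

# Bulk copy within a single backend.
fs.cp("/demo/src", "/demo/dst", recursive=True)
print(fs.cat("/demo/dst/b.csv"))  # b'b,2\n'
```

On a real object store, `cp(recursive=True)` within one backend may be able to use the store's server-side copy, whereas copying between backends goes through the client as described earlier in the thread.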