Open dhirschfeld opened 3 years ago
TBF, I think fsspec still isn't quite as mature as pyfilesystem2 and doesn't have quite as polished an API; however, it does seem to have much better support for the use-cases I care about.
xref: #7
I've been talking to @martindurant about fsspec for a while now (he's the creator). My current preference is to not throw the baby out with the pyfilesystem bathwater, and instead include some kind of support for both pyfilesystem2 and fsspec. Martin has actually been kind enough to get an implementation of fsspec for jupyter-fs started in his changes branch here.
@dhirschfeld I don't have much bandwidth to work on jupyter-fs right now, and most of my effort is currently going towards the new tree-finder based filebrowser. But if you want to take a crack at it, I would not say no to an fsspec PR.
It's an itch I'd like to scratch, but realistically won't have time to look at any time soon.
I'm using fsspec to access data on cloud storage from JupyterLab and I thought it would be nice to be able to browse that same storage from within JupyterLab to e.g. check if my f.write(data) call really worked. There's a slight friction in having to switch to the Azure Portal to check if the files that should have been written to cloud storage really were written.
Unfortunately, since it's a "nice-to-have" rather than a "can't live without" I won't be able to invest time into it in the medium term - I can't even keep up with my can't live without's :/
Is there any update on this?
If not, I would like to work on this issue, to use fsspec for protocols not supported by pyfilesystem2.
My current preference is to not throw the baby out with the pyfilesystem bathwater, and instead include some kind of support for both pyfilesystem2 and fsspec.
Based on the above comments, I am considering either of the following policies, but would appreciate comments if you have a preference.
1. Choose the backend for each resource via a setting, as in the following example. (For backward compatibility, use pyfilesystem2 if not set.)
{
  "resources": [
    {
      "name": "explicit_pyfilesystem2_resource",
      "url": "osfs:///Users/foo/test",
      "backend": "pyfilesystem2"
    },
    {
      "name": "implicit_pyfilesystem2_resource",
      "url": "osfs:///Users/foo/test"
    },
    {
      "name": "fsspec_resource",
      "url": "s3://test",
      "backend": "fsspec"
    }
  ]
}
2. Check if the protocol is supported by pyfilesystem2, and if so, use pyfilesystem2. Otherwise, use fsspec. https://github.com/PyFilesystem/pyfilesystem2/blob/master/fs/opener/registry.py#L93
If there is no preference, I would like to proceed with 1 for future expansion. Any comments or suggestions would be appreciated.
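To make the two policies concrete, here is a rough sketch of how the backend could be picked in each case. The function names and the `assumed_protocols` set are hypothetical, made up for illustration only; in practice the protocol list would come from pyfilesystem2's opener registry linked above, and none of this is existing jupyter-fs code.

```python
from urllib.parse import urlsplit

DEFAULT_BACKEND = "pyfilesystem2"  # backward-compatible default when "backend" is not set
KNOWN_BACKENDS = ("pyfilesystem2", "fsspec")


def backend_from_setting(resource):
    """Policy 1: read an explicit "backend" key, defaulting to pyfilesystem2."""
    backend = resource.get("backend", DEFAULT_BACKEND)
    if backend not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r} for resource {resource.get('name')!r}"
        )
    return backend


def backend_from_protocol(resource, pyfilesystem2_protocols):
    """Policy 2: use pyfilesystem2 if it knows the URL's protocol, else fsspec."""
    protocol = urlsplit(resource["url"]).scheme
    return "pyfilesystem2" if protocol in pyfilesystem2_protocols else "fsspec"


# The three resources from the settings example above.
resources = [
    {
        "name": "explicit_pyfilesystem2_resource",
        "url": "osfs:///Users/foo/test",
        "backend": "pyfilesystem2",
    },
    {"name": "implicit_pyfilesystem2_resource", "url": "osfs:///Users/foo/test"},
    {"name": "fsspec_resource", "url": "s3://test", "backend": "fsspec"},
]

# ASSUMPTION: a hand-written protocol set standing in for pyfilesystem2's registry.
assumed_protocols = {"osfs", "mem", "ftp"}

policy1 = [backend_from_setting(r) for r in resources]
policy2 = [backend_from_protocol(r, assumed_protocols) for r in resources]
```

With policy 1 the user stays in control even when both libraries know a protocol, which is part of why it seems easier to extend later; policy 2 needs no new setting but hides the choice.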
Note that fsspec instances generally need more configuration. Whilst it is possible to set the default values for any particular protocol, it is very conceivable to want different configurations for, e.g., an owned bucket, a public bucket and a requestor-pays bucket on S3. (or even different S3-compatible service)
Thank you for your comment. I believe that the feature will be worthwhile even with default values at first, since it will also support protocols that are not yet supported by pyfilesystem2. Therefore, I would like to proceed initially with default values, as is the current usage of pyfilesystem2. More detailed configuration is something I would be willing to consider afterwards if necessary.
(Not related to the issue, but I also find fsspec useful on a daily basis. Thank you for developing a very cool and useful product.)
I have started to implement the addition of fsspec.
Since fsspec.core.url_to_fs() is used internally to create instances, I began to think that making 'kwargs' configurable in addition to 'backend' would solve the problem you mentioned. (I would like to pass it like client_kwargs)
Of course, as an interface to JupyterLab's settings, this would be redundant. However, this is not a big problem because this option is only for users who want to do complicated things. (Basic users will still be able to use it with the same settings.)
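As a rough sketch of the idea, assuming a hypothetical per-resource "kwargs" key that is forwarded unchanged to fsspec.core.url_to_fs(): the resource names and the endpoint URL below are placeholders, while anon, requester_pays and client_kwargs are s3fs options. This only illustrates the shape of the setting and is not the actual implementation.

```python
def split_resource(resource):
    """Return (url, kwargs); "kwargs" is the hypothetical extra settings key."""
    return resource["url"], dict(resource.get("kwargs", {}))


def open_with_fsspec(resource):
    """Create an fsspec filesystem for a resource, forwarding its kwargs."""
    # Third-party import kept local so the rest of the sketch runs without fsspec.
    from fsspec.core import url_to_fs

    url, kwargs = split_resource(resource)
    return url_to_fs(url, **kwargs)  # returns a (filesystem, path) tuple


# Same protocol, different configurations (all names and URLs are placeholders).
resources = [
    {"name": "owned_bucket", "url": "s3://test", "backend": "fsspec"},
    {
        "name": "public_bucket",
        "url": "s3://test",
        "backend": "fsspec",
        "kwargs": {"anon": True},
    },
    {
        "name": "requester_pays_bucket",
        "url": "s3://test",
        "backend": "fsspec",
        "kwargs": {"requester_pays": True},
    },
    {
        "name": "s3_compatible_service",
        "url": "s3://test",
        "backend": "fsspec",
        "kwargs": {"client_kwargs": {"endpoint_url": "https://s3.example.invalid"}},
    },
]

settings = {r["name"]: split_resource(r) for r in resources}
```

A resource without "kwargs" behaves exactly as with default values, so basic users would not need to change anything.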
Thanks @reoono, let me know if I can help.
Is the effort here related to https://github.com/fsspec/jupyter-fsspec ?
jupyter-fsspec is "inspired" by this repo, and is only in early stages so far. If you would like to port any functionality or otherwise help develop it, that would be cool.
I do still plan on moving to fsspec eventually. There were some issues detailed here (as well as some others not written here) that were a problem, but they should be OK now.
Before fsspec existed I used pyfilesystem2 and was very happy with it - it's a great library; however, it (apparently) didn't meet all the requirements for dask, so fsspec was built, primarily to support dask, but it's also used in intake and as a generic filesystem API. As such it has a robust community around it and is continually improving and maturing.
Coming from the distributed computing world it has first-class support for cloud storage, and in particular (for my use-case) Azure Data Lake.
I haven't actually used the cloud storage plugins in pyfilesystem2 but they don't seem to have a lot of development momentum behind them, unlike fsspec.
To better support cloud filesystems I think it would be great if jupyter-fs could make use of fsspec rather than pyfilesystem2.