pfnet / pfio

IO library to access various filesystems with unified API
https://pfio.readthedocs.io/
MIT License

HTTPCachedFS #316

Closed y1r closed 11 months ago

y1r commented 1 year ago

This PR adds HTTPCachedFS, a wrapper that hooks read operations and caches their content with HTTPConnector. On a read, the wrapper first tries HTTPCache to retrieve the content; the underlying FS is not called if the data is in the cache. Other operations, such as write and list, are forwarded to the underlying FS.
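The read-through behaviour described above can be illustrated with a minimal sketch. This is not the code in this PR: `DictCache`, `InMemoryFS`, and `CachedFS` are hypothetical names, an in-memory dict stands in for the HTTP-backed cache, and the cache key is the plain path rather than a canonical name.

```python
class DictCache:
    """Hypothetical stand-in for an HTTP-backed cache: a plain in-memory dict."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a cache miss.
        return self._store.get(key)

    def put(self, key, data):
        self._store[key] = data


class InMemoryFS:
    """Hypothetical toy filesystem used only to make the sketch runnable."""

    def __init__(self):
        self.files = {}
        self.read_calls = 0  # counts how often the underlying FS is read

    def read(self, path):
        self.read_calls += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data

    def list(self):
        return sorted(self.files)


class CachedFS:
    """Wrapper that hooks read() and forwards everything else to the wrapped FS."""

    def __init__(self, fs, cache):
        self._fs = fs
        self._cache = cache

    def read(self, path):
        data = self._cache.get(path)
        if data is None:
            # Cache miss: fall back to the underlying FS, then populate the cache.
            data = self._fs.read(path)
            self._cache.put(path, data)
        return data

    def __getattr__(self, name):
        # Only invoked for attributes CachedFS does not define itself,
        # so write(), list(), etc. go straight to the underlying FS.
        return getattr(self._fs, name)


fs = InMemoryFS()
cached = CachedFS(fs, DictCache())
cached.write("a.txt", b"hello")
first = cached.read("a.txt")   # miss: reads from the underlying FS
second = cached.read("a.txt")  # hit: served from the cache
```

Note that this sketch does not invalidate the cache on write; how the real wrapper handles stale entries is not stated in this description.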

To introduce this feature, I added _canonical_name(path) to the FS interface and its implementations. _canonical_name(path) returns the path together with filesystem information, so that Local("abc")._canonical_name("somefile") and S3("abc")._canonical_name("somefile") can be distinguished. S3 and HDFS encode their endpoint into the path to distinguish multiple endpoints (such as AWS's S3 and S3-compatible storage systems).
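As a rough illustration of why a canonical name is needed as a cache key, the sketch below builds distinct keys for the same relative path on different backends. The classes `LocalSketch` and `S3Sketch`, the `file://` prefix, and the endpoint hostnames are assumptions made for this example; the exact strings produced by pfio's `_canonical_name` may differ.

```python
import posixpath


class LocalSketch:
    """Hypothetical local filesystem rooted at a working directory."""

    def __init__(self, cwd):
        self.cwd = cwd

    def canonical_name(self, path):
        # Assumed format: scheme plus the normalized absolute path.
        return "file://" + posixpath.normpath(posixpath.join("/", self.cwd, path))


class S3Sketch:
    """Hypothetical S3-like filesystem; the default endpoint is a placeholder."""

    def __init__(self, bucket, endpoint="s3.example.com"):
        self.bucket = bucket
        self.endpoint = endpoint

    def canonical_name(self, path):
        # Assumed format: scheme://endpoint/bucket/key_name
        return f"s3://{self.endpoint}/{self.bucket}/{path.lstrip('/')}"


local_key = LocalSketch("abc").canonical_name("somefile")
s3_key = S3Sketch("abc").canonical_name("somefile")
other_key = S3Sketch("abc", endpoint="minio.example.internal").canonical_name("somefile")

print(local_key)  # file:///abc/somefile
print(s3_key)     # s3://s3.example.com/abc/somefile
print(other_key)  # s3://minio.example.internal/abc/somefile
```

Because the scheme and endpoint are part of the key, the same relative path on different backends or endpoints cannot collide in a shared cache.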

Users can enable HTTPCache transparently by the following methods:

y1r commented 1 year ago

Thank you for your detailed review! I renamed normpath to _canonical_name, and it now returns a URL-like string such as scheme://endpoint/bucket/key_name. Also, zipfile is too broad a name to reserve, so I renamed it to pfio-zipfs to avoid namespace pollution.