Closed: dblodgett-usgs closed this 1 month ago
Found the code in question.
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/http.py#L40
See WIP for this here: https://github.com/dblodgett-usgs/pizzarr/pull/1/files
How do you feel about use of memoise as I have in there?
one_hour <- 60^2
# per-session http get cache
mem_get <- memoise::memoise(\(client, path) client$get(path),
                            ~memoise::timeout(one_hour))
I added vcr testing for crul interactions. It seems the most straightforward way to mock HTTP tests right now. Any issue with how I set that up?
(simultaneous comments FTW! -- happy monday)
Wondering what your preferred implementation is
My first thought would be to add an optional property to the Store abstract class for keeping consolidated metadata at the store level. If the consolidated metadata is present, then it can be used within the listdir function implementation.
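As a rough illustration of that idea, here is a minimal base-R sketch of how a listdir could be answered from consolidated metadata alone. The function name listdir_from_consolidated and its arguments are hypothetical, not part of pizzarr; it assumes `keys` holds the entry names found under the "metadata" field of a parsed .zmetadata document (for example "0/.zarray").

```r
# Hypothetical helper, not pizzarr's API: list the immediate children of
# `path` using only the key names from consolidated metadata.
listdir_from_consolidated <- function(keys, path = "") {
  # Normalise the path into a prefix such as "group/" (or "" for the root).
  prefix <- if (nzchar(path)) paste0(sub("/+$", "", path), "/") else ""
  under <- keys[startsWith(keys, prefix)]
  rest <- substring(under, nchar(prefix) + 1)
  rest <- rest[nzchar(rest)]
  # Keep only the first path component of each remaining key.
  children <- vapply(strsplit(rest, "/", fixed = TRUE),
                     function(parts) parts[1], character(1))
  # method = "radix" sorts in the C locale, so the order is reproducible.
  sort(unique(children), method = "radix")
}

# Assumed example keys for a group holding two arrays named "0" and "1".
keys <- c(".zattrs", ".zgroup", "0/.zarray", "0/.zattrs", "1/.zarray")
root_children <- listdir_from_consolidated(keys)
array_children <- listdir_from_consolidated(keys, "0")
```

Under this approach a store without consolidated metadata has nothing to list from, which matches the proposal that listdir support is only guaranteed when consolidated metadata is present.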
it appears that listing an http store only typically works when there is consolidated metadata. Should we follow the same pattern?
Yes, I think it is reasonable to say that listdir support is only guaranteed if consolidated metadata is present. I actually did not even realize that Zarr would ever look in index.html directory listings. (In most of the S3 buckets that I use day-to-day, directory listing is turned off.)
Haha, happy Monday! Thanks for sharing the implementation.
How do you feel about use of memoise as I have in there?
It makes sense, especially for HTTP store! I think we should add (and document) a mechanism to clear the cache.
I also wonder if the cache should be a property on the store instance instead of a global variable. For more flexibility, the constructor could offer options to specify a timeout, to opt out of caching, and/or to supply a custom cache instance to pass to the cache parameter of memoise.
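A minimal sketch of what a per-store cache with those options might look like, assuming the memoise package is installed. The constructor new_cached_store, its arguments, and the fake client are all hypothetical and stand in for the real store and crul client; this is not the code from the PR.

```r
library(memoise)

# Hypothetical constructor, not pizzarr's HttpStore: each store owns its
# own memoised getter instead of sharing one global cache.
new_cached_store <- function(client, cache_time_seconds = 3600,
                             use_cache = TRUE) {
  raw_get <- function(path) client$get(path)
  cached_get <- if (use_cache) {
    # The formula is re-evaluated on every call, so entries expire when
    # the timeout window rolls over.
    memoise::memoise(raw_get, ~memoise::timeout(cache_time_seconds))
  } else {
    raw_get
  }
  list(
    get = cached_get,
    # Documented way to drop everything this store has cached.
    clear_cache = function() {
      if (use_cache) memoise::forget(cached_get)
      invisible(NULL)
    }
  )
}

# Fake client that counts requests, so no network access is needed.
calls <- new.env()
calls$n <- 0
fake_client <- list(get = function(path) {
  calls$n <- calls$n + 1
  paste("body of", path)
})

store <- new_cached_store(fake_client)
first <- store$get(".zmetadata")
second <- store$get(".zmetadata")   # served from the cache
requests_before_clear <- calls$n
store$clear_cache()
third <- store$get(".zmetadata")    # goes back to the client
requests_after_clear <- calls$n
```

A custom cache could be supported the same way by adding an argument that is forwarded to the cache parameter of memoise::memoise.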
+1 on the cache control on the store. I'm thinking of the global in-memory cache as kind of like a web browser cache, but I can see how it would be nice to associate it with the store object, and that is not all that hard. I've been banging my head on caching in one of my other packages. Hopefully I can offer some hard lessons learned here.
+1 on requiring consolidated metadata.
Out of curiosity, is the "multiresolutions" convention you have in your demos something there is a specification for? The xarray Zarr convention doesn't have any metadata other than fully consolidated metadata that gets you from group to arrays.
OK -- I just moved the memoise function under the store object and added a cache timeout control.
Once #81 has merged, I'll queue that up to this repository.
I'm still learning the ins and outs of the zarr spec, and still don't totally follow the nuances of consolidated metadata, but was playing with this. Wondering what your preferred implementation is @keller-mark ?
User story
I want to be able to listdir() with an http store.
Preferred solution
Looking here: https://github.com/zarr-developers/zarr-python/issues/993 and doing a bit of background reading, it appears that listing an http store typically only works when there is consolidated metadata. Should we follow the same pattern?
Looking at the python implementation:
And if I set up a little local file server that includes an index.html, with:
I can do this and the httpstore hs isdir TRUE! Not sure we should be reading an HTML index and doing what fsspec does here; that is a lot of complexity.