Closed jbusecke closed 3 months ago
Wondering if we should add some end to end tests with small files on every supported filesystem?
The fsspec.open feels very inconsistent. Ryan went on a super deep dive looking into behavior across https/local etc.
God how annoying. Literally what is the point of fsspec if it doesn't actually specify a common behaviour across different filesystems? It really all is a symptom of this https://github.com/fsspec/filesystem_spec/issues/1446.
What can we do about it in virtualizarr though? We could try to test all the filesystems we might care about (again, fsspec should really be doing that...). We could maintain our own little open context manager that actually has consistent behaviour. Or perhaps we could use some other learnings from pangeo-forge?
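For illustration, here is a rough sketch of what "our own little open context manager" could look like. The name `open_file` and its signature are made up for this example and are not part of virtualizarr or fsspec. It assumes that resolving the filesystem explicitly with `fsspec.core.url_to_fs` and always yielding the real file-like object is enough to make behaviour uniform, which would still need checking against each backend.

```python
from contextlib import contextmanager
from typing import IO, Any, Iterator, Optional


@contextmanager
def open_file(
    url: str,
    mode: str = "rb",
    storage_options: Optional[dict[str, Any]] = None,
) -> Iterator[IO[Any]]:
    """Hypothetical helper: yield an open file-like object for any fsspec URL."""
    # Imported lazily so that defining the helper does not itself need fsspec.
    from fsspec.core import url_to_fs

    # Resolve the filesystem and the path inside it. Only options the caller
    # passed explicitly are forwarded, so each backend keeps its own defaults.
    fs, path = url_to_fs(url, **(storage_options or {}))
    f = fs.open(path, mode)
    try:
        yield f
    finally:
        # Close on exit regardless of which backend produced the file.
        f.close()
```

Usage would be the same for every protocol, e.g. `with open_file("memory://example/a.bin", "wb") as f: f.write(b"abc")`, with backend-specific settings passed only through `storage_options`.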
I suspect that there is still something going wrong with the default keywords logic, as I raised back in https://github.com/zarr-developers/VirtualiZarr/pull/126#issuecomment-2146059544?
The problem is in virtualizarr.kerchunk.read_kerchunk_references_from_file: https://github.com/zarr-developers/VirtualiZarr/blob/87221ea769ddc3a2971b118f8dace9159690a0cb/virtualizarr/kerchunk.py#L56-L58
That's defining reader_options with s3-specific values.
It's technically API-breaking, but I'd recommend just setting that to Optional[dict[str, Any]] = None. I don't know whether virtualizarr has a sufficiently better opinion for the default arguments than the backend (s3fs / adlfs / etc.), but probably not.
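A minimal sketch of the suggested `None` default, to make the recommendation concrete. `resolve_reader_options` is a hypothetical helper written for this example, not the actual virtualizarr function; the point is only that a missing argument becomes an empty dict, so no backend-specific values are injected.

```python
from typing import Any, Optional


def resolve_reader_options(
    reader_options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: normalise reader_options without s3 defaults."""
    if reader_options is None:
        # Nothing was requested, so forward nothing and let the backend
        # (s3fs, adlfs, local, ...) apply its own defaults.
        return {}
    # Copy so that later mutation cannot leak back into the caller's dict.
    return dict(reader_options)


# Example values below are illustrative only.
print(resolve_reader_options())
print(resolve_reader_options({"storage_options": {"anon": True}}))
```

Using `None` rather than a dict literal as the default also avoids sharing one mutable default object between calls.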
Oh thanks @TomAugspurger! I thought I fixed that bug in a commit I added to #126 but I obviously missed another occurrence of the default s3-specific values.
It would still be great to have tests that this all works on the main types of storage we care about.
I like this idea, but maybe I should take a backseat, since I have the tendency to always write massive end-to-end tests 😆.
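As a starting point that stays small, something like the following round-trip check might do. It is only a sketch: `roundtrip` and `check_all_filesystems` are hypothetical names, and it covers just fsspec's in-memory and local filesystems, since remote backends (s3, https, ...) would need their own fixtures or mock servers.

```python
import os
import tempfile
from typing import Any, Optional


def roundtrip(
    url: str,
    payload: bytes,
    storage_options: Optional[dict[str, Any]] = None,
) -> bytes:
    """Write payload to url through fsspec, then read it back."""
    import fsspec  # lazy import: only needed when the check actually runs

    opts = storage_options or {}
    with fsspec.open(url, "wb", **opts) as f:
        f.write(payload)
    with fsspec.open(url, "rb", **opts) as f:
        return f.read()


def check_all_filesystems() -> None:
    """Run the same tiny round trip against every filesystem listed here."""
    payload = b"tiny test file"
    with tempfile.TemporaryDirectory() as tmp:
        urls = [
            "memory://roundtrip/small.bin",  # fsspec's in-memory filesystem
            os.path.join(tmp, "small.bin"),  # plain local path
        ]
        for url in urls:
            # Deliberately no storage_options: this exercises the defaults.
            assert roundtrip(url, payload) == payload, url
```

Calling `roundtrip` without any options is the interesting case here, since the bug above only shows up when the defaults are used.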
Continuation of https://github.com/zarr-developers/VirtualiZarr/pull/126
Over at https://github.com/jbusecke/esgf-virtual-zarr-data-access I am maintaining a simple example of how to virtualize CMIP6 datasets.
This is a minimal example to reproduce the bug:
This works as intended, but if I leave out the reader_options={} it fails.
I suspect that there is still something going wrong with the default keywords logic, as I raised back in https://github.com/zarr-developers/VirtualiZarr/pull/126#issuecomment-2146059544?