jbusecke / esgf-virtual-zarr-data-access

ESGF working group to enable data access via virtual zarrs.
Apache License 2.0

Testing time decoding #10

Open jbusecke opened 3 weeks ago

jbusecke commented 3 weeks ago

I am testing https://github.com/zarr-developers/VirtualiZarr/pull/122 on my CMIP example.

As part of this CI, I have set up a pytest module to create and test virtual datasets.

My current test for decoded time indicates that the data is decoded, but the attributes differ from those of a dataset that was loaded straight from http and then concatenated.

>       xr.testing.assert_identical(clean_time(ds), clean_time(ds_combined))
E       AssertionError: Left and right DataArray objects are not identical
E
E       Differing coordinates:
E       L * time     (time) datetime64[ns] 960B 2015-01-16T12:00:00 ... 2024-12-16T12...
E           Differing variable attributes:
E               chunksizes: [1]
E               fletcher32: False
E               shuffle: False
E               preferred_chunks: {'time': 1}
E               source: <File-like object HTTPFileSystem, http://aims3.llnl.gov/thredds/f...
E               original_shape: [60]
E               dtype: float64
E       R * time     (time) datetime64[ns] 960B 2015-01-16T12:00:00 ... 2024-12-16T12...
E       Attributes only on the left object:
E           original_shape: [60]
E           chunksizes: [1]
E           fletcher32: False
E           source: <File-like object HTTPFileSystem, http://aims3.llnl.gov/thredds/f...
E           shuffle: False
E           preferred_chunks: {'time': 1}
E           dtype: float64

tests/test_script.py:76: AssertionError

L in this case is the dataset loaded from the VirtualiZarr JSON; R is the 'ground truth'.

I wonder whether this is relevant for this use case?
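To help judge whether the difference matters, here is a minimal sketch of what the two xarray comparison functions check. It uses small in-memory datasets as stand-ins, so the variable name `tas` and the attribute values are assumptions and not taken from the CMIP files. `xr.testing.assert_equal` compares values, dimensions, and coordinates, while `xr.testing.assert_identical` also compares names and attributes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the 'ground truth' dataset (illustrative values only).
time = pd.date_range("2015-01-16T12:00:00", periods=4, freq="D")
truth = xr.Dataset({"tas": ("time", np.arange(4.0))}, coords={"time": time})

# Stand-in for the virtual dataset: same values, extra attributes on time.
virtual = truth.copy(deep=True)
virtual["time"].attrs.update({"shuffle": False, "fletcher32": False})

# Values, dims, and coordinates match, so this does not raise.
xr.testing.assert_equal(virtual, truth)

# Attributes are part of the 'identical' comparison, so this raises.
try:
    xr.testing.assert_identical(virtual, truth)
    identical = True
except AssertionError:
    identical = False

print("identical:", identical)
```

If only the decoded values matter for the test, `assert_equal` may be sufficient. If the attributes are meant to round-trip exactly, the stricter check is the one to keep.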

jsignell commented 3 weeks ago

Yeah, so if I have my left and right correct, it looks like some extra attributes are sneaking in when using VirtualiZarr compared to the ground truth. I think those are xarray defaults that get added on load, and if any of those attrs were defined on the original, they would be preserved. It would be good to test that, though.
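One way such a test could look, as a rough sketch on in-memory stand-ins: drop only the keys reported in the traceback and check that an attribute defined on the original survives. The helper `strip_extra_attrs`, the constant `EXTRA_KEYS`, and the `standard_name` attribute are hypothetical names introduced for illustration. They are not part of xarray, VirtualiZarr, or the repository's `clean_time`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Keys listed under "Attributes only on the left object" in the traceback.
EXTRA_KEYS = frozenset({
    "chunksizes", "fletcher32", "shuffle", "preferred_chunks",
    "source", "original_shape", "dtype",
})


def strip_extra_attrs(ds, keys=EXTRA_KEYS):
    """Hypothetical helper: return a copy of ds without the given attr keys."""
    out = ds.copy()  # shallow copy: new variable objects, shared data
    for var in out.variables.values():
        var.attrs = {k: v for k, v in var.attrs.items() if k not in keys}
    return out


time = pd.date_range("2015-01-16T12:00:00", periods=4, freq="D")

# Ground-truth stand-in with one attribute defined on the original.
truth = xr.Dataset(
    {"tas": ("time", np.arange(4.0))},
    coords={"time": ("time", time, {"standard_name": "time"})},
)

# Virtual stand-in: same data plus some of the extra keys.
virtual = truth.copy(deep=True)
virtual["time"].attrs.update(
    {"shuffle": False, "fletcher32": False, "original_shape": [60]}
)

cleaned = strip_extra_attrs(virtual)

# Raises AssertionError if anything other than the stripped keys differs.
xr.testing.assert_identical(cleaned, truth)
print(cleaned["time"].attrs)
```

Stripping by an explicit key list keeps the comparison strict for everything else, so a genuinely missing or changed attribute would still fail the test.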