zarr-developers / zarr_implementations

MIT License
38 stars 16 forks source link

jzarr read/write (close #36) #37

Closed joshmoore closed 3 years ago

joshmoore commented 3 years ago

The only real issue (repeated from #25) is that I don't see a way to return an array from the read_from_jzarr method which means that non-Python implementations may need to handle the verification code themselves internally.

cc: @constantinpape

joshmoore commented 3 years ago

Looking at one of these failures (reading js.zr/blosc/lz4/ with Zarr) the first few values differ:

Screen Shot 2021-05-04 at 21 37 17 Screen Shot 2021-05-04 at 21 37 26

Modifying |u1 back to jzarr's <u1 doesn't correct the issue.

cc: @SabineEmbacher

grlee77 commented 3 years ago

Looking at one of these failures (reading js.zr/blosc/lz4/ with Zarr) the first few values differ:

Yeah, the values seen there are what would be obtained when trying to convert uint8 values to int8 so that values >=128 overflow and become negative.

For example

import numpy as np
np.asarray([154, 147, 151, 109], dtype=np.int8)

gives

array([-102, -109, -105,  109], dtype=int8)
SabineEmbacher commented 3 years ago

JZarr is bound to the primitive data types of Java. There is no primitive unsigned byte data type in Java. Data types which are offered in the JZarr API (see com.bc.zarr.DataType) only serve to be written into the .zarray file so that the reader of the data knows how to interpret it. When the written data is read with Python, the interpretation happens automatically. If the data is read again with JZarr, the correct interpretation of the raw data is up to the user.

Here is an example of interpreting byte data as unsigned bytes. org.esa.snap.core.datamodel.ProductData.UByte

The same overflow effect will be observed with the u2 and u4 data types.

joshmoore commented 3 years ago

Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types

joshmoore commented 3 years ago

Failures now look to all be related to nesting:

9 failures ``` FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, blosc] 10268 FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, raw] 10269 FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, zlib] 10270 FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, blosc] 10271 FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, raw] 10272 FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, zlib] 10273 FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, blosc] 10274 FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, raw] 10275 FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, zlib] ```

which should be fixed once JZarr releases https://github.com/zarr-developers/zarr-python/pull/715

@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.

grlee77 commented 3 years ago

@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.

I am fine with waiting, but if you would rather merge sooner we could introduce an environment variable to enable skipping the known failures.

SabineEmbacher commented 3 years ago

Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types

Do you have a suggestion how I could write this in the documentation to avoid misinterpretation?

joshmoore commented 3 years ago

Hey @SabineEmbacher. I guess the question is if you see a way forward here. Or is this simply an incompatibility that will remain between zarr-python and jzarr?

joshmoore commented 3 years ago

Whew. 💚 @grlee77, turning this over to you. I imagine we can continue the discussion around signedness (https://github.com/zarr-developers/zarr_implementations/pull/37#issuecomment-832194552) elsewhere.