Closed joshmoore closed 3 years ago
Looking at one of these failures (reading js.zr/blosc/lz4/
with Zarr) the first few values differ:
Modifying |u1
back to jzarr's <u1
doesn't correct the issue.
cc: @SabineEmbacher
Looking at one of these failures (reading js.zr/blosc/lz4/ with Zarr) the first few values differ:
Yeah, the values seen there are what would be obtained when trying to convert uint8
values to int8
so that values >=128 overflow and become negative.
For example
import numpy as np
np.asarray([154, 147, 151, 109], dtype=np.int8)
gives
array([-102, -109, -105, 109], dtype=int8)
JZarr is bound to the primitive data types of Java. There is no primitive unsigned byte data type in Java. Data types which are offered in the JZarr API (see com.bc.zarr.DataType) only serve to be written into the .zarray file so that the reader of the data knows how to interpret it. When the written data is read with Python, the interpretation happens automatically. If the data is read again with JZarr, the correct interpretation of the raw data is up to the user.
Here is an example of interpreting byte data as unsigned bytes. org.esa.snap.core.datamodel.ProductData.UByte
The same overflow effect will be observed with the u2 and u4 data types.
Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types
Failures now look to all be related to nesting:
which should be fixed once JZarr releases https://github.com/zarr-developers/zarr-python/pull/715
@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.
@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.
I am fine with waiting, but if you would rather merge sooner we could introduce an environment variable to enable skipping the known failures.
Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types
Do you have a suggestion how I could write this in the documentation to avoid misinterpretation?
Hey @SabineEmbacher. I guess the question is if you see a way forward here. Or is this simply an incompatibility that will remain between zarr-python and jzarr?
Whew. 💚 @grlee77, turning this over to you. I imagine we can continue the discussion around signedness (https://github.com/zarr-developers/zarr_implementations/pull/37#issuecomment-832194552) elsewhere.
The only real issue (repeated from #25) is that I don't see a way to return an array from the
read_from_jzarr
method which means that non-Python implementations may need to handle the verification code themselves internally.cc: @constantinpape