Closed mkitti closed 10 months ago
Turns out the unit tests are not quite thorough enough :-/
@mkitti could you please merge https://github.com/saalfeldlab/n5-zarr/commits/zstandard/ into your branch?
I'm also running into an issue when using zarr-python to read data written by n5-zarr.
Write the data:
```java
final String root = "...";
final N5Writer zarr = new N5Factory().openWriter( root );
final String dset = "simple-zst";
ArrayImg<UnsignedByteType, ByteArray> img = ArrayImgs.unsignedBytes(new byte[]{0,1,2,3,4,5,6,7,8,9,10,11}, 12);
N5Utils.save(img, zarr, dset, new int[]{12}, new ZstandardCompression());
```
Read the data:
```python
import zarr
root = zarr.open('zstd-test.zarr')
arr = root['n5-test/simple-zst']
arr[:]
```
Where does `N5Factory().openWriter( root )` come from? I don't see that method in N5Utils?
This seems to be a bug in zarr-developers/numcodecs. There they use the C function `ZSTD_getDecompressedSize`.

According to the Zstandard manual, that routine is now deprecated: https://facebook.github.io/zstd/zstd_manual.html. One issue with it is that it returns `0` if the result is empty, unknown, or if an error has occurred.

The numcodecs bug is that they assume a value of `0` means error. In this case, it actually means unknown. I know it means unknown since I used the function `ZSTD_getFrameContentSize`, which returns `0xffffffffffffffff`, i.e. `ZSTD_CONTENTSIZE_UNKNOWN`.
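The "unknown" case can be seen directly in the bytes of the frame header. Below is a small, self-contained sketch (not from this thread; it follows the Zstandard frame format specified in RFC 8878) that reads the declared Frame_Content_Size from a frame header. A header that simply omits the field is exactly the "unknown" case that the deprecated `ZSTD_getDecompressedSize` collapses into `0`:

```java
import java.util.OptionalLong;

public class ZstdFrameHeader {

    /**
     * Reads the declared content size from a Zstandard frame header
     * (RFC 8878, section 3.1.1.1). Returns empty if the header does not
     * declare a size, i.e. the content size is unknown.
     */
    static OptionalLong frameContentSize(byte[] frame) {
        // Magic_Number 0xFD2FB528, stored little-endian: 28 B5 2F FD
        if (frame.length < 5
                || (frame[0] & 0xff) != 0x28 || (frame[1] & 0xff) != 0xB5
                || (frame[2] & 0xff) != 0x2F || (frame[3] & 0xff) != 0xFD)
            throw new IllegalArgumentException("not a zstd frame");

        final int descriptor = frame[4] & 0xff;
        final int fcsFlag = descriptor >>> 6;               // Frame_Content_Size_flag
        final boolean singleSegment = (descriptor & 0x20) != 0;
        final int dictIdFlag = descriptor & 0x03;

        // FCS field width: flag 0 -> absent (1 byte if Single_Segment), 1 -> 2, 2 -> 4, 3 -> 8
        final int fcsSize = new int[]{singleSegment ? 1 : 0, 2, 4, 8}[fcsFlag];
        if (fcsSize == 0)
            return OptionalLong.empty();                    // no size in header: unknown

        int offset = 5;
        if (!singleSegment) offset += 1;                    // skip Window_Descriptor
        offset += new int[]{0, 1, 2, 4}[dictIdFlag];        // skip Dictionary_ID

        long size = 0;
        for (int i = 0; i < fcsSize; ++i)                   // field is little-endian
            size |= (long) (frame[offset + i] & 0xff) << (8 * i);
        if (fcsSize == 2) size += 256;                      // 2-byte field stores value - 256
        return OptionalLong.of(size);
    }

    public static void main(String[] args) {
        // Header from a streaming encoder that never learned the input size:
        // FCS flag = 0, Single_Segment = 0 -> content size unknown.
        byte[] unknown = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x00, 0x00};
        System.out.println(frameContentSize(unknown)); // OptionalLong.empty

        // Single-segment header with a 1-byte FCS field declaring 12 bytes.
        byte[] known = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x20, 0x0C};
        System.out.println(frameContentSize(known));   // OptionalLong[12]
    }
}
```

The 2-byte field's `value - 256` encoding and the 1-byte field that only exists in single-segment frames are details `ZSTD_getFrameContentSize` handles for you, which is why the manual recommends it over the deprecated routine.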
Here's what my current test class looks like:
```java
package org.janelia.saalfeldlab.n5.zarr;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.janelia.saalfeldlab.n5.N5Writer;
import org.janelia.saalfeldlab.n5.imglib2.N5Utils;
import org.janelia.scicomp.n5.zstandard.ZstandardCompression;
import org.junit.Test;

import com.github.luben.zstd.Zstd;

import net.imglib2.img.array.ArrayImg;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.img.basictypeaccess.array.ByteArray;
import net.imglib2.type.numeric.integer.UnsignedByteType;

public class ZstandardTest {

    @Test
    public void testZstandard() throws IOException {

        final String root = "/home/mkitti/eclipse-workspace/n5-zarr/test.zarr";
        final N5Writer zarr = new N5ZarrWriter(root);
        final String dset = "simple-zst";

        final byte[] bytes = new byte[1024*1024];
        for (int i = 0; i < bytes.length; ++i) {
            bytes[i] = (byte)(i*5 - 128);
        }
        //bytes = new byte[]{0,1,2,3,4,5,6,7,8,9,10,11};
        ArrayImg<UnsignedByteType, ByteArray> img = ArrayImgs.unsignedBytes(bytes, bytes.length);

        ZstandardCompression compressor = new ZstandardCompression();
        compressor.setSetCloseFrameOnFlush(true);

        N5Utils.save(img, zarr, dset, new int[]{1024}, compressor);

        byte[] compressedBytes = Files.readAllBytes(Paths.get(root, dset, "0"));
        System.out.println(Zstd.getFrameContentSize(compressedBytes));
    }
}
```
Basically the problem is that, at the time the Zstandard frame header is written, the encoder does not seem to know the size of the input yet, so it marks the content size as unknown. numcodecs does not know what to do with an unknown size.
Rather than using the stream API, we may need to use a buffer API.
To address the issue more directly, we may need to use `setPledgedSrcSize`: https://www.javadoc.io/doc/com.github.luben/zstd-jni/latest/com/github/luben/zstd/ZstdCompressCtx.html
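A hypothetical sketch of that idea, assuming the zstd-jni `ZstdCompressCtx` API from the javadoc linked above (`setLevel`, `setPledgedSrcSize`, and single-shot `compress` are documented there, but this exact usage is my illustration, not the n5-zstandard fix):

```java
import com.github.luben.zstd.ZstdCompressCtx;

public class PledgedSizeSketch {
    // Sketch only: promising the input size up front lets the encoder record
    // Frame_Content_Size in the frame header instead of leaving it unknown.
    static byte[] compressWithPledgedSize(byte[] data) {
        try (ZstdCompressCtx ctx = new ZstdCompressCtx()) {
            ctx.setLevel(3);                    // arbitrary level for the sketch
            ctx.setPledgedSrcSize(data.length); // declare the uncompressed size
            return ctx.compress(data);          // header now carries the size
        }
    }
}
```

Note that a real streaming writer would set the pledge before the first streamed chunk; single-shot `compress` (which already knows the source size) is used here only to keep the sketch short.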
thanks for investigating @mkitti
This PR to n5-zstandard fixes the issue for me.
`N5Factory().openWriter( root )` comes from n5-universe. The tests I was running involved adding Zstandard compression to the list of options in the ImageJ export plugin in https://github.com/saalfeldlab/n5-ij
@bogovicj I updated n5-zstandard to version 1.0.2
Add Zstandard dependency