jmswaney opened this issue 6 years ago
Hi Justin, the general approach to implementing a new compression codec is to sub-class the numcodecs.Codec class and implement the methods encode(), decode(), get_config(), from_config(), and also the codec_id attribute. Docs here: http://numcodecs.readthedocs.io/en/stable/abc.html
To use the codec with zarr you need to register it with a call to numcodecs.register_codec(cls). That just sets up the mapping from codec ID to codec class. Docs here: http://numcodecs.readthedocs.io/en/stable/registry.html
In terms of implementation, any of the existing codec classes is worth looking at as an example. If you need to interface with external C code, there are various options. The existing codecs like Zstd, LZ4 and Blosc use Cython, but there are other ways to do it.
I don't know anything about JPEG encoding but very happy to learn more if you find it useful.
On Sat, 14 Apr 2018, 16:53 Justin Swaney, notifications@github.com wrote:
I've been using chunk compressed Zarr arrays for some neuroscience image processing tasks, and it's been great so far. However, JPEG2000 might perform better than lz4 or Zstd for my images. I'd like to use Zarr to handle the image chunking with a JPEG2000 compressor, but I'm not sure if this is possible. I realize that this feature isn't as general as numcodecs would want, but I'm mostly asking what the steps would be to see if I should even try.
Adding a JPEG2000 compression filter would be great. I know others use this compression for image data as well.
FWIW we made some changes described in this comment, which should make wrapping a compressor pretty simple. Feel free to ask questions if you need any help.
@joshmoore, @jakirkham, and I looked into this for a while today.
@sofroniewn described how pyramidal-image support (cf. https://github.com/zarr-developers/zarr-specs/issues/23) is implemented in napari:
I have a zarr pyramid on s3://sofroniewn/image-data/camelyon16/ which came from https://camelyon16.grand-challenge.org/Data/ (there is a Google Drive with TIFFs if you poke around)
Each resolution-level is a sibling Dataset in a containing Zarr Group. Napari loads each resolution level as a Dask Array, and changes which resolution level it pulls chunks from based on the user's zoom level.
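The level-switching idea can be sketched in a few lines. This is a hypothetical helper, not napari's actual code. It assumes level 0 is full resolution and that each following level is downsampled by 2 along each axis. It picks the coarsest level that still supplies at least one data pixel per screen pixel at the current zoom.

```python
def pick_level(num_levels, zoom, downscale=2):
    """Return the index of the coarsest pyramid level adequate for `zoom`.

    zoom: screen pixels per full-resolution pixel (1.0 = 100%, 0.25 = zoomed out 4x).
    Level k is assumed to have 1 / downscale**k of full resolution per axis.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    level = 0
    # Step to a coarser level while it still has >= 1 data pixel per screen pixel
    while level + 1 < num_levels and zoom * downscale ** (level + 1) <= 1:
        level += 1
    return level
```

With the levels loaded as a list of Dask arrays, the viewer would then pull chunks only from `levels[pick_level(len(levels), zoom)]`.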
That process works pretty well today, and longer-term we'd like to clean up and factor that pyramiding code out of Napari (which could have a cleaner interface to it, in addition to benefitting from more general community support of pyramiding).
Napari's main pain point is that the Zarr pyramids are e.g. 60x larger on disk than the pyramidal TIFF files that they originated as. The Zarr pyramids use Zarr's default Blosc compressor codec (which is likely bad at compressing image data), while the original TIFFs likely use JPEG2000 (which is quite good). So we think adding a JPEG2000 codec to numcodecs, and having Napari use it, will solve Napari's main issue with its Zarr pyramids.
@jakirkham started prototyping a JPEG2000 codec today. A nice thing is that the `Codec` interface receives an ndarray as input. We originally thought it only received a `BytesLike`, from which it would be hard to reconstruct the image dimensions that JPEG2000 needs. One caveat is that `filters` can't be applied before the JPEG2000 codec, because then the latter would actually just receive a `BytesLike`; `raise`-ing an error seems appropriate in this situation.
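That guard could look roughly like the following sketch. It rejects anything that is not a NumPy array with at least two dimensions, on the assumption that such input means a preceding filter has already flattened the chunk. For example, numcodecs' `Delta` filter returns a flattened 1-D array. The helper name `require_image_chunk` is hypothetical.

```python
import numpy as np


def require_image_chunk(buf):
    """Return `buf` unchanged if it is an ndarray with >= 2 dims, else raise.

    Hypothetical check for a JPEG2000 codec's encode(): the image dimensions
    are needed, so bytes-like or flattened input (e.g. filter output) is rejected.
    """
    if not isinstance(buf, np.ndarray):
        raise TypeError(
            f"expected a numpy array, got {type(buf).__name__}; "
            "filters cannot be applied before this codec"
        )
    if buf.ndim < 2:
        raise ValueError(f"expected an array with >= 2 dimensions, got ndim={buf.ndim}")
    return buf
```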
Otherwise, we just need a good Python binding to a JPEG2000 codec. We looked at imageio, imagecodecs, and glymur. There was a mix of concerns about dependencies and installation hassle, as well as about API semantics: we need something shaped like `Buffer ⇒ BytesLike`, not `PathLike ⇒ PathLike`.
Dependency concerns could be mitigated by adding a pip extra (e.g. `pip install numcodecs[jpeg]`). If necessary, a light fork of one of those projects could be undertaken to expose in-memory access.
Imagecodecs includes a bytes<->numpy encoder and decoder for JPEG2000 based on the OpenJPEG library. I think it should be relatively easy to take the Cython code out of imagecodecs (BSD licensed) and adapt it for numcodecs.
Thanks Christoph! 😄
Using that I wrote the following. This seems like what we would want for a first pass.
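```python
from numcodecs.abc import Codec
from numcodecs.compat import ensure_ndarray, ndarray_copy
from numcodecs.registry import register_codec
from imagecodecs import jpeg2k_encode, jpeg2k_decode


class JPEG2000(Codec):
    codec_id = "JPEG2000"

    def encode(self, buf):
        # Pass the chunk as an ndarray so the encoder sees its image shape
        return jpeg2k_encode(ensure_ndarray(buf))

    def decode(self, buf, out=None):
        # Decode the JPEG2000 stream, then copy into `out` if one was supplied
        return ndarray_copy(jpeg2k_decode(ensure_ndarray(buf)), out)


register_codec(JPEG2000)
```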
This works for encoding. However we have an issue on decoding. Maybe there's something I'm missing above? 🙂
```
---------------------------------------------------------------------------
Jpeg2kError                               Traceback (most recent call last)
<ipython-input-6-7d1c93c78b4f> in <module>
----> 1 c.decode(c.encode(a))

<ipython-input-1-01776c52a8bc> in decode(self, buf)
     13
     14     def decode(self, buf):
---> 15         return jpeg2k_decode(ensure_ndarray(buf))
     16
     17

imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()

Jpeg2kError: opj_read_header failed
```
I didn't try to reproduce this yet, but this simple roundtrip should work if the output of `ensure_ndarray(buf)` can be cast to `uint8_t[::1]` by Cython. That appears to be the case, since otherwise detecting the `codecformat` would likely fail. Please try passing the `buf` bytes directly to `jpeg2k_decode` and enable OpenJPEG error handling and warnings with `verbose=3`. What is the shape and dtype of the input `a`?
Thanks Christoph!
Yeah, I was wondering about that too, so I had tried with and without `ensure_ndarray` just in case, but got the same error. Either way, the data provided to `jpeg2k_decode` could be cast to `uint8_t[::1]`, since it was just the output of `jpeg2k_encode`.
Sure let me provide a clear MRE.
Sorry if I missed something, but how do we set the verbosity?
Here's an MRE showing what I'm seeing. Happy to play with this more (adding verbosity and such) as is helpful 🙂
```
In [1]: import numpy as np

In [2]: a = np.arange(6, dtype="u4").reshape(2, 3)

In [3]: a
Out[3]:
array([[0, 1, 2],
       [3, 4, 5]], dtype=uint32)

In [4]: from imagecodecs import jpeg2k_encode, jpeg2k_decode

In [5]: b = jpeg2k_encode(a)

In [6]: b
Out[6]: bytearray(b'\x00\x00\x00\x0cjP \r\n\x87\n\x00\x00\x00\x14ftypjp2 \x00\x00\x00\x00jp2 \x00\x00\x00-jp2h\x00\x00\x00\x16ihdr\x00\x00\x00\x02\x00\x00\x00\x03\x00\x01\x1f\x07\x00\x00\x00\x00\x00\x0fcolr\x01\x00\x00\x00\x00\x00\x11\x00\x00\x00\x89jp2c\xffO\xffQ\x00)\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x1f\x01\x01\xffR\x00\x0c\x00\x00\x00\x01\x00\x00\x04\x04\x00\x01\xff\\\x00\x04@\x00\xffd\x00%\x00\x01Created by OpenJPEG version 2.3.1\xff\x90\x00\n\x00\x00\x00\x00\x00\x17\x00\x01\xff\x93\xc0\x00\x00\x00\xf8C\x0fwv\xff\xd9')

In [7]: len(b)
Out[7]: 214

In [8]: jpeg2k_decode(b)
---------------------------------------------------------------------------
Jpeg2kError                               Traceback (most recent call last)
<ipython-input-8-d3265f5af6b1> in <module>
----> 1 jpeg2k_decode(b)

imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()

Jpeg2kError: opj_read_header failed
```
I see: `dtype=uint32`. While JPEG 2000 supports 32 and 64 bit integers (up to 38 bits), OpenJPEG doesn't. I obviously never fully tested these cases, only 8 and 16 bit. You can get the OpenJPEG warnings and errors as follows:
```
>>> b = jpeg2k_encode(a, verbose=3)
JPEG2K info: tile number 1 / 1
>>> jpeg2k_decode(b, verbose=3)
JPEG2K info: Start to read j2k main header (85).
imagecodecs._jpeg2k.Jpeg2kError: Invalid values for comp = 0 : prec=32 (should be between 1 and 38 according to the JPEG2000 norm. OpenJpeg only supports up to 31)
Exception ignored in: 'imagecodecs._jpeg2k.j2k_error_callback'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
imagecodecs._jpeg2k.Jpeg2kError: Invalid values for comp = 0 : prec=32 (should be between 1 and 38 according to the JPEG2000 norm. OpenJpeg only supports up to 31)
imagecodecs._jpeg2k.Jpeg2kError: Marker handler function failed to read the marker segment
Exception ignored in: 'imagecodecs._jpeg2k.j2k_error_callback'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
imagecodecs._jpeg2k.Jpeg2kError: Marker handler function failed to read the marker segment
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "imagecodecs\_jpeg2k.pyx", line 390, in imagecodecs._jpeg2k.jpeg2k_decode
imagecodecs._jpeg2k.Jpeg2kError: opj_read_header failed
```
Not sure why OpenJPEG doesn't throw an error in `jpeg2k_encode`. Maybe OpenJPEG does create a valid JPEG 2000 stream, but can't decode it...
Another thought: since this issue is about efficiently compressing image data, you might want to have a look at JPEG-LS via the CharLS library. There's also JPEG-XR (commonly used in CZI files), which also handles float32, but the `jxrlib` library is not so nice to work with. Imagecodecs supports both, but I never benchmarked the codecs/implementations. None of these formats support 32 or 64 bit integers.
Ah ok. So this is bad usage on my part. Thanks for clarifying Christoph! Should there be an error if the user supplies an unsupported type or are there situations where this might work?
Great, thanks for the suggestions. Will check those out too. Yeah this is mostly about compression. Just trying to think what makes a reasonably generic/useful compressor here (given the different type support of these). Do you have any thoughts on this? 🙂
If this issue needs a champion, I think I can make a case for taking on this work (@jmswaney above was part of our lab). @jakirkham I'm not sure if you have a branch going that I should contribute to - if not, I'm fine with restarting. Regarding support for 32 and 64 bit integers, we would also only use 8 and 16 bits. So, pragmatically, I'd vote for disallowing 32 and 64 bit integers early in the encoding process, and for detecting decoding failures caused by 32 and 64 bit integers and reporting them.
If that plan seems workable, I'll go ahead and start work towards the goal of a pull request.
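The encode-side part of that plan might look roughly like this. The hypothetical `check_jpeg2k_dtype` helper admits only 8- and 16-bit integers, the depths mentioned above, and raises before anything reaches the encoder. Including the signed variants is an assumption of this sketch.

```python
import numpy as np

# Hypothetical allow-list: 8- and 16-bit integers only, per the plan above
SUPPORTED_DTYPES = {np.dtype(t) for t in ("u1", "i1", "u2", "i2")}


def check_jpeg2k_dtype(arr):
    """Raise early for dtypes the codec should not encode (e.g. 32/64-bit ints)."""
    dtype = np.asarray(arr).dtype
    if dtype not in SUPPORTED_DTYPES:
        raise TypeError(
            f"unsupported dtype {dtype}; only 8- and 16-bit integers are accepted"
        )
    return dtype
```

Failing at encode time avoids the situation shown earlier in the thread, where a `uint32` chunk encoded without complaint but could not be decoded.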
My use case for JPEG2000 is grayscale 3D stacks of JPEG2000 planes and my plan was to JPEG2000-encode each of the planes separately (and for 4 and 5D, stack over the first dimensions and encode over the last 2 dimensions), but an alternate would be to interpret arrays with 3 axes as Y, X and color if the size of the last axis was 3 (RGB) and encode as a color image.
My gut tells me to avoid a heuristic that operates differently depending on the size of the last dimension, and to encode what might be a color image as three grayscale planes. This also has the side effect of not requiring the LCMS library (see https://github.com/cgohlke/imagecodecs/tree/master/3rdparty/openjpeg in imagecodecs), which simplifies the build.
I'd appreciate any feedback.
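The plane-wise layout described above can be sketched as follows. The sketch flattens all leading axes, encodes each trailing (Y, X) plane separately, and restores the original shape on decode. To keep it self-contained, the per-plane encoder is a `zlib` stand-in, and the length-prefixed container format is hypothetical. In practice the plane functions would wrap a 2D JPEG2000 encoder/decoder such as imagecodecs' `jpeg2k_encode`/`jpeg2k_decode`. The `plane_shape`/`dtype` arguments are only needed by the stand-in.

```python
import struct
import zlib

import numpy as np


def encode_planes(arr, encode_plane):
    """Encode each trailing (Y, X) plane separately into one byte string.

    Hypothetical layout: uint32 plane count, then for each plane a uint32
    byte length followed by the encoded plane.
    """
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    planes = np.ascontiguousarray(arr).reshape(-1, *arr.shape[-2:])  # stack leading axes
    parts = [struct.pack("<I", len(planes))]
    for plane in planes:
        blob = encode_plane(plane)
        parts.append(struct.pack("<I", len(blob)))
        parts.append(blob)
    return b"".join(parts)


def decode_planes(data, shape, dtype, decode_plane):
    """Invert encode_planes; `shape` and `dtype` come from the array metadata."""
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    planes = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", data, offset)
        offset += 4
        planes.append(decode_plane(data[offset:offset + size], shape[-2:], dtype))
        offset += size
    return np.stack(planes).reshape(shape)


# Stand-ins for a real 2D image codec such as JPEG2000 (assumption of this sketch)
def zlib_encode_plane(plane):
    return zlib.compress(plane.tobytes())


def zlib_decode_plane(blob, plane_shape, dtype):
    return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(plane_shape)
```

Every trailing (Y, X) pair is treated as a grayscale plane, so the sketch never needs the size-3 RGB heuristic.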
Thanks for offering to help here Lee! 😄
Unfortunately I don't have an existing branch, but I think the code in comment ( https://github.com/zarr-developers/numcodecs/issues/73#issuecomment-592111560 ) should be a good starting point and likely pretty close to what we need here. So would see if you can get that to run and go from there. Please let us know if you have any questions 🙂
Looks like @d-v-b did some work on a JPEG codec ( https://github.com/d-v-b/zarr-jpeg ). Not sure if JPEG2000 is considered there as well.
I think it's not. I started work on the codec, have had to pause it recently.
Does `imagecodecs.numcodecs.register_codecs()` suffice now to cover needs here?
^ @d-v-b @LeeKamentsky @joshmoore
For my own purposes the codec registration API in numcodecs sufficed perfectly.
@d-v-b : just to clarify, you mean numcodecs API worked for you, not imagecodecs, right?
Thinking through some of the recent conversations with @DennisHeimbigner, if we're going to lean on imagecodecs for JPEG2000 support, we may want to go about defining an ID for it in this repo a la https://github.com/zarr-developers/numcodecs/issues/278
cc: @cgohlke
Not a bad idea, but imagecodecs does already provide unambiguous numcodecs IDs for all the classes it registers - I would not suggest changing them (although adding aliases would be fine).
The current list I get in my installation:
```
['imagecodecs_aec',
 'imagecodecs_avif',
 'imagecodecs_bitorder',
 'imagecodecs_bitshuffle',
 'imagecodecs_blosc',
 'imagecodecs_brotli',
 'imagecodecs_bz2',
 'imagecodecs_deflate',
 'imagecodecs_delta',
 'imagecodecs_float24',
 'imagecodecs_floatpred',
 'imagecodecs_gif',
 'imagecodecs_jpeg',
 'imagecodecs_jpeg2k',
 'imagecodecs_jpegls',
 'imagecodecs_jpegxr',
 'imagecodecs_lerc',
 'imagecodecs_ljpeg',
 'imagecodecs_lz4',
 'imagecodecs_lz4f',
 'imagecodecs_lzf',
 'imagecodecs_lzma',
 'imagecodecs_lzw',
 'imagecodecs_packbits',
 'imagecodecs_png',
 'imagecodecs_snappy',
 'imagecodecs_tiff',
 'imagecodecs_webp',
 'imagecodecs_xor',
 'imagecodecs_zfp',
 'imagecodecs_zlib',
 'imagecodecs_zopfli',
 'imagecodecs_zstd']
```
> @d-v-b : just to clarify, you mean numcodecs API worked for you, not imagecodecs, right?
Correct, I defined a jpeg compressor and registered it with the numcodecs `register_codec` function.
I should add that there's complexity involved in compressing 3D+ data with 2D codecs. You will almost certainly want to generate a 2D tiled version of the ND data, and compress that, but this requires codec metadata that defines the ND -> 2D transformation. I have not implemented this to my satisfaction.
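One way to picture that ND → 2D transformation: tile the leading-axis planes of a chunk into a single 2D mosaic that a 2D codec can compress, and record the original shape so the mosaic can be cut apart again. This is a hypothetical sketch. `to_mosaic`, `from_mosaic` and the `meta` dict are not from any library, and it is not the approach used in zarr-jpeg. It also assumes a near-square grid with unused tiles padded with zeros.

```python
import math

import numpy as np


def to_mosaic(arr):
    """Tile the trailing (Y, X) planes of an ND array into one 2D image.

    Returns (mosaic, meta); `meta` is the codec metadata needed to invert it.
    """
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    planes = arr.reshape(-1, *arr.shape[-2:])
    n, h, w = planes.shape
    cols = math.ceil(math.sqrt(n))  # near-square grid of tiles
    rows = math.ceil(n / cols)
    mosaic = np.zeros((rows * h, cols * w), dtype=arr.dtype)  # zero padding
    for i, plane in enumerate(planes):
        r, c = divmod(i, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = plane
    meta = {"shape": list(arr.shape), "grid": [rows, cols]}
    return mosaic, meta


def from_mosaic(mosaic, meta):
    """Cut the mosaic back into planes and restore the original ND shape."""
    shape = tuple(meta["shape"])
    h, w = shape[-2:]
    _, cols = meta["grid"]
    n = math.prod(shape[:-2])  # number of planes (1 for a 2D array)
    planes = [
        mosaic[(i // cols) * h:(i // cols + 1) * h, (i % cols) * w:(i % cols + 1) * w]
        for i in range(n)
    ]
    return np.stack(planes).reshape(shape)
```

The `meta` dict is the kind of per-array codec metadata the comment above refers to. A real codec would have to store it in the array's codec configuration or in the encoded chunk.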
Hello all!
Based on the initial investigations of @cgohlke and @jakirkham on this thread, along with some of our own by @muhanadz, and heavily inspired by the existing work from @d-v-b, we have released a Zarr JPEG-2000 codec using imagecodecs and, by extension, OpenJPEG:
Any and all feedback welcome!
Similar to the discussion on d-v-b/zarr-jpeg#1, our primary motivation for the codec is the compression of interleaved RGB bright-field whole slide imaging data.
@martindurant what would one need to do to add an entrypoint to use zarr-jpeg2k above?
An entrypoint needs to be registered roughly of the form:
```
[numcodecs.codecs]
jpeg2k = zarr_jpeg2k.zarr_jpeg2k:jpeg2k
```
I read further up the thread and deleted my comment...
I am a little confused. Why is there a different package for jpeg2k as a numcodecs codec, which calls imagecodecs, when imagecodecs already has one? All the codecs there can be registered with numcodecs by calling `imagecodecs.numcodecs.register_codecs()`. We just need a PR there to add the entrypoints; I'm sure it would be accepted. Perhaps when the conversation above happened, imagecodecs had not yet progressed as far.
> All the codecs there can be registered with numcodecs by calling imagecodecs.numcodecs.register_codecs(). We just need a PR there to add the entrypoints, I'm sure it would be accepted. Perhaps when the conversation above happened, imagecodecs had not yet progressed as far.
For the time being I decided to distribute the numcodecs entry points as a separate package: https://pypi.org/project/imagecodecs-numcodecs/#files.
That sounds reasonable, @cgohlke . Unfortunately, it doesn't have a conda package.
It looks like the work on integrating jpeg2000 was abandoned. Is there any progress on this I'm missing? This is the only numcodecs thread I found related to this work.
jpeg2000 is included in imagecodecs, which has numcodecs wrappers
Thanks. I guess I missed that in the docs. I'm trying to figure out how to use j2k as the compression scheme in a zarr file.
I believe that as long as you have https://pypi.org/project/imagecodecs-numcodecs/ installed, "imagecodecs_jpeg2k" will be an available codec without further effort.