The N5 API specifies the primitive operations needed to store large chunked n-dimensional tensors, and arbitrary meta-data in a hierarchy of groups similar to HDF5.
Unlike HDF5, N5 is not bound to a specific backend. This repository includes a simple file-system backend. There are also an HDF5 backend, a Zarr backend, a Google Cloud backend, and an AWS-S3 backend.
At this time, N5 supports:

* arbitrary group hierarchies,
* arbitrary meta-data stored as JSON attributes,
* chunked n-dimensional tensor datasets,
* per-chunk compression.

Chunked datasets can be sparse, i.e. empty chunks do not need to be stored.
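As a quick, hedged illustration of the API, the sketch below creates a dataset and writes a single block with the Java file-system backend in this repository. Class and method names such as `N5FSWriter`, `createDataset`, and `writeBlock` are those of recent releases and may differ across versions; the container path is hypothetical.

```java
import org.janelia.saalfeldlab.n5.DataType;
import org.janelia.saalfeldlab.n5.DatasetAttributes;
import org.janelia.saalfeldlab.n5.GzipCompression;
import org.janelia.saalfeldlab.n5.N5FSWriter;
import org.janelia.saalfeldlab.n5.ShortArrayDataBlock;

public class N5Example {

	public static void main(final String[] args) throws Exception {

		// open (or create) an N5 container on the file system (hypothetical path)
		final N5FSWriter n5 = new N5FSWriter("/tmp/example.n5");

		// create a chunked uint16 dataset with gzip-compressed chunks
		final long[] dimensions = {128, 128, 128};
		final int[] blockSize = {64, 64, 64};
		n5.createDataset("volume", dimensions, blockSize, DataType.UINT16, new GzipCompression());

		// write one block at grid position (0, 0, 0); empty blocks can simply be skipped
		final DatasetAttributes attributes = n5.getDatasetAttributes("volume");
		final short[] data = new short[64 * 64 * 64];
		n5.writeBlock("volume", attributes, new ShortArrayDataBlock(blockSize, new long[]{0, 0, 0}, data));
	}
}
```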
## File-system specification version 4.0.0
An N5 group is not a single file but simply a directory on the file system. Meta-data is stored as one JSON file per group/directory. Tensor datasets can be chunked, and chunks are stored as individual files. This enables parallel reading and writing on a cluster.
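For illustration, a container holding one group with one chunked dataset might be laid out as follows (all names hypothetical):

```
example.n5/
├── attributes.json
└── group/
    ├── attributes.json
    └── dataset/
        ├── attributes.json
        └── 0/0/0        # chunk at grid position (0, 0, 0)
```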
A JSON file `attributes.json` in a directory contains arbitrary attributes. A group without attributes may not have an `attributes.json` file.

A dataset is a group with the mandatory attributes:

* `dimensions` (e.g. [100, 200, 300]),
* `blockSize` (e.g. [64, 64, 64]),
* `dataType` (one of {uint8, uint16, uint32, uint64, int8, int16, int32, int64, float32, float64}),
* `compression`, a struct whose mandatory attribute `type` names the compression scheme; currently available are raw, bzip2, gzip, lz4, and xz.
Custom compression schemes with arbitrary parameters can be added using compression annotations, e.g. N5 Blosc and N5 ZStandard.
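Putting this together, the `attributes.json` of a gzip-compressed uint16 dataset might look like the following sketch (the exact set of compression parameters depends on the scheme):

```json
{
    "dimensions": [100, 200, 300],
    "blockSize": [64, 64, 64],
    "dataType": "uint16",
    "compression": {
        "type": "gzip",
        "level": -1
    }
}
```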
Chunks are stored in a directory hierarchy that mirrors the dataset's dimensionality, i.e. the path of a chunk is its grid coordinates separated by `/` (e.g. `0/4/1/7` for chunk grid position p=(0, 4, 1, 7)).

Chunks are stored in the following binary format:

* mode (uint16 big endian; 0 = default),
* number of dimensions (uint16 big endian),
* dimension 1[,...,n] (uint32 big endian),
* the chunk data (big endian values, raw or compressed).
Example:

A 3-dimensional `uint16` datablock of 1×2×3 pixels storing the values (1, 2, 3, 4, 5, 6) starts with:
```
00000000: 00 00        ..    # 0 (default mode)
00000002: 00 03        ..    # 3 (number of dimensions)
00000004: 00 00 00 01  ....  # 1 (dimensions)
00000008: 00 00 00 02  ....  # 2
0000000c: 00 00 00 03  ....  # 3
```
followed by the data stored as raw or compressed big endian values. For raw compression:
```
00000010: 00 01  ..  # 1
00000012: 00 02  ..  # 2
00000014: 00 03  ..  # 3
00000016: 00 04  ..  # 4
00000018: 00 05  ..  # 5
0000001a: 00 06  ..  # 6
```
for bzip2 compression:
```
00000010: 42 5a 68 39  BZh9
00000014: 31 41 59 26  1AY&
00000018: 53 59 02 3e  SY.>
0000001c: 0d d2 00 00  ....
00000020: 00 40 00 7f  .@..
00000024: 00 20 00 31  . .1
00000028: 0c 01 0d 31  ...1
0000002c: a8 73 94 33  .s.3
00000030: 7c 5d c9 14  |]..
00000034: e1 42 40 08  .B@.
00000038: f8 37 48     .7H
```
for gzip compression:
```
00000010: 1f 8b 08 00  ....
00000014: 00 00 00 00  ....
00000018: 00 00 63 60  ..c`
0000001c: 64 60 62 60  d`b`
00000020: 66 60 61 60  f`a`
00000024: 65 60 03 00  e`..
00000028: aa ea 6d bf  ..m.
0000002c: 0c 00 00 00  ....
```
for xz compression:
```
00000010: fd 37 7a 58  .7zX
00000014: 5a 00 00 04  Z...
00000018: e6 d6 b4 46  ...F
0000001c: 02 00 21 01  ..!.
00000020: 16 00 00 00  ....
00000024: 74 2f e5 a3  t/..
00000028: 01 00 0b 00  ....
0000002c: 01 00 02 00  ....
00000030: 03 00 04 00  ....
00000034: 05 00 06 00  ....
00000038: 0d 03 09 ca  ....
0000003c: 34 ec 15 a7  4...
00000040: 00 01 24 0c  ..$.
00000044: a6 18 d8 d8  ....
00000048: 1f b6 f3 7d  ...}
0000004c: 01 00 00 00  ....
00000050: 00 04 59 5a  ..YZ
```
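To make the header layout concrete, the following sketch reproduces the raw-compression example above with plain `java.io`; it is illustrative only and not part of the N5 API. `DataOutputStream` writes big endian values, matching the specification.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BlockFormatSketch {

	public static void main(final String[] args) throws IOException {

		final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		final DataOutputStream out = new DataOutputStream(bytes);

		out.writeShort(0); // mode: 0 = default
		out.writeShort(3); // number of dimensions: 3
		out.writeInt(1);   // dimension 1
		out.writeInt(2);   // dimension 2
		out.writeInt(3);   // dimension 3
		for (int v = 1; v <= 6; ++v)
			out.writeShort(v); // raw uint16 values 1..6

		// prints the same byte sequence as the raw-compression hexdump above
		for (final byte b : bytes.toByteArray())
			System.out.printf("%02x ", b);
		System.out.println();
	}
}
```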
Custom compression schemes can be implemented using the annotation discovery mechanism of SciJava. Implement the `BlockReader` and `BlockWriter` interfaces for the compression scheme and create a parameter class implementing the `Compression` interface that is annotated with the `CompressionType` and `CompressionParameter` annotations. Typically, all of this can happen in a single class such as `GzipCompression`.
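As a hedged skeleton of what such a class can look like, modeled loosely on `GzipCompression`: the `deflate` scheme below is hypothetical, and the exact interface and annotation signatures should be checked against the N5 version in use before relying on this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

import org.janelia.saalfeldlab.n5.Compression;
import org.janelia.saalfeldlab.n5.Compression.CompressionParameter;
import org.janelia.saalfeldlab.n5.Compression.CompressionType;
import org.janelia.saalfeldlab.n5.DefaultBlockReader;
import org.janelia.saalfeldlab.n5.DefaultBlockWriter;

// hypothetical "deflate" scheme, registered under that name via the annotation
@CompressionType("deflate")
public class DeflateCompression implements DefaultBlockReader, DefaultBlockWriter, Compression {

	// serialized into attributes.json as a compression parameter
	@CompressionParameter
	private final int level;

	public DeflateCompression(final int level) {
		this.level = level;
	}

	public DeflateCompression() {
		this(Deflater.DEFAULT_COMPRESSION);
	}

	@Override
	public InputStream getInputStream(final InputStream in) throws IOException {
		return new InflaterInputStream(in); // decompress chunk data on read
	}

	@Override
	public OutputStream getOutputStream(final OutputStream out) throws IOException {
		return new DeflaterOutputStream(out, new Deflater(level)); // compress chunk data on write
	}

	@Override
	public DeflateCompression getReader() {
		return this;
	}

	@Override
	public DeflateCompression getWriter() {
		return this;
	}
}
```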
HDF5 is a great format that provides a wealth of conveniences that I do not want to miss. Its inefficiency for parallel writing, however, limits its applicability for handling very large n-dimensional data.
N5 uses the native filesystem of the target platform and JSON files to specify basic and custom meta-data as attributes. It aims at preserving the convenience of HDF5 where possible but doesn't try too hard to be a full replacement. Please do not take this project too seriously; we will see where it will get us and report back when more data is available.