PrimitiveBlocks copies RandomAccessible blocks to primitive arrays

tpietzsch commented 1 year ago

This PR adds functionality to extract blocks from a RandomAccessible into flat primitive arrays. This is useful, for example, for copying data into a Tensor, extracting blocks to store in N5, interfacing with CLIJ, running small algorithm kernels directly on primitive arrays, etc.

The only public API introduced is the interface PrimitiveBlocks<T extends NativeType<T>>. The static method PrimitiveBlocks.of(...) creates a PrimitiveBlocks accessor for an arbitrary RandomAccessible source. The method PrimitiveBlocks.copy(long[] srcPos, Object dest, int[] size) then copies blocks out of the source into flat primitive arrays. The idea is to provide an interface similar to System.arraycopy. Object dest must be a primitive array whose element type corresponds to T (for example, float[] for FloatType).
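To make the System.arraycopy analogy concrete, here is a toy, pure-Java sketch of the same contract for a dense, in-bounds N-dimensional float source stored flat with dimension 0 varying fastest (assumed here to match the ArrayImg flattening order). FlatBlockCopy and its copy method are hypothetical illustrations of the semantics (flattened destination, one bulk copy per line along dimension 0). They are not the PrimitiveBlocks implementation, which handles arbitrary Views, cells, and out-of-bounds access, and which takes dest as an Object rather than a float[].

```java
import java.util.Arrays;

/**
 * Hypothetical toy analogue of copy(srcPos, dest, size) for a dense, in-bounds
 * N-dimensional float source stored flat with dimension 0 varying fastest.
 */
final class FlatBlockCopy {

    static void copy(float[] src, long[] srcDims, long[] srcPos, float[] dest, int[] size) {
        final int n = size.length;
        int total = 1;
        for (int s : size) total *= s;          // number of elements in the block
        if (total == 0) return;
        if (dest.length < total) throw new IllegalArgumentException("dest too small");

        // source strides in elements: stride[0] = 1, stride[d] = stride[d-1] * srcDims[d-1]
        final long[] stride = new long[n];
        stride[0] = 1;
        for (int d = 1; d < n; d++) stride[d] = stride[d - 1] * srcDims[d - 1];

        final int lineLength = size[0];         // contiguous run along dimension 0
        final int numLines = total / lineLength;
        final int[] idx = new int[n];           // position inside the block for dimensions 1..n-1
        for (int line = 0; line < numLines; line++) {
            long srcOffset = srcPos[0];
            for (int d = 1; d < n; d++) srcOffset += (srcPos[d] + idx[d]) * stride[d];
            // one bulk copy per line, exactly System.arraycopy in the 1D case
            System.arraycopy(src, (int) srcOffset, dest, line * lineLength, lineLength);
            for (int d = 1; d < n; d++) {        // advance idx like an odometer
                if (++idx[d] < size[d]) break;
                idx[d] = 0;
            }
        }
    }

    public static void main(String[] args) {
        // 4x3 source where the value at (x, y) is x + 10 * y
        final float[] src = new float[4 * 3];
        for (int y = 0; y < 3; y++)
            for (int x = 0; x < 4; x++)
                src[x + 4 * y] = x + 10 * y;
        final float[] dest = new float[2 * 2];
        copy(src, new long[] {4, 3}, new long[] {1, 1}, dest, new int[] {2, 2});
        System.out.println(Arrays.toString(dest)); // [11.0, 12.0, 21.0, 22.0]
    }
}
```

As with System.arraycopy, the caller allocates dest with room for the product of the block sizes, and the block's elements appear in dest in the same order as the source flattening.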

PrimitiveBlocks.of(...) understands many View constructions (that ultimately end in a CellImg, ArrayImg, etc.) and will try to create an optimized copier for them. For example, the following works:

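```java
import net.imglib2.RandomAccessible;
import net.imglib2.blocks.PrimitiveBlocks;
import net.imglib2.converter.Converters;
import net.imglib2.converter.RealFloatConverter;
import net.imglib2.img.cell.CellImg;
import net.imglib2.img.cell.CellImgFactory;
import net.imglib2.type.numeric.integer.UnsignedByteType;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

// example 3D source (dimensions and cell size chosen for illustration)
CellImg< UnsignedByteType, ? > cellImg3D =
    new CellImgFactory<>( new UnsignedByteType(), 64 ).create( 200, 200, 100 );
// rotate in XY, shift back to zero-min, slice at z = 80, extend by border, convert to float
RandomAccessible< FloatType > view = Converters.convert(
    Views.extendBorder(
        Views.hyperSlice(
            Views.zeroMin(
                Views.rotate( cellImg3D, 1, 0 )
            ),
            2, 80 )
    ),
    new RealFloatConverter<>(),
    new FloatType()
);

PrimitiveBlocks< FloatType > blocks = PrimitiveBlocks.of( view );

// copy the 40x50 block starting at (10, 20) into a flat float[]
final float[] data = new float[ 40 * 50 ];
blocks.copy( new long[] { 10, 20 }, data, new int[] { 40, 50 } );
```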

The idea behind the optimized copier: instead of going through a RandomAccess that checks for every pixel whether it enters a new Cell, whether it is out-of-bounds, etc., all these checks are precomputed, and the relevant data from each Cell is then copied in one go. The speedup can be dramatic, in particular if the underlying source data is in a CellImg. Some benchmarks are included; for example, here are the results of CopyBenchmarkViewPrimitiveBlocks, where PrimitiveBlocks is roughly 24x to 40x faster than LoopBuilder:

```
# JMH version: 1.35
# VM version: JDK 17.0.3, OpenJDK 64-Bit Server VM, 17.0.3+7-LTS
...

Benchmark                                                  (oob)  (permute)  Mode  Cnt   Score   Error  Units
CopyBenchmarkViewPrimitiveBlocks.benchmarkLoopBuilder       true       true  avgt    5  12,789 ± 0,285  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkLoopBuilder       true      false  avgt    5   9,682 ± 0,152  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkLoopBuilder      false       true  avgt    5  14,333 ± 0,099  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkLoopBuilder      false      false  avgt    5  12,721 ± 0,123  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkPrimitiveBlocks   true       true  avgt    5   0,541 ± 0,010  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkPrimitiveBlocks   true      false  avgt    5   0,315 ± 0,024  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkPrimitiveBlocks  false       true  avgt    5   0,570 ± 0,013  ms/op
CopyBenchmarkViewPrimitiveBlocks.benchmarkPrimitiveBlocks  false      false  avgt    5   0,322 ± 0,008  ms/op
```
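The "precompute, then copy each Cell in one go" idea can be illustrated in one dimension. The sketch below is hypothetical toy code that assumes a source split into fixed-size cells stored as separate float[] arrays, with the last cell possibly shorter. copyPerPixel locates the cell for every single element, the way a per-pixel RandomAccess has to. copyBulk instead computes, once per intersected cell, which run of elements that cell contributes and copies the run with System.arraycopy. The real copier additionally handles N dimensions, View transforms, out-of-bounds extension, and type conversion.

```java
import java.util.Arrays;

/** Hypothetical 1D illustration of per-pixel vs. per-cell bulk copying. */
final class CellCopySketch {

    /** Per-pixel reference: find the cell and the offset inside it for every element. */
    static void copyPerPixel(float[][] cells, int cellSize, int pos, float[] dest, int len) {
        for (int i = 0; i < len; i++) {
            final int p = pos + i;
            dest[i] = cells[p / cellSize][p % cellSize];
        }
    }

    /** Bulk version: visit each intersected cell once and copy its whole run. */
    static void copyBulk(float[][] cells, int cellSize, int pos, float[] dest, int len) {
        int destOffset = 0;
        int p = pos;
        final int end = pos + len;
        while (p < end) {
            final int cell = p / cellSize;                 // cell containing p
            final int offsetInCell = p - cell * cellSize;  // start of the run inside that cell
            final int cellEnd = (cell + 1) * cellSize;     // first position after this cell
            final int run = Math.min(end, cellEnd) - p;    // elements taken from this cell
            System.arraycopy(cells[cell], offsetInCell, dest, destOffset, run);
            destOffset += run;
            p += run;
        }
    }

    public static void main(String[] args) {
        // values 0..9 split into cells of size 4 (the last cell holds only 2 values)
        final float[][] cells = { {0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9} };
        final float[] dest = new float[6];
        copyBulk(cells, 4, 3, dest, 6);
        System.out.println(Arrays.toString(dest)); // [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    }
}
```

Here the bulk version issues three arraycopy calls (one per intersected cell) instead of six per-element cell lookups; for realistic block sizes the ratio of elements to cells is much larger.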

If a source RandomAccessible cannot be understood, PrimitiveBlocks.of(...) returns a fall-back implementation (based on LoopBuilder). The optional OnFallback argument of PrimitiveBlocks.of(...) configures whether fall-back should be

- silently accepted (ACCEPT),
- accepted with a printed warning (WARN), which is the default, or
- rejected by throwing an IllegalArgumentException (FAIL).

The warning/exception message explains why the source RandomAccessible requires fall-back.
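The three policies could be modelled roughly as follows. Only the names ACCEPT, WARN and FAIL and their behavior come from the description above; the enum here is a stand-in for the OnFallback argument, and handleFallback, its message text, and the DEFAULT constant are hypothetical, not the library's code.

```java
/** Stand-in for the OnFallback argument described above. */
enum OnFallback { ACCEPT, WARN, FAIL }

final class FallbackPolicySketch {

    /** Hypothetical default, matching the description (WARN). */
    static final OnFallback DEFAULT = OnFallback.WARN;

    /**
     * Hypothetical: called when no optimized copier can be built for a source.
     * Returns the warning that was printed, or null if fall-back was accepted silently.
     */
    static String handleFallback(OnFallback onFallback, String reason) {
        switch (onFallback) {
            case ACCEPT:
                return null;                                 // use the fall-back quietly
            case WARN: {
                final String msg = "Using fall-back implementation: " + reason;
                System.err.println(msg);                     // print a warning, then continue
                return msg;
            }
            case FAIL:
            default:
                throw new IllegalArgumentException(reason);  // refuse the fall-back
        }
    }

    public static void main(String[] args) {
        // prints a warning to stderr, since DEFAULT is WARN
        handleFallback(DEFAULT, "source is not backed by a supported Img");
    }
}
```

Passing the reason through to both the warning and the exception mirrors the point above that the message explains why fall-back is needed.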

The only truly unsupported case is a pixel type T that does not map one-to-one to a primitive type. For example, ComplexDoubleType and Unsigned4BitType are not supported (at least not yet).

PrimitiveBlocks.copy is single-threaded; the idea is to parallelize over blocks rather than to parallelize the copying within a block. PrimitiveBlocks is not thread-safe in general, but its threadSafe() method returns a thread-safe instance (implemented using ThreadLocal copies). For example,

```java
PrimitiveBlocks< FloatType > blocks = PrimitiveBlocks.of( view ).threadSafe();
```

can safely be used from multiple threads, for example in CellLoaders.
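A minimal sketch of this pattern with plain-Java stand-ins. BlockCopier, ScratchCopier, ThreadSafeCopier and ParallelBlocksDemo are hypothetical names. ScratchCopier is deliberately not thread-safe: it reuses mutable scratch state and copies from a toy source whose value equals its position. The wrapper gives each thread its own copier via ThreadLocal, and the work is parallelized over blocks while each block is copied single-threaded, as described above.

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

/** Hypothetical stand-in for a block copier interface. */
interface BlockCopier {
    void copy(long[] srcPos, Object dest, int[] size);
    BlockCopier threadSafe();
}

/** Not thread-safe: reuses a mutable scratch array between calls. */
final class ScratchCopier implements BlockCopier {
    private final long[] scratch = new long[1];

    @Override
    public void copy(long[] srcPos, Object dest, int[] size) {
        final int[] out = (int[]) dest;
        for (int i = 0; i < size[0]; i++) {
            scratch[0] = srcPos[0] + i;  // shared state: racy if one instance is used concurrently
            out[i] = (int) scratch[0];   // toy source: value == position
        }
    }

    @Override
    public BlockCopier threadSafe() {
        return new ThreadSafeCopier(ScratchCopier::new);
    }
}

/** Delegates every call to an independent copier owned by the calling thread. */
final class ThreadSafeCopier implements BlockCopier {
    private final ThreadLocal<BlockCopier> local;

    ThreadSafeCopier(Supplier<BlockCopier> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    @Override
    public void copy(long[] srcPos, Object dest, int[] size) {
        local.get().copy(srcPos, dest, size);
    }

    @Override
    public BlockCopier threadSafe() {
        return this;
    }
}

final class ParallelBlocksDemo {
    /** Parallelize over blocks; each block is copied single-threaded. */
    static int[][] copyAllBlocks(BlockCopier blocks, int numBlocks, int blockSize) {
        final int[][] result = new int[numBlocks][blockSize];
        IntStream.range(0, numBlocks).parallel().forEach(b ->
                blocks.copy(new long[] {(long) b * blockSize}, result[b], new int[] {blockSize}));
        return result;
    }

    public static void main(String[] args) {
        final int[][] r = copyAllBlocks(new ScratchCopier().threadSafe(), 8, 16);
        System.out.println(r[3][0]); // 48
    }
}
```

Creating one copier per thread (rather than synchronizing a shared one) keeps the per-block copy loop free of locking, at the cost of a little extra memory per thread.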

tpietzsch commented 1 year ago

@mkitti This doesn't work with BufferAccess yet, but it would be nice to get that working, and it should be feasible. Maybe we could discuss this at some point?

mkitti commented 1 year ago

The case where hasArray() is true should be easy to handle.

https://docs.oracle.com/javase/7/docs/api/java/nio/Buffer.html#hasArray()

https://docs.oracle.com/javase/7/docs/api/java/nio/Buffer.html#array()
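A sketch of that suggestion for a FloatBuffer source. When hasArray() is true, it copies straight out of the backing array (offset by arrayOffset()) with System.arraycopy. Otherwise, for example with direct or read-only buffers, it falls back to a bulk get on a duplicate() so the caller's position is not disturbed. BufferRunCopy.copyRun is a hypothetical helper, and treating srcIndex as an absolute index into the buffer is an assumption of this sketch.

```java
import java.nio.FloatBuffer;
import java.util.Arrays;

/** Hypothetical helper: copy a run of floats from a FloatBuffer into a float[]. */
final class BufferRunCopy {

    static void copyRun(FloatBuffer src, int srcIndex, float[] dest, int destPos, int length) {
        if (src.hasArray()) {
            // array() is the backing array; arrayOffset() maps buffer index 0 into it
            System.arraycopy(src.array(), src.arrayOffset() + srcIndex, dest, destPos, length);
        } else {
            // no accessible array (direct or read-only buffer): use a relative bulk get
            // on a duplicate, which has its own position and leaves src untouched
            final FloatBuffer view = src.duplicate();
            view.position(srcIndex);
            view.get(dest, destPos, length);
        }
    }

    public static void main(String[] args) {
        final float[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        final float[] dest = new float[3];
        copyRun(FloatBuffer.wrap(data), 2, dest, 0, 3);
        System.out.println(Arrays.toString(dest)); // [2.0, 3.0, 4.0]
    }
}
```

Note that a slice of a heap buffer has a non-zero arrayOffset(), which is why the offset has to be added before indexing into array().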

mkitti commented 1 year ago

For ByteBuffer, you can use the bulk put and get methods.

https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#put(byte[])
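A sketch of the bulk methods in use. For byte data, put(byte[]) and get(byte[]) transfer a whole array in one call. For other primitive types, a typed view such as asFloatBuffer() offers the same bulk put/get. ByteBufferBulk and its methods are hypothetical helpers, and using a direct buffer with an explicit ByteOrder is an assumption made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helpers showing ByteBuffer bulk transfers to and from primitive arrays. */
final class ByteBufferBulk {

    /** Store a float block in a new direct buffer with one bulk put on a float view. */
    static ByteBuffer storeFloats(float[] block, ByteOrder order) {
        final ByteBuffer buf = ByteBuffer.allocateDirect(block.length * Float.BYTES).order(order);
        buf.asFloatBuffer().put(block);  // the view shares content; buf's position stays 0
        return buf;
    }

    /** Read n floats back with one bulk get (the view uses buf's byte order). */
    static float[] loadFloats(ByteBuffer buf, int n) {
        final float[] out = new float[n];
        buf.asFloatBuffer().get(out);
        return out;
    }

    /** Byte data: relative bulk put, then flip() to make the buffer readable. */
    static ByteBuffer storeBytes(byte[] block) {
        final ByteBuffer buf = ByteBuffer.allocate(block.length);
        buf.put(block);
        buf.flip();
        return buf;
    }

    /** Read all remaining bytes with one bulk get. */
    static byte[] loadBytes(ByteBuffer buf) {
        final byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        final float[] block = {1.5f, -2f, 3.25f};
        final ByteBuffer buf = storeFloats(block, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(loadFloats(buf, 3))); // [1.5, -2.0, 3.25]
    }
}
```

The byte order set on the ByteBuffer is inherited by the typed view, so it has to be chosen to match whatever reads the bytes later.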

imglib / imglib2

PrimitiveBlocks copies RandomAccessible blocks to primitive arrays #330