imglib / imglyb

Connecting Java/ImgLib2 + Python/NumPy
https://pypi.org/project/imglyb/
BSD 2-Clause "Simplified" License
31 stars 5 forks source link

Do not flip dimension order #6

Closed ctrueden closed 3 years ago

ctrueden commented 4 years ago

When wrapping a numpy array to an ImgLib2 image, the dimension order is reversed. For example, a 3D numpy array dimensioned [3, 5, 7] would become a 3D RandomAccessibleInterval dimensioned [7, 5, 3]. This PR attempts to fix that inconsistency such that dimensions stay consistent across the two worlds.

A numpy array has two different orders:

C is column-contiguous order in memory (last index varies the fastest). C order means that operating row-rise on the array will be slightly quicker. FORTRAN-contiguous order in memory (first index varies the fastest). F order means that column-wise operations will be faster.

ImgLib2 generally prefers F order. In particular, the Views.flatIterable method returns an IterableInterval in F order, and the IntervalIndexer methods perform F-ordered rasterization.

On the Python side: if you do not specify an order, then numpy arrays default to C order.

@hanslovsky Is this discrepancy between ImgLib2's general preference and NumPy's default preference the reason for inverting the axes? Or is there another reason?

Since NumPy supports both C and F, my strong vote is to discontinue this dimension flipping behavior, in favor of consistently keeping everything the same.

@hanslovsky If you agree, I can polish up this PR. Certainly, I would add unit tests to prove that all works as expected in the various cases.

Alternately, if breaking existing behavior makes you nervous, we could add a flip_dimensions flag to the to_imglib method, defaulting to True. I would still change pyimagej to pass False for this flag though, because I think it's good to reduce the number of things our less technical users need to know about and work around.

imagejan commented 4 years ago

I'm not sure. It seems to be a strong convention to have zyx order in Python (for performance reasons I believe), and many examples are based on this. See also https://scikit-image.org/docs/stable/user_guide/numpy_images.html#coordinate-conventions

Interestingly, the channel dimension comes last in these examples.

hanslovsky commented 4 years ago

@ctrueden The memory layout is independent of the order of indexing. For a two dimensional image, (matrix), ndarrays are accessed in the order (y, x) (first index is row) and imglib accesses with indices (x, y) (first index is column). This is independent of the underlying memory layout, i.e.

np.empty((3,4), order='F').shape == np.empty((3,4), order='F').shape

The same should hold true when wrapping as ImgLib2 RAI (pseudo-code):

arr1 = np.empty((3,4), order='F')
arr2 = np.empty((3,4), order='C')
imglyb.to_imglib(arr1).dimensions == imglyb.to_imglib(arr2).dimensions

By default, ndarrays are created in row-major order, e.g. copied from help(np.zeros):

    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.

which means that consecutive elements of a row in a matrix are contiguous in memory. Elements within a row are accessed by the column index (x-index), which is the second index in NumPy but the first index in ImgLib2. With the same memory layout, the fastest moving index will be the same in ndarray and ImgLib2 ArrayImg and shape/dimensions need to be reverted when converting from numpy to ImgLib2 and vice versa.

To summarize: The shape/dimension needs to be reverted when wrapping ndarray into ImgLib2, and vice versa, because the indices are reversed (x, y, z, ...) vs (..., z, y, x). This is independent of the underlying memory layout (column major vs row major), which defines the fastest moving index (x or z). For row major order, the first index in ImgLib2 ArrayImg would be fastest moving but it would be the last index in a ndarray. Both these indices describe the same thing, though. This indexing discrepancy probably stems from the fact that numpy thinks of ndarrays as matrices (first index is the row), while ImgLib2 thinks of images/RAIs as images (first index is the column).

NB: Users can use np.ascontiguousarray to optimize for row major order.