sethaxen / MRCFile.jl

Read and write files and manipulate data in the MRC2014 format
Other
16 stars 4 forks source link

Ensure we handle handedness correctly #10

Open sethaxen opened 3 years ago

sethaxen commented 3 years ago

From the MRC2014 spec:

The handedness of the data block is not well defined by the MRC2014 standard. Conventionally, many pieces of software have treated the data as right-handed, with the origin in the bottom left corner of a 2D image and the Z-axis pointing out of the screen. However, this approach is not universal, and some packages treat the data block as left-handed. An example is FEI's EPU data acquisition software, which places the image origin in the top left, as documented in appendix C of the EPU User Manual. Proposals for indicating the data handedness in the file header are under discussion, but for now, the only way to be sure of the handedness is to check the behaviour of each software package individually.

It would be bad if we got this wrong. Probably the best thing we can do is check that what we do is consistent with the mrcfile package, which is standard.

shuuul commented 1 year ago

Current implementation is not consistent with mrcfile. In mrcfile, the shape of data should be (nz, ny, nx). https://github.com/ccpem/mrcfile/blob/a0573b0ded494e4338839277570dc924c2ee512a/mrcfile/utils.py#L90

but in MRCFile.jl, the shape is (nx, ny, nz).

sethaxen commented 1 year ago

I probably will not be able to look into this again for several weeks, but I think this is because in an MRC file data is written row-major, while Julia represents data as column-major, which for multidimensional arrays reverses the order of dimensions. Perhaps we can use a PermutedDimsArray after reading the data and before writing the data for consistency.

shuuul commented 1 year ago

It seems that there is no elegant way to handle it.

It has been discussed in HDF5.jl,

https://github.com/JuliaIO/HDF5.jl/blob/master/docs/src/index.md#language-interoperability-with-row--and-column-major-order-arrays https://github.com/JuliaIO/HDF5.jl/issues/785

sethaxen commented 1 year ago

It has been discussed in HDF5.jl,

Yeah, I've come across it there and with NetCDF as well. The main difference here is that in HDF5 and NetCDF the dimensions can be anything so there is no sensible default ordering, whereas in MRC they are clearly defined (from https://www.ccpem.ac.uk/mrc_format/mrc2014.php):

NX or NC | number of columns in 3D data array (fast axis) NY or NR | number of rows in 3D data array (medium axis) NZ or NS | number of sections in 3D data array (slow axis)

In fact, IIUC, the speed assessment of the axes in the spec seems to assume interfaces use row-major ordering. So unless there are any interfaces that provide access to MRC data with a different ordering than mrcfile's, it might make sense to always return a permuted view.

shuuul commented 1 year ago

My concern is that if we permuted the data array, then read_mmap is not meaningful, but it seems we can create a view https://docs.julialang.org/en/v1/base/arrays/#Base.view.

sethaxen commented 1 year ago

I don't think it would be unmeaningful. Just like reading data from an MRC file, we would wrap the mmapped array in a PermutedDimsArray before passing it to MRCData.

But before doing anything, it's worth surveying the major packages that load MRC files and see what they do.

shuuul commented 1 year ago

I see. PermutedDimsArray also creates a view for the original AbstractArray.