nipy / nibabel

Python package to access a cacophony of neuro-imaging file formats
http://nipy.org/nibabel/
Other
649 stars 258 forks source link

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

Closed pauldmccarthy closed 7 months ago

pauldmccarthy commented 7 months ago

Hi @effigies I hope you're well!

I had a report from a FSLeyes user who has used a MATLAB GIfTI library to generate some .gii files that were failing to load in FSLeyes. The GIfTI files that this library generates are ASCII-encoded, but without any newlines to separate elements along the first dimension, e.g.. instead of the conventional:

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179
        140.33973694 190.0491333  73.24776459
        157.3598938  196.97969055 83.65809631
  </Data>
</DataArray>

these files instead contain

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179 140.33973694 190.0491333 73.24776459 157.3598938 196.97969055 83.65809631</Data>
</DataArray>

Once again, the GIfTI specification is light on details as to the expected format of ASCII-encoded data arrays - all it says is (from Section 2.3.4.5 Encoding):

The Data element contains ASCII text with each value separated by whitespace

But every ASCII-encoded GIfTI file I've encountered has contained newlines to separate elements along the first dimension.

In any case, when using nibabel to load such a file (e.g. the ascii_flat_data.gii test file in this PR), the loaded data arrays are one dimensional, e.g.:

import nibabel as nib
img = nib.load('ascii_flat_data.gii')
print('Expected shape:', img.darrays[0].dims)
print('Actual shape:  ', img.darrays[0].data.shape)
>> Expected shape: [10, 3]
>> Actual shape:   (30,)

It turns out that, for non-ASCII-encoded data arrays, the nibabel.gifit.parse_gifti_fast.read_data_block function will ensure that the loaded array has the shape specified in the DataArray attributes. But this is not the case for ASCII-encoded arrays.

This PR contains a small patch which ensures that ASCII-encoded arrays are reshaped to the expected dimensionality.

There is a slight complexity in that one of the existing unit tests is expecting the result to be 1D when loading a file that contains an array of shape (1, 3).

I added the .squeeze() call in to preserve this behaviour, but I'm not sure whether a better option is to always return the shape specified in the DataArray attributes. I'm not sure how widely used ASCII-encoded GIFTIs, so am unsure of the potential impact of such a change. It's obviously unlikely that one would encounter a POINTSET or TRIANGLE array with a dimension of length 1, but it could occur with SHAPE or TIMESERIES arrays.

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 86.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.26%. Comparing base (27c2427) to head (5bcf012). Report is 6 commits behind head on master.

Files Patch % Lines
nibabel/gifti/parse_gifti_fast.py 86.66% 1 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1298 +/- ## ======================================= Coverage 92.25% 92.26% ======================================= Files 99 99 Lines 12467 12457 -10 Branches 2563 2560 -3 ======================================= - Hits 11502 11493 -9 + Misses 642 641 -1 Partials 323 323 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

effigies commented 7 months ago

We need to consider the ArrayIndexingOrder. MATLAB seems likely to ravel from column-major, while numpy will default to row-major. We do consider it below:

https://github.com/nipy/nibabel/blob/27c24272661be944598eb93a9f278f7fa0fd0022/nibabel/gifti/parse_gifti_fast.py#L135-L137

I think if we uniformize the dtype at the top of the function, then we can safely get rid of the byteswap at the end and reuse the existing reshape.

byteorder = gifti_endian_codes.npcode[darray.endian]
dtype = data_type_codes.type[darray.datatype].newbyteorder(endian)

I'm also noticing that the memmap created here does not respect index order:

https://github.com/nipy/nibabel/blob/27c24272661be944598eb93a9f278f7fa0fd0022/nibabel/gifti/parse_gifti_fast.py#L97-L103

So perhaps we want to go ahead and determine the shape and order at the top as well:

shape = tuple(darray.dims)
order = array_index_order_codes.npcode[darray.ind_ord]

Then we can use these consistently.

effigies commented 7 months ago

As to shape, I think we probably want to respect the actual shape as specified in the XML, not squeeze. We should update tests if this is not the case currently.

effigies commented 7 months ago

Applied and tested my suggestions in https://github.com/pauldmccarthy/nibabel/pull/1

pauldmccarthy commented 7 months ago

@effigies Thanks! You evidently also noticed that the row/column ordering was not being applied as well :+1: