Closed — oesteban closed this issue 8 months ago
@effigies - would you agree that:
For 1, I think opening with memory-mapping should be sufficient, since we don't access the data array. Correct?
For 2, the moving image is loaded here:
and then this is the relevant loop, where we should make sure only one volume is loaded into memory at a time:
Does that make sense?
1) You do not need to load a volume of the reference image at all, unless you're using a mask to narrow the field-of-view. All you need is the shape, affine and dtype.
2) If you switch to using `dataobj`, then it doesn't matter whether you have a mmap or not. You can create an iterator that returns one volume at a time:
```python
(dataobj[:, :, :, i].astype(dtype, copy=False) for i in range(dataobj.shape[3]))
```
This could be made into a generator function if you need more flexibility.
3) Volume-by-volume definitely makes sense. Let the thread manager choose how many to load, and then set your threads based on available cores and memory.
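A minimal sketch of such a generator function (the helper name `iter_volumes` is hypothetical; it accepts anything supporting 4D fancy indexing, e.g. a nibabel `ArrayProxy` or a plain ndarray):

```python
import numpy as np

def iter_volumes(dataobj, dtype="f4"):
    """Yield one 3D volume at a time from a 4D data object.

    ``dataobj`` may be a lazy, mmap-backed proxy or an in-memory array;
    only one volume is materialized at any moment.
    """
    for i in range(dataobj.shape[3]):
        yield np.asanyarray(dataobj[..., i]).astype(dtype, copy=False)
```

Calling code would then do `for vol in iter_volumes(img.dataobj): ...` without ever holding the full series in memory.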
Thanks SO MUCH :)
I think you have a serious inefficiency here: https://github.com/nipy/nitransforms/blob/1674e86a73595356eb6a775fd5b5c612952482a0/nitransforms/linear.py#L464-L478
You're creating a bunch of individual volumes, and then concatenating them in C order. There are going to be unnecessary copies under the hood, and another one when saving to NIfTI.
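The layout difference is easy to check in a small sketch: with `order='F'`, each `out[..., i]` view is a single contiguous run in memory, whereas with the default C order it is heavily strided:

```python
import numpy as np

shape = (4, 5, 6, 7)  # small stand-in for an (x, y, z, t) image

c_arr = np.zeros(shape)             # default C order
f_arr = np.zeros(shape, order="F")  # volumes contiguous; matches NIfTI

# A single volume is one contiguous block only in the F-ordered array
print(c_arr[..., 0].flags["F_CONTIGUOUS"])  # False: strided view
print(f_arr[..., 0].flags["F_CONTIGUOUS"])  # True: contiguous block
```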
In fMRIPrep, I've moved to
```python
# Order F ensures individual volumes are contiguous in memory
# Also matches NIfTI, making final save more efficient
out_array = np.zeros(coordinates.shape[1:] + data.shape[-1:], dtype=output_dtype, order='F')

tasks = [
    asyncio.create_task(
        worker(
            partial(
                resample_vol,
                data=volume,
                coordinates=coordinates,
                pe_info=pe_info[volid],
                hmc_xfm=hmc_xfms[volid] if hmc_xfms else None,
                fmap_hz=fmap_hz,
                output=out_array[..., volid],
                order=order,
                mode=mode,
                cval=cval,
                prefilter=prefilter,
            ),
            semaphore,
        )
    )
    for volid, volume in enumerate(np.rollaxis(data, -1, 0))
]
```
The equivalent here would be:
```python
resampled = np.zeros(coordinates.shape[1:] + data.shape[-1:], dtype=output_dtype, order='F')
for t in range(data.shape[-1]):
    ndi.map_coordinates(
        data[..., t],
        targets[t, ..., : _ref.ndim].T,
        output=resampled[..., t],
        order=order,
        mode=mode,
        cval=cval,
        prefilter=prefilter,
    )
```
Okay, I'm testing your suggestion, and before even getting there, we hit these lines:
These push RSS from ~10GB to ~30GB.
I don't fully understand that line. We don't use it in sdcflows/fmriprep. We would just use:
```python
targets = nt.base.SpatialReference.factory(spatialimage).ndcoords.astype('f4').T
```
Ah, hold on. Nevermind, I see the equivalents. Need to think it through a bit.
I think that's somehow equivalent to `_ref.ndcoords.T` of line 452.
Lines 453-455 convert those physical coordinates into voxel coordinates, seemingly in a very inefficient way.
What's the dimensionality of your target image?
And here's the equivalent of what we have in fMRIPrep:
```python
ycoords = self.map(_ref.ndcoords.astype('f4').T)
ras2vox = ~nt.Affine(spatialimage.affine)
targets = ras2vox.map(ycoords)
```
If switching to `float32` and `Affine.map` don't help, then you're just working with much bigger data. In that case it might make sense to start chunking arrays by slice as well as by volume.
```python
>>> _ref.shape
(76, 102, 60)
```
Not so big, although it's 750 timepoints, so
```python
>>> targets.shape
(750, 465120, 3)
```
Timepoints shouldn't matter at this point. You're just calculating the spatial coordinates, not interpolating values. Even at `float64`, `targets` should only be ~10MB, so something's going wildly wrong.
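For reference, the back-of-the-envelope arithmetic behind that ~10MB figure: a (76, 102, 60) reference grid has 465120 voxels, each with three `float64` coordinates:

```python
nvox = 76 * 102 * 60        # 465120 voxels in the reference grid
nbytes = nvox * 3 * 8       # 3 coordinates per voxel, 8 bytes each
print(nvox, nbytes / 2**20) # roughly 10.6 MiB for one set of coordinates
```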
I would skip the `index()` thing anyway. I don't think it buys you any clarity.
Okay, timepoints do matter, because here:
```python
targets = ras2vox.map(ycoords)
```
you have to ravel the first two dimensions of `targets`.
Looks like cramming all the coordinates into a single mapping is really not memory efficient.
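A sketch of the per-volume alternative, in plain NumPy with hypothetical names (the real code would pull the per-volume matrices from the transform object): mapping the reference coordinates one volume at a time means the big `(nvols, nvox, 3)` array is never materialized.

```python
import numpy as np

def map_per_volume(vol_affines, ras2vox, coords_ras):
    """Yield (3, nvox) voxel coordinates, one volume at a time.

    vol_affines : iterable of per-volume 4x4 matrices (e.g., from HMC)
    ras2vox     : 4x4 inverse of the moving image's affine
    coords_ras  : (3, nvox) physical coordinates of the reference grid
    """
    ones = np.ones((1, coords_ras.shape[1]), dtype=coords_ras.dtype)
    homog = np.vstack([coords_ras, ones])  # (4, nvox), built once
    for xfm in vol_affines:
        # Only one (3, nvox) block exists at a time, instead of an
        # (nvols, nvox, 3) array holding every volume's coordinates.
        yield (ras2vox @ xfm @ homog)[:3]
```

Each yielded block can be fed straight to `ndi.map_coordinates` for that volume and then discarded.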
Okay, that was the culprit (probably the inefficiency you pointed out was also blowing up stuff). Sending a PR soon.
The resampling of transform mappings typically gets killed (out of memory) in standard settings when files are large (we've seen it with DWIs of ~150 volumes).