nipy / nibabel

Python package to access a cacophony of neuro-imaging file formats
http://nipy.org/nibabel/

Casting memmaps in ArrayProxies is slower than loading into memory first (optimization opportunity) #1371

Open clane9 opened 1 week ago

clane9 commented 1 week ago

It seems that reading uncompressed nifti volumes takes significantly longer when memory mapping is enabled. In this example, I load four large 4D nifti volumes from the ABCD dataset, with and without memory mapping.

import time

import nibabel as nib
import numpy as np

print(nib.__version__)

def benchmark_nibabel_load(path: str, mmap: bool):
    tic = time.monotonic()
    img = nib.load(path, mmap=mmap)
    data = np.asarray(img.get_fdata())
    rt = time.monotonic() - tic
    print(f"mmap: {mmap}, run time: {rt:.3f}s")

paths = [
    "/ABCD/sub-NDARINV0CCVJ39W/ses-2YearFollowUpYArm1/func/sub-NDARINV0CCVJ39W_ses-2YearFollowUpYArm1_task-rest_run-04_bold.nii",
    "/ABCD/sub-NDARINV0YE7L9KU/ses-2YearFollowUpYArm1/func/sub-NDARINV0YE7L9KU_ses-2YearFollowUpYArm1_task-rest_run-04_bold.nii",
    "/ABCD/sub-NDARINV0F82C6R8/ses-2YearFollowUpYArm1/func/sub-NDARINV0F82C6R8_ses-2YearFollowUpYArm1_task-rest_run-02_bold.nii",
    "/ABCD/sub-NDARINV0V80916L/ses-baselineYear1Arm1/func/sub-NDARINV0V80916L_ses-baselineYear1Arm1_task-rest_run-04_bold.nii",
]

mmaps = [True, False, True, False]

for path, mmap in zip(paths, mmaps):
    benchmark_nibabel_load(path, mmap=mmap)

This is what I get, using nibabel v5.2.1:

mmap: True, run time: 90.764s
mmap: False, run time: 3.595s
mmap: True, run time: 37.405s
mmap: False, run time: 4.810s

Any idea why this might happen?

effigies commented 1 week ago

.get_fdata() casts to a floating point dtype, float64 by default. You may prefer just np.asanyarray(img.dataobj), which will ensure that scaling is applied but preserve the memmap if no scaling is necessary.

If you know the dtype you want to operate in, np.int16(img.dataobj) (for example) would work. If you're targeting float32, img.get_fdata(dtype='float32') will work and cache the result for the next call.

clane9 commented 4 days ago

Thanks @effigies, I think these are good strategies for interacting with the memmapped data. But I'm still confused why the plain get_fdata() is so much slower when mmap=True (the default)?

Regardless, I will just set mmap=False for my case and avoid the issue, so it is not too critical to get to the bottom of.

effigies commented 4 days ago

I don't really know why the memmap is slower. I would probably profile it and see where it's spending the time.

clane9 commented 3 days ago

Ya, I did some quick profiling. With mmap=True, it's spending almost all of the time here; it seems to be doing casting.

[profiler screenshot: nearly all of the time spent in the casting call]

With mmap=False, the casting takes less time and most of the time is spent doing the read.

[profiler screenshot: most of the time spent in the file read]

I guess when the array is a memmap, astype() does an implicit read, which is slower for some reason than the explicit read when mmap=False.
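The two code paths can be reproduced with plain numpy (the temporary file name and array size are arbitrary; this only illustrates the access pattern, not the timing difference):

```python
import os
import tempfile

import numpy as np

# Write raw int16 data to disk to simulate an uncompressed image payload.
path = os.path.join(tempfile.mkdtemp(), "payload.dat")
arr = np.arange(1_000_000, dtype=np.int16)
arr.tofile(path)

# Path A (mmap=True): cast directly from the memmap; the read happens
# lazily, page by page, inside astype().
mm = np.memmap(path, dtype=np.int16, mode="r")
casted_from_mmap = mm.astype(np.float64)

# Path B (mmap=False): read the whole file into memory in one explicit
# pass, then cast the in-memory array.
in_mem = np.fromfile(path, dtype=np.int16)
casted_from_ram = in_mem.astype(np.float64)

assert np.array_equal(casted_from_mmap, casted_from_ram)
```

Both paths produce identical results; the difference is only in when and how the bytes come off the disk.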

effigies commented 3 days ago

I'm not sure there's much to be done here. That said, if you're up for it, you could try looking into and benchmarking other approaches to scaling, e.g. pre-allocating an array with np.empty(shape, dtype=target_dtype) and assigning its values with np.copyto.

That will move more of the branching logic (e.g., are dtypes the same) into nibabel, and it has been nice to let numpy make those decisions and trust them to be efficient. It could be worth upstreaming the solution (if we find one) into numpy's memmap.astype() implementation.
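A rough sketch of the pre-allocation idea, with a small stand-in array and hypothetical slope/inter values in place of the header's scale factors:

```python
import numpy as np

# Stand-ins: in nibabel this would be the on-disk memmap and the
# header's scl_slope / scl_inter values (hypothetical numbers here).
src = np.arange(12, dtype=np.int16).reshape(3, 4)
slope, inter = 2.0, 0.5
target_dtype = np.float32

# Pre-allocate the output, then copy element-wise into it; copyto does
# the dtype conversion during the copy rather than via astype().
out = np.empty(src.shape, dtype=target_dtype)
np.copyto(out, src, casting="unsafe")

# Apply scaling in place on the already-allocated float buffer.
out *= slope
out += inter
```

As noted above, this pushes branching (same dtype? scaling needed?) that numpy currently handles into nibabel, so any gain would need benchmarking before it's worth the added complexity.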

clane9 commented 3 days ago

Ya I agree, I feel like there's not much worth doing at this point. Especially since in this case mmap=False is fine. Just wanted to make a note since it did confuse us for a little while before we tracked it down. Will let you know if we end up feeling motivated and decide to look for some improvement.