HDF5 issue with matlab: Large .mat files need to be saved with -v 7.3

gladomat commented 7 years ago

Summary

I'm running a Bayesian analysis using Per Síden's code using spatial priors and the resulting SPM.mat files are massive for ultra high field data (upwards of 2 GB). I had to change the SPM code to make the saves possible, for example in spm_contrasts.m line 313: save('SPM.mat', 'SPM', '-v7.3').

In nipype, however, I get an error:

File "/home/raid2/mihai/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 65, in mat_reader_factory
    raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files

Actual behavior

Nipype crashes as it can't open or save -v 7.3 .mat files.

Expected behavior

Nipype should save and open -v 7.3 files natively.

How to replicate the behavior

I changed said line in spm_contrasts.m to save -v 7.3 .mat files and ran a second level analysis. At the calculate contrasts step I encounter the above error.

Script/Workflow details

Sorry, it's kinda long. I'll shorten the subject list to 1 subject.

from os.path import join as opj
import os
from nipype.interfaces.io import SelectFiles, DataSink, DataGrabber
from nipype.interfaces.spm import (OneSampleTTestDesign, EstimateModel,
                                   EstimateContrast, Threshold)
from nipype.interfaces.utility import IdentityInterface
from nipype.pipeline.engine import Workflow, Node
import time

# Start timing
start = time.time()

experiment_dir = '/data/pt_nmc002/TDMGB_FMRI_bbreg_native/' # location of experiment folder
output_dir = '2nd_level/spespk_frequentist_33-subs'     # name of 2nd-level output folder
input_dir_cons = 'normed_contrasts'# name of norm output folder

subject_list = [
       "sub-01",
        ]

    # list of contrast identifiers
contrasts = ['con_0001']
con_names = ['all sounds']

fwhmlist = [1, 2]

# One Sample T-Test Design - creates one sample T-Test Design
onesamplettestdes = Node(OneSampleTTestDesign(),
                             overwrite=True,
                         name="onesampttestdes")
# EstimateModel - estimate the parameters of the model
# Even for second level it should be 'Classical': 1.
level2estimate = Node(EstimateModel(estimation_method={'Classical': 1}),
                      name="level2estimate")
# EstimateContrast - estimates simple group contrast
level2conestimate = Node(EstimateContrast(group_contrast=True),
                         name="level2conestimate")
cont1 = ['Group', 'T', ['mean'], [1]]
level2conestimate.inputs.contrasts = [cont1]

## Create the 2nd level pipeline
l2analysis = Workflow(name='l2analysis', base_dir=experiment_dir+'/tmp')
l2analysis.config['execution']['crashdump_dir'] = base_dir=experiment_dir + '/tmp/l2analysis/crash_files'

# Connect up the 2nd-level analysis components
l2analysis.connect([(onesamplettestdes, level2estimate, [('spm_mat_file',
                                                          'spm_mat_file')] ),
                    (level2estimate, level2conestimate, [('spm_mat_file',
                                                          'spm_mat_file'),
                                                         ('beta_images',
                                                          'beta_images'),
                                                         ('residual_image',
                                                          'residual_image')]),
                    ])

# collect all the con images for each contrast.
contrast_ids = list(range(1, len(contrasts) + 1))
l2source = Node(DataGrabber(infields=['fwhm', 'con']), name="l2source")
# we use .*i* to capture both .img (SPM8) and .nii (SPM12)
l2source.inputs.template = os.path.abspath(experiment_dir + '/*/' + \
                            input_dir_cons + '/*/_fwhm_%d/con_%04d*.*i*')
# iterate over all contrast images
l2source.iterables = [('fwhm', fwhmlist),
                      ('con', contrast_ids)]
l2source.inputs.sort_filelist = True

# Datasink - creates output folder for important outputs
datasink = Node(DataSink(base_directory=experiment_dir,
                         container=output_dir),
                name="datasink")

# Use the following DataSink output substitutions
substitutions = [('_con_1_fwhm_1', 'all_sounds_fwhm_1'),
                 ('_con_1_fwhm_2', 'all_sounds_fwhm_2')]

    # Connect SelectFiles and DataSink to the workflow
l2analysis.connect([(l2source, onesamplettestdes, [('outfiles', 'in_files')]),
                    (level2conestimate, datasink, [('spm_mat_file',
                                                    '@spm_mat'),
                                                   ('spmT_images',
                                                    '@T'),
                                                   ('con_images',
                                                    '@con')]),
                    ])

l2analysis.write_graph(dotfilename='l2analysis_frequentist.dot', graph2use='orig', format='pdf')
l2analysis.run('MultiProc', plugin_args={'n_procs': 21})

# Time again and spit out difference.
stop = time.time()
if (stop-start) < 3600:
    print('Elapsed time: %.2f mins.' %((stop-start)/60))
else:
    print('Elapsed time: %.2f hours.' %((stop-start)/60/60))

Platform details:

Linux 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

nipype version: 0.12.1

Execution environment

Native linux with conda installer python 2.7

mgxd commented 7 years ago

@gladomat these -v 7.3 files are not implemented in scipy's reader yet - however we should be able to read them in with h5py. The spm offenders within nipype seem to be EstimateModel and EstimateContrast.

we would still want to attempt to open the mat file with scipy.io.loadmat first, but if that raises an NotImplementedError we can opening with h5py.File. Do you want to give this PR a crack?

gladomat commented 7 years ago

Do you want to give this PR a crack?

You mean do I want to add the implementation myself? I can try. It would be an error catcher I suppose. I'll look into it and report how far I've gotten.

mgxd commented 7 years ago

yes, essentially two try/except blocks (one for importing h5py, one for using h5py.File()) - if you have any problems I can help out/implement it down the line

gladomat commented 7 years ago

I'll do it but it may take a little while. Since I'm the only one using it, I guess it can wait.

psadil commented 5 years ago

I also ran in to this issue after needing to use v7.3 .mat files (first level analysis, but there was a large number of very long runs). Adding try/except blocks with h5py seems simple enough, and if this is still open I wouldn't mind trying to contribute.

Is there anything else I should be looking to modify besides the EstimateModel and EstimateContrast from spm/model.py?

psadil commented 5 years ago

@gladomat @mgxd checking in about the status of this. If it'd be useful, I'm still happy to submit a possible modification to spm/model.py that would allow reading of v7.3 .mat files.

satra commented 5 years ago

@psadil - these are the occurrences i found.

$ grep -rn loadmat nipype
nipype/algorithms/misc.py:358:        in_dict = sio.loadmat(op.abspath(self.inputs.in_file))
nipype/algorithms/misc.py:397:        in_dict = sio.loadmat(op.abspath(self.inputs.in_file))
nipype/algorithms/rapidart.py:756:        spmmat = sio.loadmat(self.inputs.spm_mat_file, struct_as_record=False)
nipype/interfaces/spm/model.py:287:        spm = sio.loadmat(self.inputs.spm_mat_file, struct_as_record=False)
nipype/interfaces/spm/model.py:489:        spm = sio.loadmat(self.inputs.spm_mat_file, struct_as_record=False)

it may be worthwhile to create a loadmat function in utils and reuse it in these places:

def loadmat(filename, **kwargs):
    try: 
        return sio.loadmat(filename, **kwargs)
    except NotImplementedError:
        import h5py as h5
        mat = None
        with h5.File(filename, 'r') as f:
            ...
        return mat

for specific mappings of MATLAB to HDF5 see this: https://github.com/frejanordsiek/hdf5storage

i think for the purpose of these interfaces h5py should be sufficient without going to hdf5storage.

we also use sio.savemat at different places, but i don't think most of those would end up needing to switch at this point.

skjerns commented 4 years ago

I've implemented a library that can read the mat 7.3 files, have a look here:

https://github.com/skjerns/mat7.3

or install via pip pip install mat73 and load with

import mat73
data_dict = mat73.loadmat('data.mat')

adamnarai commented 7 months ago

I've encountered the same problem, my model is too large, Matlab saves SPM.mat in v7.3 and then nipype cannot handle it. I tried using mat73 but it throws "not a MATLAB datatype" errors for me when trying to read a v7.3 SPM.mat file. If I'm not mistaken only the Vbeta, Vcon and Vspm variables are used from SPM.mat in model.py, so there is no need for a general v7.3 loadmat function, it is enough to read these variables. So I made a fix by creating a load_spm_mat() function similarly to @satra's suggestion, it reads Vbeta, Vcon and Vspm in the except branch and formats them the same as scipy.io.loadmat, therefore it requires no changes in model.py, except for replacing the loadmat() function. Maybe this is not the most elegant solution, but I can make a PR if you think this is a useful addition.

nipy / nipype