poldracklab / fitlins

Fit Linear Models to BIDS Datasets
https://fitlins.readthedocs.io
Apache License 2.0

Level 3 model fails due to not finding second-level results #101

Open adswa opened 5 years ago

adswa commented 5 years ago

(Migrating this from pybids, where I mistakenly posted it before.) Pinging @effigies (@yarikoptic promised to give you a hydra account so that I can finally share the data).

We've stripped our dataset down to a bare minimum: 3 subjects, each with 2 runs, and a model file with only one event in X to reduce computation time. You can find the dataset on hydra in the directory /data/movieloc/backup_store/publish-BIDSSacc on branch slim (please disregard the master branch for now; it is a couple of weeks behind, and I don't want to push its current, rather messy state). The model (found under models/movel_v3_smdl_autocon_splitconv_oneevent.json) is invoked with

fitlins . Jan_28 'run' -m models/movel_v3_smdl_autocon_splitconv_oneevent.json --desc highpass --space 'MNI152NLin6Sym' -d $PWD -w 'Jan_28_wd' --n-cpus 3

Our model:

```json
{
  "name": "FEF_localizer",
  "Input": {
    "session": "movie"
  },
  "Steps": [
    {
      "Level": "run",
      "Model": {
        "X": ["amplitude_.RIGHT"]
      },
      "Contrasts": [],
      "AutoContrasts": true,
      "Transformations": [
        {
          "Name": "Split",
          "Input": ["amplitude_"],
          "By": ["trial_type"]
        },
        {
          "Name": "Convolve",
          "Input": ["amplitude_.RIGHT"],
          "Model": "spm"
        }
      ]
    },
    {
      "Level": "subject",
      "AutoContrasts": true
    },
    {
      "Level": "dataset",
      "AutoContrasts": true
    }
  ]
}
```


The model fails on level 3 (dataset) with

ValueError: A second level model requires a list with atleast two first level models or niimgs

after producing seemingly sensible output on the first (run) and second (subject) level. As far as I can judge, the files that the level 3 model should ingest have been created.

Here is the traceback:

```python
Traceback (most recent call last):
  File "/home/adina/Repos/nipype/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 1254, in _run_interface
    self.config['execution']['stop_on_first_crash'])))
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 1176, in _collate_results
    (self.name, '\n'.join(msg)))
Exception: Subnodes of node: l3_model failed:
Subnode 0 failed
Error: Traceback (most recent call last):
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/utils.py", line 99, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/home/adina/Repos/nipype/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/home/adina/Repos/nipype/nipype/interfaces/base/core.py", line 369, in run
    runtime = self._run_interface(runtime)
  File "/home/adina/Repos/fitlins/fitlins/interfaces/nistats.py", line 174, in _run_interface
    model.fit(input, design_matrix=design_matrix)
  File "/home/adina/env/fitlins/local/lib/python3.5/site-packages/nistats/second_level_model.py", line 164, in fit
    raise ValueError('A second level model requires a list with at'
ValueError: A second level model requires a list with atleast two first level models or niimgs
```


Reproducibility sits in a bar somewhere and laughs its ass off, but I'm also trying to give an overview of the custom changes @yarikoptic and I made to the fitlins source code. I don't see any particular relevance of our changes to the issue at hand (they are mostly hard-coded quick fixes for issues that arose), but then again, I'm certainly not the one to judge what is relevant and what isn't ;-). Also, an attempt to rerun the model would fail without the additional space choice and the selection of only one of the two identical BOLD files being returned.

fitlins changes:

```patch
diff --git a/fitlins/cli/run.py b/fitlins/cli/run.py
index 28dcdba..f64e14e 100755
--- a/fitlins/cli/run.py
+++ b/fitlins/cli/run.py
@@ -85,15 +85,15 @@ def get_parser():
     g_bids.add_argument('--derivative-label', action='store', type=str,
                         help='execution label to append to derivative directory name')
     g_bids.add_argument('--space', action='store',
-                        choices=['MNI152NLin2009cAsym', ''],
+                        choices=['MNI152NLin2009cAsym', '', 'MNI152NLin6Sym'],
                         default='MNI152NLin2009cAsym',
                         help='registered space of input datasets. Empty value for no explicit space.')
diff --git a/fitlins/interfaces/bids.py b/fitlins/interfaces/bids.py
index 919742f..a7b98a9 100644
--- a/fitlins/interfaces/bids.py
+++ b/fitlins/interfaces/bids.py
@@ -177,7 +177,7 @@ class LoadBIDSModel(SimpleInterface):
         selectors = self.inputs.selectors
 
         analysis = Analysis(model=self.inputs.model, layout=layout)
-        analysis.setup(drop_na=False, desc='preproc', **selectors)
+        analysis.setup(drop_na=False, desc='highpass', space='MNI152NLin6Sym', **selectors)
         self._load_level1(runtime, analysis)
         self._load_higher_level(runtime, analysis)
 
@@ -198,25 +198,37 @@ class LoadBIDSModel(SimpleInterface):
[...]
-            if len(preproc_files) != 1:
-                raise ValueError('Too many BOLD files found')
+            # ATM we could get multiple entries for the same file
+            # see https://github.com/bids-standard/pybids/issues/350
+            if len(set(f.path for f in preproc_files)) != 1:
+                raise ValueError(
+                    'Too many (%d) BOLD files found: %s'
+                    % (len(preproc_files), ', '.join(preproc_files))
+                )
 
             fname = preproc_files[0].path
```
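The duplicate-tolerant BOLD check in the patch can be sketched on its own. This is a minimal, hypothetical version (`BIDSFileStub` and `single_bold_file` are illustrative names, not fitlins code), assuming each returned entry exposes a `.path` string as the patch implies. It joins the paths rather than the file objects, because `str.join()` only accepts strings, so `', '.join(preproc_files)` would most likely raise a `TypeError` of its own if that error branch were ever reached.

```python
from collections import namedtuple

# Hypothetical stand-in for the file objects the layout query returns;
# only the `.path` attribute matters here.
BIDSFileStub = namedtuple("BIDSFileStub", ["path"])


def single_bold_file(preproc_files):
    """Return the single BOLD path, collapsing duplicate entries by path."""
    unique_paths = sorted({f.path for f in preproc_files})
    if len(unique_paths) != 1:
        # Join the path strings: str.join() raises TypeError on non-str items
        raise ValueError(
            "Expected exactly one BOLD file, found %d: %s"
            % (len(unique_paths), ", ".join(unique_paths))
        )
    return unique_paths[0]


duplicates = [BIDSFileStub("sub-01_bold.nii.gz"), BIDSFileStub("sub-01_bold.nii.gz")]
print(single_bold_file(duplicates))  # sub-01_bold.nii.gz
```

Two entries pointing at the same path pass the check; two distinct paths still raise, with a readable message.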


Do you have any idea what I am missing here to figure out why the dataset level does not work?

adswa commented 5 years ago

I have a preliminary update on this: the failure occurs in https://github.com/poldracklab/fitlins/blob/c7cc798330a8fe8cea8ee545c64c7fb08d1a562b/fitlins/interfaces/nistats.py#L165-L180 during the third-level model. The problem is that `self.inputs.contrast_info`, for a reason I do not yet understand, ingests entries from the participants.tsv file and turns them into contrasts.

This is what `self.inputs.contrast_info` looks like for a third-level model:

```python
(Pdb) self.inputs.contrast_info
[{'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_av_rating': 1}], 'type': 't', 'name': 'forrest_av_rating'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_seen_languages': 1}], 'type': 't', 'name': 'forrest_seen_languages'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'hearing_problems_current': 1}], 'type': 't', 'name': 'hearing_problems_current'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_seen_dist': 1}], 'type': 't', 'name': 'forrest_seen_dist'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_av_feeling': 1}], 'type': 't', 'name': 'forrest_av_feeling'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'age': 1}], 'type': 't', 'name': 'age'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_seen_count': 1}], 'type': 't', 'name': 'forrest_seen_count'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_av_storydepth': 1}], 'type': 't', 'name': 'forrest_av_storydepth'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_ad_known': 1}], 'type': 't', 'name': 'forrest_ad_known'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'vision_problems_current': 1}], 'type': 't', 'name': 'vision_problems_current'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_av_fatigue': 1}], 'type': 't', 'name': 'forrest_av_fatigue'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'amplitude_.RIGHT': 1}], 'type': 't', 'name': 'amplitude_.RIGHT'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'handedness': 1}], 'type': 't', 'name': 'handedness'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'hearing_problems_past': 1}], 'type': 't', 'name': 'hearing_problems_past'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_seen': 1}], 'type': 't', 'name': 'forrest_seen'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'vision_problems_past': 1}], 'type': 't', 'name': 'vision_problems_past'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'gender': 1}], 'type': 't', 'name': 'gender'},
 {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'forrest_av_artist_count': 1}], 'type': 't', 'name': 'forrest_av_artist_count'}]
```


where everything apart from the entry {'entities': {'session': 'movie', 'task': 'avmovie'}, 'weights': [{'amplitude_.RIGHT': 1}], 'type': 't', 'name': 'amplitude_.RIGHT'} stems from the participants.tsv file.
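To see why each participants.tsv column turns into its own entry, here is a sketch that builds contrasts of the same shape from a list of variable names. `auto_contrasts` is a hypothetical helper, not the pybids code path; it assumes, as the output above suggests, that AutoContrasts emits one t contrast with weight 1 for every variable available at that level.

```python
def auto_contrasts(variable_names, entities):
    """Build one t contrast per available variable (hypothetical helper).

    Not pybids' implementation: it only reproduces the shape of the
    contrast_info entries shown above.
    """
    return [
        {
            "entities": dict(entities),
            "weights": [{name: 1}],
            "type": "t",
            "name": name,
        }
        for name in variable_names
    ]


# The model's regressor plus three of the participants.tsv columns listed above
variables = ["amplitude_.RIGHT", "age", "gender", "handedness"]
contrasts = auto_contrasts(variables, {"session": "movie", "task": "avmovie"})
print([c["name"] for c in contrasts])
```

Every name that reaches the dataset level, whether it came from the model or from participants.tsv, gets an identical-looking contrast, which is why the model author never has to "consciously specify" them.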

The following line, https://github.com/poldracklab/fitlins/blob/c7cc798330a8fe8cea8ee545c64c7fb08d1a562b/fitlins/interfaces/nistats.py#L165, then assigns weights of zero to the first contrast_info entry (as it should), but the lines after it, https://github.com/poldracklab/fitlins/blob/c7cc798330a8fe8cea8ee545c64c7fb08d1a562b/fitlins/interfaces/nistats.py#L169-L172, then pass an empty list as input to model.fit(), which results in the failure I observed.
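The failure mode can be reproduced without nistats. The sketch below is a hypothetical, pure-Python stand-in for the masking step (the file names are made up, and `select_inputs` is not a fitlins function); it assumes each subject-level stat file is labelled with the contrast that produced it. A contrast built from a participants.tsv column such as `age` matches none of those labels, so every weight is zero and the selected input list comes back empty.

```python
def select_inputs(contrast_weights, names, files):
    """Return the stat files and intercept weights a contrast applies to.

    Hypothetical pure-Python version of the `weights != 0` masking:
    `names[i]` is the lower-level contrast that produced `files[i]`.
    """
    weights = [contrast_weights.get(name, 0) for name in names]
    inputs = [f for f, w in zip(files, weights) if w != 0]
    intercept = [w for w in weights if w != 0]
    return inputs, intercept


# One subject-level stat map per subject, all for the single model regressor
names = ["amplitude_.RIGHT"] * 3
files = ["sub-01_stat.nii.gz", "sub-02_stat.nii.gz", "sub-03_stat.nii.gz"]

print(select_inputs({"amplitude_.RIGHT": 1}, names, files))  # all three files
print(select_inputs({"age": 1}, names, files))  # ([], []) -> fit() would fail
```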

I'm not sure whether the problem lies in the fact that contrast_info is given contrast information that I did not consciously specify anywhere in my model.json (and it is odd that this only happens at the third level, not the second), or in the fact that it then attempts to pass an empty list as input. As a temporary fix, I modified it like this:

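        for name, weights, type in prepare_contrasts(self.inputs.contrast_info, names):
            # Need to add F-test support for intercept (more than one column)
            # Currently only taking 0th column as intercept (t-test)
            weights = weights[0]
            # Temporary fix: skip contrasts whose weights are all zero (e.g. the
            # participants.tsv columns); otherwise `input` would be an empty list
            # and SecondLevelModel.fit() would raise the "at least two" ValueError
            if all(weights == np.zeros(len(names))):
                continue
            input = (np.array(filtered_files)[weights != 0]).tolist()
            design_matrix = pd.DataFrame({'intercept': weights[weights != 0]})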
adswa commented 5 years ago

(I could also be entirely wrong with all of this, but it runs without failure with this added conditional statement)

effigies commented 5 years ago

Sorry, I'm having a lot of trouble getting this to work. I've forgotten... how did we resolve this:

Traceback (most recent call last):
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/nipype/interfaces/base/core.py", line 369, in run
    runtime = self._run_interface(runtime)
  File "/src/fitlins/fitlins/interfaces/bids.py", line 180, in _run_interface
    analysis.setup(drop_na=False, **selectors)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/analysis.py", line 89, in setup
    b.setup(input_nodes, drop_na=drop_na, **selectors)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/analysis.py", line 211, in setup
    coll = apply_transformations(coll, self.transformations)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/analysis.py", line 508, in apply_transformations
    func(collection, cols, **kwargs)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/transformations/base.py", line 87, in __new__
    return t.transform()
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/transformations/base.py", line 261, in transform
    result = self._transform(data[i], **kwargs)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/analysis/transformations/munge.py", line 318, in _transform
    return var.to_dense(sampling_rate=sampling_rate)
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/variables/variables.py", line 325, in to_dense
    duration = int(math.ceil(sampling_rate * self.get_duration()))
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/variables/variables.py", line 311, in get_duration
    return sum([r.duration for r in self.run_info])
  File "/opt/conda/envs/neuro/lib/python3.6/site-packages/bids/variables/variables.py", line 311, in <listcomp>
    return sum([r.duration for r in self.run_info])
AttributeError: 'str' object has no attribute 'duration'
adswa commented 5 years ago

@yarikoptic, do you remember? I've seen this many times before but can't recall exactly, and my computer is a bit sluggish because the model is running in the background. Might it have been related to #353 (but that's merged already...)?

tyarkoni commented 5 years ago

Have not read everything carefully, but regarding the automatic inclusion of columns available in participants.tsv, scans.tsv, etc., that's by design; otherwise we would need to add a separate set of instructions to BIDS-StatsModels that governs where and how to get variables, and that's out of its scope. The idea is that if you explicitly want to exclude variables, you can use the Select transformation, passing in the names of only the variables you want to keep. Typically you would add that as the first transformation in the list at each new level, and thereafter you can be certain that you don't have unexpected variables popping up. (I believe there may also be a Delete or Remove transformation that does the inverse, if that's preferred.)

(FWIW, this shouldn't only be happening at the 3rd level; it happens at all levels. E.g., if you have confounds.tsv or physio.tsv.gz files, you should also be seeing extra variables show up in your run-level models, unless you're explicitly selecting what you want in a transformation. If you're not seeing available run-level variables show up, please open a separate issue, as that would be a bug.)
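Following that advice, a `Select` step can be prepended at each level above the run level. This is a sketch under assumptions: `prepend_select` is a hypothetical helper, the model dict is a trimmed copy of the model in this issue, and the `{"Name": "Select", "Input": [...]}` layout simply mirrors how the Split and Convolve transformations are written there; check the pybids transformation documentation for the exact argument name.

```python
import copy
import json


def prepend_select(model, keep):
    """Return a copy of a BIDS-StatsModels dict with a Select transformation
    as the first transformation of every step above the run level.

    Assumption: Select takes the variables to keep via "Input".
    """
    model = copy.deepcopy(model)  # leave the caller's model untouched
    for step in model["Steps"]:
        if step["Level"] == "run":
            continue  # at the run level, the Model's X already limits variables
        select = {"Name": "Select", "Input": list(keep)}
        step["Transformations"] = [select] + step.get("Transformations", [])
    return model


# Trimmed-down version of the model in this issue
model = {
    "name": "FEF_localizer",
    "Input": {"session": "movie"},
    "Steps": [
        {"Level": "run", "Model": {"X": ["amplitude_.RIGHT"]}, "AutoContrasts": True},
        {"Level": "subject", "AutoContrasts": True},
        {"Level": "dataset", "AutoContrasts": True},
    ],
}
print(json.dumps(prepend_select(model, ["amplitude_.RIGHT"]), indent=2))
```

Putting Select first means any later transformations and the auto-generated contrasts at that level only ever see the listed variables.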

adelavega commented 5 years ago

Doesn't X in Model also operate as a Select transformation?

effigies commented 5 years ago

It looks like the issue is with variables created by the Split transform. Digging deeper now.

yarikoptic commented 5 years ago

Like Adina, I wonder, @effigies: isn't this fixed in pybids by https://github.com/bids-standard/pybids/pull/353? If not, it may be something similar. What is the string value there?

tyarkoni commented 5 years ago

@adelavega yes, but that would only make sense for the first level; past that, there's no new model being fit, just contrasts applied to estimates propagated forward. So in the subject-level model, transformations would be needed to explicitly limit what autocontrasts gets applied to.

adswa commented 5 years ago

Thanks for this information! Yes, I noticed this automatic ingestion, and IIRC every column name of something ingestible (e.g. regressors.tsv files at the run level) showed up at some point in the analysis. But I only ran into a problem at the third level. I've been wondering whether this "only third-level" issue has something to do with the Step of the model. If I understand the contrast_info-related code correctly, it depends on the analysis level; the participants.tsv file sits in the root of the dataset, and its column names are the only ones showing up (and not, for example, those from the run-specific regressors.tsv files anymore). So, if I understand it correctly, this file would only be ingested during a dataset-level analysis step (which would explain why it did not cause trouble at the second level)?

tyarkoni commented 5 years ago

It definitely shouldn't be happening only at the third level. As @adelavega pointed out, the X field in the first-level model will implicitly drop any unnamed variables, so that's why you may not see it happening there. But it should also be happening at subsequent levels, assuming you have a scans.tsv or sessions.tsv file containing extra columns. If you don't have anything in those files, then the behavior you see is exactly as intended.

The mapping, per the BIDS spec, is that session-level analysis automatically pulls in scans.tsv, subject-level analysis pulls in sessions.tsv, and dataset-level analysis pulls in participants.tsv.
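A quick way to anticipate which extra variables a level will see is to read the header of the corresponding file. In this sketch, the level-to-file mapping comes from the comment above; the set of identifier columns assumed not to become variables (`participant_id`, `session_id`, `filename`) and the sample participants.tsv content are illustrative assumptions.

```python
import csv
import io

# Optional .tsv file whose columns are pulled in at each analysis level
LEVEL_TO_TSV = {
    "session": "scans.tsv",
    "subject": "sessions.tsv",
    "dataset": "participants.tsv",
}

# Assumption: index columns identify rows rather than becoming variables
ID_COLUMNS = {"participant_id", "session_id", "filename"}


def extra_variables(tsv_text):
    """Return the non-identifier column names from a .tsv header line."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [column for column in header if column not in ID_COLUMNS]


# Made-up participants.tsv with a few of the columns seen in this issue
participants = "participant_id\tage\tgender\thandedness\nsub-01\t30\tf\tr\n"
print(LEVEL_TO_TSV["dataset"], extra_variables(participants))
```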

effigies commented 5 years ago

> As Adina, isn't it @effigies this fix in pybids? bids-standard/pybids#353, if not then may be something similar, what is the string value there?

'events'

adswa commented 5 years ago

@yarikoptic, we ran into the string 'events'.

tyarkoni commented 5 years ago

I'm guessing that this line may be passing variables in the wrong order, so that source (which can be 'events') is getting read instead of run_info.
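The suspected mix-up can be reproduced in isolation. The class below is a hypothetical stand-in (its name and `(name, values, run_info, source)` signature are assumptions, not pybids' actual API): if the source string ends up in the `run_info` slot, summing durations iterates over the characters of 'events' and fails the same way as the traceback above. Passing these arguments by keyword avoids this class of bug.

```python
from collections import namedtuple

# Hypothetical stand-ins, not pybids' actual classes or signatures
RunInfo = namedtuple("RunInfo", ["entities", "duration"])


class RunVariable:
    def __init__(self, name, values, run_info, source):
        self.name = name
        self.values = values
        self.run_info = run_info  # expected: a list of RunInfo records
        self.source = source  # e.g. 'events'

    def get_duration(self):
        # Iterating over a str yields characters, so a misplaced 'events'
        # string fails with "'str' object has no attribute 'duration'"
        return sum(r.duration for r in self.run_info)


runs = [RunInfo({"run": 1}, 600.0), RunInfo({"run": 2}, 600.0)]

# Keyword arguments make the intended slots explicit
ok = RunVariable("amplitude_.RIGHT", [], run_info=runs, source="events")
print(ok.get_duration())  # 1200.0

# Swapped positional arguments: 'events' lands in the run_info slot
bad = RunVariable("amplitude_.RIGHT", [], "events", runs)
try:
    bad.get_duration()
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'duration'
```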

effigies commented 5 years ago

Yeah. That's where I'm looking. It looks like #353 was a partial fix. Though I'm not sure why they can run these models and I can't...

adswa commented 5 years ago

> It definitely shouldn't be happening only at the third level. As @adelavega pointed out, the X field in the first-level model will implicitly drop any unnamed variables, so that's why you may not see it happening there. But it should also be happening at subsequent levels, assuming you have a scans.tsv or sessions.tsv file containing extra columns. If you don't have anything in those files, then the behavior you see is exactly as intended.
>
> The mapping, per the BIDS spec, is that session-level analysis automatically pulls in scans.tsv, subject-level analysis pulls in sessions.tsv, and dataset-level analysis pulls in participants.tsv.

Huh. I actually don't have these files and had no clue I needed them (sorry, I must have overlooked this when studying the BIDS spec, but I happily blame Michael for not having them in the sourcedata in the first place).

adswa commented 5 years ago

@effigies, would it be helpful if I pushed my local branch, where I committed, merged, and cherry-picked all the changes we found relevant, to my fork here on GitHub?

tyarkoni commented 5 years ago

You don't need them; they're optional. It sounds like everything is working as intended (well, at least with respect to this part of things).

(Sent by email in reply to Adina Wagner's comment of Wed, Jan 30, 2019, 16:55.)


effigies commented 5 years ago

@AdinaWagner Sure, you can go ahead and cherry-pick. I've fixed a couple things, so we're likely to have some clashes, but it would be good to see what you have.

adswa commented 5 years ago

Okay, my current software state is the `runinfo-merge` branch of AdinaWagner/pybids and the `BIDSSacc` branch of AdinaWagner/fitlins.

mgxd commented 5 years ago

I'm also running into the same issue with an image built off current master (ca403435dc74256cf41a91fed5e077b4e1315dac)

Crashfile

File: /om/project/voice/bids/scripts/fitlins/crash-20190315-170528-mathiasg-l3_model-2f07306c-9009-48f6-8009-8da1f86d937d.pklz
Node: fitlins_wf.l3_model
Working directory: /tmp/tmp4nwpaui8/fitlins_wf/l3_model

Node inputs:

contrast_info = [[{'entities': {'session': '1', 'subject': 'voice969', 'task': 'emosent'}, 'name': 'speech', 'type': 't', 'weights': [{'speech': 1}]}]]
stat_files =
stat_metadata =

Traceback:
Traceback (most recent call last):
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 1253, in _run_interface
    self.config['execution']['stop_on_first_crash'])))
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 1175, in _collate_results
    (self.name, '\n'.join(msg)))
Exception: Subnodes of node: l3_model failed:
Subnode 0 failed
Error: Traceback (most recent call last):
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/utils.py", line 99, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 473, in run
    result = self._run_interface(execute=True)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 557, in _run_interface
    return self._run_command(execute)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 637, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nipype/interfaces/base/core.py", line 375, in run
    runtime = self._run_interface(runtime)
  File "/src/fitlins/fitlins/interfaces/nistats.py", line 178, in _run_interface
    model.fit(input, design_matrix=design_matrix)
  File "/opt/miniconda-latest/envs/neuro/lib/python3.6/site-packages/nistats/second_level_model.py", line 170, in fit
    raise ValueError('A second level model requires a list with at'
ValueError: A second level model requires a list with atleast two first level models or niimgs
effigies commented 5 years ago

@mgxd Can you provide your model? If this is somewhere I can login, it might be easiest to dig through your working directory.

effigies commented 5 years ago

@mathias It looks like you have degenerate inputs: you have two files that you collapse into one file at level 2, so at level 3 you only have one file, while a second-level model requires at least two.

effigies commented 5 years ago

fitlins_wf/l3_model/mapflow/_l3_model0/_report/report.rst:

Node: nistats
=============

 Hierarchy : _l3_model0
 Exec ID : _l3_model0

Original Inputs
---------------

* contrast_info : [{'entities': {'session': '1', 'subject': 'voice969', 'task': 'emosent'}, 'name': 'speech', 'type': 't', 'weights': [{'speech': 1}]}]
* stat_files : [['/working/fitlins_wf/l2_model/mapflow/_l2_model0/speech.nii.gz']]
* stat_metadata : [[{'contrast': 'speech', 'session': '1', 'subject': 'voice969', 'suffix': 'stat', 'task': 'emosent'}]]
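A pre-flight check along these lines would flag the degenerate case before nistats does. It is a hypothetical sketch (not part of fitlins) that groups the level-3 inputs by contrast, using the nested `stat_files`/`stat_metadata` layout and values from the report above, and warns when a contrast has fewer than the two inputs a second-level fit needs.

```python
from collections import defaultdict


def inputs_per_contrast(stat_files, stat_metadata):
    """Group stat files by contrast name (hypothetical pre-flight check).

    Both arguments hold one inner list per mapped node, as in the report above.
    """
    groups = defaultdict(list)
    for files, metadata in zip(stat_files, stat_metadata):
        for path, meta in zip(files, metadata):
            groups[meta["contrast"]].append(path)
    return dict(groups)


# Values copied from the l3_model report above
stat_files = [["/working/fitlins_wf/l2_model/mapflow/_l2_model0/speech.nii.gz"]]
stat_metadata = [[{"contrast": "speech", "session": "1", "subject": "voice969",
                   "suffix": "stat", "task": "emosent"}]]

for contrast, paths in inputs_per_contrast(stat_files, stat_metadata).items():
    if len(paths) < 2:
        print("%s: only %d input(s); a second-level fit needs at least two"
              % (contrast, len(paths)))
```

For this report it prints a warning for `speech`, which has a single input, matching the crash above.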