poldracklab / fitlins

Fit Linear Models to BIDS Datasets
https://fitlins.readthedocs.io
Apache License 2.0
73 stars 30 forks source link

handling multiple runs via weighted effects modeling #151

Open satra opened 5 years ago

satra commented 5 years ago

currently fitlins doesn't handle multiple runs from a single participant for a given task/set of contrasts in the traditional neuroimaging way (i.e. using fixed effects).

nistats supports this by passing multiple runs and design matrices to the first level model fit and then using a fixed effects model in estimating each contrast across runs.

example: https://nistats.github.io/auto_examples/02_first_level_models/plot_spm_multimodal_faces.html#sphx-glr-auto-examples-02-first-level-models-plot-spm-multimodal-faces-py

code piece: https://github.com/nistats/nistats/blob/master/nistats/first_level_model.py#L568

there are cases where projects would want to look at run level contrasts and cases where this would be combined. this seems like this needs to be handled at the level of bids model as well as fitlins.

one way to extend the interface is to output/store labels and results in addition to the contrasts for each run. then we could call the hidden function (not a clean approach).

cc: @tyarkoni @effigies

satra commented 5 years ago

@tyarkoni - any thoughts on this re: the model spec?

i think we can automatically assume that run level does both individual runs and groups runs together (via fixedfx). but if we wanted something to only pull out a contrast from a specific run for group stats, we would need to add run index info somewhere.

i'd like to finish the cifti processing and this together. should i just make the assumption and hack fitlins to support it?

effigies commented 5 years ago

If you wanted to do a fixed effects within subject, wouldn't you just make subject the first level?

Not that FitLins can handle concatenating runs as yet, but as far as PyBIDS is concerned, I think that should produce the correct design matrix.

satra commented 5 years ago

@effigies - yes, i just don't remember the distinction between run and subject in the model specification. do we have run->subject-> session-> dataset ?

effigies commented 5 years ago

So these levels are the units that you're parallelizing across. By having a run level, you're saying estimate each run separately. If you're setting your first level as the subject, then you are aggregating all runs within a subject for the first level.

Right now, you have to go from lower-level to higher, so I don't know that you can do subject -> session. Though it seems obvious you should be able to.

satra commented 5 years ago

i have to check where in fitlins you have the separation between run and subject level. if you have direct pointers, that would help.

as a note, second level in nistats is not subject level, it's group/dataset level.

tyarkoni commented 5 years ago

The intent is definitely that BIDS-StatsModels should be able to handle this, and as @effigies says, in principle this should work if you just make subject the first level in your BIDS-SM model spec. That said, I don't think I've ever tested this, so it's quite likely that PyBIDS won't give you back exactly what you need. I suspect that it will look right, in that you'll get back one design matrix per subject, but that the runs won't be concatenated and/or aligned properly. There are definitely known issues with concatenation (it's non-trivial to do correctly in a generic way; see bids-standard/pybids#367 for discussion); I'm just not sure off the top of my head whether they would affect this particular use case.

That said, in practice there should rarely be much of a difference between fitting run-level models and then averaging the estimates at the subject level, and fitting one model over concatenated runs at the subject level. So while I recognize that ultimately users should be able to do it either way, a reasonable workaround right now is to fit run-level models and have them automatically averaged at the session and/or subject levels, which I believe fitlins currently handles.

effigies commented 5 years ago

i have to check where in fitlins you have the separation between run and subject level. if you have direct pointers, that would help.

The levels are defined in the model. See this loop for the setup:

https://github.com/poldracklab/fitlins/blob/fc2242a3560527a3646d684cf5383225a04d77ca/fitlins/workflows/base.py#L196

as a note, second level in nistats is not subject level, it's group/dataset level.

In that it's not fixed effects? Or are there other assumptions built into SecondLevelModel?

tyarkoni commented 5 years ago

I don't think there are any other subject-specific assumptions built into SecondLevelModel; I suspect it's just described as multiple subjects because in the conventional setting, most people only model subject as a random factor. The principled distinction as I see it is that FirstLevelModel does fixed effects and SecondLevelModel does random effects. Maybe @bthirion can clarify.

satra commented 5 years ago

@tyarkoni - correct. the extension that needs to happen is that the run level models are aggregated into a single estimate.

here is @bthirion 's code for a separate project.

https://github.com/hbp-brain-charting/public_analysis_code/blob/master/ibc_public/utils_pipeline.py#L601

in nistats this is done automatically, if the first level model is given multiple runs as input (calls the fixed effects code internally).

for us to do this in a second node, we would follow the code i just linked to taking as input the run level estimates and their variances.

satra commented 5 years ago

@effigies - i will add a fixed effects stage into that code

tyarkoni commented 5 years ago

As a practical matter, we could hack fitlins to loop over subjects, get each subject's list of runs, and then pass those together to nistats. But I really don't like that approach, because it basically breaks the spec—i.e., it internally ignores the BIDS-SM instructions on the presumption that it knows what it needs to do in order to produce the desired model. That may be true in this particular case, but it almost certainly won't be true in other cases.

I think the better solution is to ensure this happens centrally in pybids and/or fitlins rather than nistats. We want to make sure that one can fit a concatenated model in any estimator, not just in nistats. If we have to introduce new fields to the BIDS-SM spec, I think that's preferable than making this an undocumented, interface-level implementation thing.

satra commented 5 years ago

@tyarkoni - i think in the model we just introduce a subject analysis level for now (see below) - for the moment i will simply do a subject analysis level. this will trigger the fixedfx analysis. in terms of software, we do this routinely with all software in nipype: fsl, spm, and nipy all support such a three stage model. in spm, it's done in one go with one design matrix. so for implementations, it would simply be a matter of mapping stages to software steps.

for example, a future version of fitlins + pybids (with concatenation) could implement this in one go. whether one would do that or not depends on optimization issues.

for now (and for the future) it makes sense to model each run separately and then combine the results. for example the bold5000 dataset has a particularly large set of runs. instead of creating one giant matrix, you loop/parallelize over runs and then combine the results.

{
  "Name": "ds003_model001",
  "Description": "",
  "Input": {
    "task": "rhymejudgment"
  },
  "Steps": [
    {
      "Level": "run",
      "Transformations": [
        {
          "Name": "Factor",
          "Input": [
            "trial_type"
          ]
        },
        {
          "Name": "Convolve",
          "Input": ["trial_type.word", "trial_type.pseudoword"],
          "Model": "spm"
        }
      ],
      "Model": {
        "X": [
          "trial_type.word",
          "trial_type.pseudoword",
          "framewise_displacement",
          "trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z",
          "a_comp_cor_00", "a_comp_cor_01", "a_comp_cor_02",
          "a_comp_cor_03", "a_comp_cor_04", "a_comp_cor_05"
        ]
      },
      "Contrasts": [
        {
          "Name": "word_gt_pseudo",
          "ConditionList": [
            "trial_type.word",
            "trial_type.pseudoword"
          ],
          "Weights": [1, -1],
          "Type": "t"
        }
      ]
    },
    {
      "Level": "subject",
      "AutoContrasts": ["word_gt_pseudo"]
    },
    {
      "Level": "dataset",
      "AutoContrasts": ["word_gt_pseudo"]
    }
  ]
}
tyarkoni commented 5 years ago

So, just to make sure we're on the same page: you can fit the above model right now, correct? It looks to me like it should work fine as written. And if I understand your concern correctly, it's that in principle, we should also be able to handle pre-estimation concatenation of runs, which the BIDS-SM spec currently implicitly allows, but PyBIDS/fitlins doesn't handle.

If that's all correct, then I think we're on the same page... it's just a matter of finding the time, and TBH I don't see this is a particularly high priority so long as aggregation of run-level estimates works. But it should definitely stay on the radar.

satra commented 5 years ago

@tyarkoni - not quite on the same page.

say you have S1->(R1, R2), S2-> (R1, R2), S3->(R1, R2), the fixedfx code pointed at above. will give you S1->C1, S2->C2, S3->C3, and the second level model will combine these C1/2/3 at the dataset level and give you a single dataset contrast.

the fixed fx code: https://github.com/hbp-brain-charting/public_analysis_code/blob/master/ibc_public/utils_pipeline.py#L609 generates a new contrast C1, that incorporates the run level variance.

right now (as implemented), the run level contrasts across participants CS1R1, CS1R2, CS2R1, ... CS3R2 are all being sent in to the second level model and fit using a design matrix. this process does not in any way incorporate the run level variance. thus current fitlins is not doing the traditional imaging model that FSL, SPM, and nistats and others do.

bthirion commented 5 years ago

I don't think there are any other subject-specific assumptions built into SecondLevelModel; I suspect it's just described as multiple subjects because in the conventional setting, most people only model subject as a random factor. The principled distinction as I see it is that FirstLevelModel does fixed effects and SecondLevelModel does random effects. Maybe @bthirion can clarify.

You're basically right:

bthirion commented 5 years ago

Note that concatenating all runs and using a huge design matrix à la SPM is very inefficient computationally, and assumes that there is only one error term across runs, which does not sound right.

satra commented 5 years ago
  • FirstLevelModel, when given lists of data, performs a fixed effect across the list items.

this is the bit that current fitlins does not do anywhere. it does not have to be done at the run level estimation, and my PR would essentially add this as the subject analysis level. i.e. given lists of contrasts and their variances from runs, construct a single contrast and variance and stat.

other things that would need to be taken care of are: if you specify an F contrast at the run level, that cannot be carried forward to higher levels without allowing for other types of stats (g-tests, non-parametric tests, etc.). so contrast type would need to be propagated.

tyarkoni commented 5 years ago

this is the bit that current fitlins does not do anywhere. it does not have to be done at the run level estimation, and my PR would essentially add this as the subject analysis level. i.e. given lists of contrasts and their variances from runs, construct a single contrast and variance and stat.

I agree that fitlins doesn't currently do this, but I wouldn't describe what you're talking about as a fixed-effects analysis; it's just one particular weighting scheme. With respect to the eventual group-level estimates, the runs are still being treated as fixed effects either way—we don't model them as samples from some underlying stochastic process. IMO this would be best handled as a transformation defined in the spec and implemented in PyBIDS, and not as a modeling procedure in fitlins. Basically all we're talking about here is taking two sets of inputs from the previous levels (means and variances) and weighting the former by the latter.

This would require introducing transformations that apply to image inputs, but I think we've known for a while that this was inevitable—I was mostly putting it off because it's a pain in the ass. But from both a spec and tooling standpoint, it makes much more sense to me to decouple this from the modeling process and treat it purely as a data transformation.

satra commented 5 years ago

Basically all we're talking about here is taking two sets of inputs from the previous levels (means and variances) and weighting the former by the latter.

yes (it's a center of mass/gravity operation), and creating a new effect, effect_variance, statistic, and dof (which is currently being avoided - also dof can be an image)

decouple this from the modeling process and treat it purely as a data transformation.

i'm fine with that, but for the present the easiest for me to implement is what i was proposing. since the standard is not set, we could revise this during the hackathon.

philosophically (with my nipype hat on), every step is a data transformation :) if you are going to implement every imaging transformation in pybids, that would make it a monster. so where do you want to draw the line (i.e. what would be your current governing principles)?

tyarkoni commented 5 years ago

philosophically (with my nipype hat on), every step is a data transformation :) if you are going to implement every imaging transformation in pybids, that would make it a monster. so where do you want to draw the line (i.e. what would be your current governing principles)?

Ideally I'd like to rely on an existing standard for this kind of thing, but the only thing we've found so far that seems to fit the bill is PFA, which is awesome and has a Python implementation, but is no longer maintained/developed. Even so, I'm still leaning towards just wrapping the Transformations sections of a BIDS-SM document around PFA...

satra commented 4 years ago

@tyarkoni, @effigies - changed the title - as people are coming to this and thinking fitlins doesn't handle multiple runs for a given task.