poldracklab / fitlins

Fit Linear Models to BIDS Datasets
https://fitlins.readthedocs.io
Apache License 2.0

ValueError: group_by contains variable(s) ['subject'] that could not be found in the entity index. #338

Closed adelavega closed 2 years ago

adelavega commented 2 years ago

This could be a pybids issue, but I'm posting here since it's relevant.

I implemented the v0.x -> 1.0 StatsModel conversion script: https://github.com/neuroscout/neuroscout-cli/pull/147

This is one of the resulting models (loaded into Python):

{'Name': 'Test - v1 - 2',
 'Input': {'Run': [1, 2],
  'Task': 'movie',
  'Subject': ['sid000005', 'sid000007']},
 'Description': '',
 'BIDSModelVersion': 1.0,
 'Nodes': [{'Level': 'Run',
   'Model': {'X': ['as-Speech', 1]},
   'Contrasts': [{'Name': 'as-Speech',
     'Type': 't',
     'Weights': [1],
     'ConditionList': ['as-Speech'],
     'Test': 't'}],
   'Name': 'Run',
   'Transformations': {'Transformer': 'pybids-transforms-v1',
    'Instructions': [{'Name': 'Convolve', 'Input': ['as-Speech']}]}},
  {'Level': 'Subject',
   'DummyContrasts': {'Test': 't'},
   'Name': 'Subject',
   'Model': {'X': [1], 'Type': 'Meta'}},
  {'Level': 'Dataset',
   'DummyContrasts': {'Test': 't'},
   'Name': 'Dataset',
   'Model': {'X': [1]}}]}

Running fitlins, I'm getting this error:

File /opt/miniconda-latest/envs/neuro/lib/python3.9/site-packages/bids/modeling/statsmodels.py:445, in BIDSStatsModelsNode.run(self, inputs, group_by, force_dense, sampling_rate, invalid_contrasts, **filters)
    443 # group all collections and inputs
    444 all_objects = inputs + collections
--> 445 groups = self._build_groups(all_objects, group_by)
    447 results = []
    449 for grp_ents, grp_objs in list(groups.items()):
    450
    451     # split group's objects into inputs and collections

File /opt/miniconda-latest/envs/neuro/lib/python3.9/site-packages/bids/modeling/statsmodels.py:348, in BIDSStatsModelsNode._build_groups(objects, group_by)
    346 missing_vars = list(set(group_by) - set(df.columns))
    347 if missing_vars:
--> 348     raise ValueError("group_by contains variable(s) {} that could not "
    349                      "be found in the entity index.".format(missing_vars) )
    351 # Restrict DF to only grouping columns
    352 df = df.loc[:, group_by]

ValueError: group_by contains variable(s) ['subject'] that could not be found in the entity index.

Debugging interactively, I found that the error is raised at the Subject level. By default, that node seems to group by ['subject', 'contrast'].

However, the df in this instance looks like:

   run   contrast
0    1  as-Speech
1    2  as-Speech

This looks correct, except that it also needs a subject column for the groupby to work.

Any ideas off the top of your head?

adelavega commented 2 years ago

Is it possible that the inputs lack the correct entities?

The inputs (outputs from l1, the Run level) are:

[ContrastInfo(name='as-Speech', conditions=['as-Speech'], weights=[1], test='t', entities={'run': 1, 'contrast': 'as-Speech'}),
 ContrastInfo(name='as-Speech', conditions=['as-Speech'], weights=[1], test='t', entities={'run': 2, 'contrast': 'as-Speech'})]

It seems like this should be of length 4, as there are 2 subjects and 2 runs.
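
The failing check can be reproduced without pybids. Below is a minimal sketch, assuming the entity index is simply the union of the keys in each input's entities dict. The helper name `find_missing_group_vars` is hypothetical and is not part of pybids; it only mirrors the set difference shown in the traceback.

```python
def find_missing_group_vars(entities_list, group_by):
    """Return the group_by variables that appear in none of the entity dicts.

    Hypothetical helper for illustration; not a pybids function.
    """
    columns = set()
    for ents in entities_list:
        columns.update(ents)  # collect every entity name seen in the inputs
    return sorted(set(group_by) - columns)


# Entities of the two Run-level outputs listed above
inputs = [
    {"run": 1, "contrast": "as-Speech"},
    {"run": 2, "contrast": "as-Speech"},
]

# Grouping that the Subject level appears to use by default
missing = find_missing_group_vars(inputs, ["subject", "contrast"])
print(missing)  # ['subject']
```

Since neither input carries a subject entity, any grouping that includes 'subject' fails this check.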

adelavega commented 2 years ago

The issue was that GroupBy was missing from each level of the model.
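
For anyone hitting the same error, here is a rough sketch of adding GroupBy to each node of a model like the one above. The helper `add_group_by` is hypothetical, and the grouping variables in `ASSUMED_GROUP_BY` are assumptions for illustration. Only the Subject-level ['subject', 'contrast'] grouping is mentioned in this thread; the Run and Dataset values are guesses that should be adapted to the dataset.

```python
import copy

# Assumed grouping variables per level; only the Subject entry is
# mentioned in this thread, the others are illustrative guesses.
ASSUMED_GROUP_BY = {
    "Run": ["run", "subject"],
    "Subject": ["subject", "contrast"],
    "Dataset": ["contrast"],
}


def add_group_by(model, group_by=ASSUMED_GROUP_BY):
    """Return a copy of the model in which every node has a GroupBy key.

    Hypothetical helper; nodes that already define GroupBy are left alone.
    """
    fixed = copy.deepcopy(model)  # do not modify the caller's model
    for node in fixed["Nodes"]:
        if "GroupBy" not in node:
            node["GroupBy"] = list(group_by[node["Level"]])
    return fixed


# Trimmed version of the model from the first comment (Nodes only)
model = {
    "Name": "Test - v1 - 2",
    "Nodes": [
        {"Level": "Run", "Name": "Run", "Model": {"X": ["as-Speech", 1]}},
        {"Level": "Subject", "Name": "Subject", "Model": {"X": [1], "Type": "Meta"}},
        {"Level": "Dataset", "Name": "Dataset", "Model": {"X": [1]}},
    ],
}

fixed = add_group_by(model)
for node in fixed["Nodes"]:
    print(node["Name"], node["GroupBy"])
```

The function returns a copy so the original converted model stays available for comparison.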