nipy / mindboggle

Automated anatomical brain label/shape analysis software (+ website)
http://mindboggle.info

brief blueprint of concatenating data from mindboggle output #142

satra closed this issue 6 years ago

satra commented 6 years ago

@binarybottle - this is what i kind of meant - before i create a PR, i thought we could discuss if this makes sense.

from glob import glob
import os

import pandas as pd

def short_fname(fn):
    return ''.join([v[0] for v in fn.split('/tables/')[-1].replace('/','_').split('_')])

def fname2df(fname):
    """Read a csv into a single dataframe row
    """
    df = pd.read_csv(fname, na_values=[0.0]).dropna(axis=0)
    sn = short_fname(fname)
    # Build labels in row-major order (region name first, then measure) so that they
    # line up with the row-major flatten() of the values below.
    measures = [y.lstrip() for y in df.keys()[2:]]
    outerproduct = [sn+'-'+x+'-'+y for x in df.name for y in measures]
    df_row = pd.DataFrame(data=df.iloc[:, 2:].values.flatten()[None, :], columns=outerproduct, index=[0])
    return df_row

def process_participant(subject_ids, base_dir):
    """Generate a pandas dataframe across all subjects
    """
    out = None
    for subject_id in subject_ids:  # avoid shadowing the builtin `id`
        table_dir = os.path.join(base_dir, subject_id, 'tables')
        fl = glob(os.path.join(table_dir, '*.csv')) + glob(os.path.join(table_dir, '*', '*.csv'))
        dft = pd.concat([fname2df(val) for val in sorted(fl) if 'vertices' not in val], axis=1)
        dft.index = [subject_id]
        out = dft if out is None else pd.concat((out, dft), axis=0)
    return out
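
A quick way to sanity-check the flattening step is to run it on a tiny in-memory table and confirm that each column label ends up next to the value it describes. The sketch below is only an illustration: `table_to_row`, the toy table, and the `toy` prefix are made up here, and it assumes (as the slicing with `[2:]` above suggests) that the first two columns hold the region name and an ID.

```python
import pandas as pd


def table_to_row(df, prefix):
    """Flatten a (region x measure) table into one row with aligned labels.

    Hypothetical helper for illustration; assumes the first two columns are
    the region name and an ID, and the remaining columns are measures.
    """
    measures = [c.strip() for c in df.columns[2:]]
    # Row-major labels: all measures for the first region, then the next region.
    columns = [f"{prefix}-{name}-{m}" for name in df["name"] for m in measures]
    # flatten() is row-major too, so labels and values share the same order.
    values = df.iloc[:, 2:].to_numpy().flatten()[None, :]
    return pd.DataFrame(values, columns=columns, index=[0])


# Toy table (made-up numbers), with leading spaces in the measure headers.
toy = pd.DataFrame({
    "name": ["regionA", "regionB"],
    "ID": [1, 2],
    " area": [10.0, 20.0],
    " volume": [1.5, 2.5],
})
row = table_to_row(toy, "toy")
print(row.T)
```

If the labels were built measure-first while the values are flattened region-first, the printed pairs would be visibly swapped for any table with more than one row and more than one measure column.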

using the code above one can do something like:

In [189]: dft = process_participant(['sub-1', 'sub-2'], '/path/to/mindboggled/')

In [190]: dft.index
Out[190]: Index(['sub-1', 'sub-2'], dtype='object')

In [191]: dft.shape
Out[191]: (2, 11190)

and the keys look like:

In [201]: dft.keys()[:10]
Out[201]: 
Index(['lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 10',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 2',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 3',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 4',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 5',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 6',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 7',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 8',
       'lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 9',
       'lcsls-ctx-lh-caudalanteriorcingulate-Zernike moments: component 1'],
      dtype='object')
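
To see where the columns come from, one option is to count them per source table. Because both the region names (`ctx-lh-...`) and the measure names (`Laplace-Beltrami ...`) contain hyphens themselves, only the first `-` in a key is a reliable separator, so the sketch below splits once and keeps the table prefix. `count_columns_per_table` and the third example key are hypothetical, added only for illustration.

```python
import pandas as pd


def count_columns_per_table(columns):
    """Count columns per table prefix (the text before the first '-')."""
    # Split only once: region and measure names may contain '-' themselves.
    prefixes = pd.Series(list(columns)).str.split("-", n=1).str[0]
    return prefixes.value_counts().sort_index()


example_columns = [
    "lcsls-ctx-lh-caudalanteriorcingulate-Laplace-Beltrami spectrum: component 2",
    "lcsls-ctx-lh-caudalanteriorcingulate-Zernike moments: component 1",
    "xx-some-region-some measure",  # hypothetical key from another table
]
counts = count_columns_per_table(example_columns)
print(counts)
```

On the real output this would be called as `count_columns_per_table(dft.columns)`. If the keys ever need to be split back into table, region, and measure, a separator that cannot occur in those names would be safer than `-`.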

binarybottle commented 6 years ago

I think this looks good and makes good sense, @satra! I would make "process_participant" plural and revise the docstring. Good idea not to include vertices.csv! Are there really 11,190 shape measures? That's a bunch!

binarybottle commented 6 years ago

Any reason why no pull request?

satra commented 6 years ago

@binarybottle - over the weekend.