Add option to use Allen ontology for collapsing across samples

The issue

In addition to the MNI coordinates of each sample, the Allen Institute provides a detailed ontology of the brain structure each sample was ostensibly taken from in each donor brain. This ontology has been used in several published works (e.g., Hawrylycz et al., 2015) and provides a built-in "atlas" for reducing microarray samples to the sort of region x gene dataframes returned by abagen.get_expression_data().

Quoting from the aforementioned paper:

For each brain, 345–911 samples spanning one (n = 4) or both (n = 2) hemispheres were analyzed using whole-genome Agilent microarrays. In total, samples from 232 discrete brain structures were sampled at least once in at least one brain. We first focused on comparing expression patterns for a smaller set of 96 brain regions that were sampled at least twice in at least five brains, pooling across hemispheres.

Proposed solution

Add an alternative to the current abagen.get_expression_data() procedure, which requires a Nifti-like atlas, so that users can instead combine samples using the built-in Allen Institute ontology. By default I think all 232 structures should be used; users could then request (via the return_counts parameter) the number of samples pooled for each structure and reduce down to the 96 as desired.

Perhaps this could be the default behavior of abagen.get_expression_data()? That is, make atlas an optional parameter and, if none is supplied, fall back to the Allen ontology?
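
To make the proposal concrete, here is a rough sketch of what ontology-based collapsing might look like. It is not abagen code: the function name collapse_by_structure, the min_samples argument, and the toy gene names and structure IDs are all hypothetical, and it assumes each sample already carries a structure identifier from the Allen annotation (called structure_id here). Only the return_counts idea comes from the proposal above.

```python
import pandas as pd


def collapse_by_structure(expression, structure_ids,
                          return_counts=False, min_samples=1):
    """Average a samples x genes dataframe within each ontology structure.

    Hypothetical helper, not part of abagen. `structure_ids` is assumed to
    be a Series of structure identifiers sharing `expression`'s index.
    """
    grouped = expression.groupby(structure_ids)
    # Number of samples that fall into each structure
    counts = grouped.size().rename("n_samples")
    # Keep only structures with enough samples
    keep = counts[counts >= min_samples].index
    collapsed = grouped.mean().loc[keep]
    collapsed.index.name = "structure_id"
    if return_counts:
        counts = counts.loc[keep]
        counts.index.name = "structure_id"
        return collapsed, counts
    return collapsed


# Toy data: three samples, two genes, two (made-up) structure IDs
expression = pd.DataFrame(
    {"GENE_A": [1.0, 3.0, 5.0], "GENE_B": [2.0, 4.0, 6.0]},
    index=["sample1", "sample2", "sample3"],
)
structure_ids = pd.Series(
    [4012, 4012, 4023], index=expression.index, name="structure_id"
)

collapsed, counts = collapse_by_structure(
    expression, structure_ids, return_counts=True
)
# Drop structures represented by fewer than two samples
well_sampled = collapse_by_structure(expression, structure_ids, min_samples=2)
```

Returning the counts alongside the region x gene dataframe would let users apply their own threshold afterwards (for example, to approximate the 96-region subset described in the paper) rather than hard-coding one. Averaging is just one possible way to pool samples and is used here only for illustration.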

rmarkello / abagen

Add option to use Allen ontology for collapsing across samples #69
