rmarkello / abagen

A toolbox for working with Allen Human Brain Atlas microarray expression data
https://abagen.readthedocs.io
BSD 3-Clause "New" or "Revised" License

The hippocampal formation: cortex or subcortex? #89

Closed rmarkello closed 5 years ago

rmarkello commented 5 years ago

The issue

tl;dr The Allen Institute ontology classifies hippocampus as part of cortex, not subcortex, which could cause problems for matching some microarray samples to ROIs.


When users provide a file or dataframe to the atlas_info parameter in abagen.get_expression_data(), they are required to specify a broad structural class for each region in their atlas (in a column labelled 'structure' in the file/dataframe). The current options for this structural class are:

  1. cortex,
  2. subcortex,
  3. cerebellum,
  4. brainstem,
  5. white matter, and
  6. other (i.e., ventricles and such)
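
As a rough illustration of this requirement, the sketch below checks that every row of an atlas_info-like table carries one of the six labels. The helper name check_structures and the row layout (plain dicts with 'id' and 'structure' keys) are assumptions made for this example and are not part of abagen.

```python
# Hypothetical helper for illustration only; not part of abagen's API.
VALID_STRUCTURES = {
    'cortex', 'subcortex', 'cerebellum',
    'brainstem', 'white matter', 'other',
}


def check_structures(rows):
    """Return the IDs of rows whose 'structure' is not a permitted class."""
    return [row['id'] for row in rows
            if row['structure'] not in VALID_STRUCTURES]


atlas_info = [
    {'id': 1, 'structure': 'cortex'},
    {'id': 2, 'structure': 'subcortex'},
    {'id': 3, 'structure': 'allocortex'},  # not one of the six classes
]
print(check_structures(atlas_info))
```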

We match these designations against the information from the Allen ontology so that samples that don't fall directly within a region of the atlas aren't incorrectly assigned to regions across hemispheric or structural boundaries.

That is, if a sample from the Allen Institute is labelled as having come from the left-hemisphere subcortex, we make sure to assign it only to a region in the user-specified atlas that is also labelled as left-hemisphere subcortex. This affects only a minority of samples, albeit a significant one: we don't currently perform this check for samples whose coordinates fall directly within a region of the atlas.
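
The constraint can be sketched as a filter applied before a sample is assigned to any region. Everything here (the candidate_regions helper, the dict layout, the 'L'/'R' hemisphere codes) is hypothetical and only meant to show the idea, not abagen's actual implementation.

```python
# Hypothetical sketch; names and data layout are assumptions.
def candidate_regions(sample, regions):
    """Keep only the regions that share the sample's hemisphere and structure."""
    return [
        region['id'] for region in regions
        if region['hemisphere'] == sample['hemisphere']
        and region['structure'] == sample['structure']
    ]


regions = [
    {'id': 1, 'hemisphere': 'L', 'structure': 'cortex'},
    {'id': 2, 'hemisphere': 'L', 'structure': 'subcortex'},
    {'id': 3, 'hemisphere': 'R', 'structure': 'subcortex'},
]
sample = {'hemisphere': 'L', 'structure': 'subcortex'}
print(candidate_regions(sample, regions))
```

A sample labelled as left-hemisphere subcortex can then only ever land in region 2, however close it sits to regions 1 or 3.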

While matching these designations seems like a reasonable approach in most cases, the one point of contention that a general user might have is that the Allen Institute ontology classifies the hippocampal formation (including the subiculum, dentate gyrus, and CA1-4) as part of "cortex" rather than "subcortex". Specifically, their ontology specifies:

brain 
└─ gray matter
   └─ telencephalon
      └─ cerebral cortex
         └─ limbic lobe
            └─ hippocampal formation

Thus, if a researcher provides an atlas where they label all their hippocampal ROIs as "subcortex" they're liable to get vastly different results than if they label all their hippocampal ROIs as "cortex."

While I have it on good authority that the hippocampus is often considered part of "allocortex," I'm hesitant to add this as a permissible structural class to abagen since it seems quite a bit more specific than the current (rather broad) structural designations listed above (1-6).

Proposed solution

I genuinely don't know! It would be great to allow either specification for the hippocampus (i.e., "cortex" or "subcortex"), but the current framework for getting these structural classes from the Allen ontology doesn't allow for this hedging. I can think about how to modify it for this one instance in particular, but in the interim it would be great to come up with alternatives.

One option that might be worthwhile is to simply allow users to specify either (or both) of the expected 'hemisphere' and 'structure' columns in atlas_info and just use whatever is available. Then, users who have hippocampal ROIs can refrain from specifying the 'structure' for their ROIs and we'll do our best to ensure samples simply don't cross hemispheric boundaries. This isn't necessarily ideal because samples could then be incorrectly assigned across, e.g., cortical/subcortical boundaries for regions that aren't the hippocampus (but we might still consider this option outside of the current problem!).
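
A minimal sketch of this "use whatever is available" idea follows. The is_compatible function and the ROI dicts are invented for illustration; how abagen would actually represent a missing 'structure' value is an open question here.

```python
# Hypothetical sketch: compare only the keys that the atlas region provides.
def is_compatible(sample, region, keys=('hemisphere', 'structure')):
    """True if sample and region agree on every key the region specifies."""
    return all(
        sample[key] == region[key]
        for key in keys
        if region.get(key) is not None
    )


# The Allen ontology labels hippocampal samples as 'cortex'.
hippocampal_sample = {'hemisphere': 'L', 'structure': 'cortex'}

hippocampus_roi = {'id': 17, 'hemisphere': 'L'}  # 'structure' left unspecified
thalamus_roi = {'id': 9, 'hemisphere': 'L', 'structure': 'subcortex'}

print(is_compatible(hippocampal_sample, hippocampus_roi))
print(is_compatible(hippocampal_sample, thalamus_roi))
```

With 'structure' omitted, the hippocampal ROI accepts the sample on hemisphere alone, while the fully specified thalamus ROI still rejects it.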

Alternatively (and perhaps most immediately appealing), we can add a warning to the documentation about this designation and instruct users to label their hippocampal ROIs as part of "cortex" (not "subcortex") when they provide atlas_info.

rmarkello commented 5 years ago

Alright, so here's what we're going to do to force hippocampus to be counted as part of subcortex:

  1. Modify the abagen.samples.ONTOLOGY object to include the hippocampal formation structure code and label it as subcortex.
  2. Modify abagen.samples._get_struct() such that if a path contains multiple IDs present in the ONTOLOGY object it selects the structure corresponding to the ID that occurs latest in the structure path.

An example

>>> ONTOLOGY = Recoder(
    (('4008', 'cerebral cortex', 'cortex'),
     ('4275', 'cerebral nuclei', 'subcortex'),
     ('4391', 'diencephalon', 'subcortex'),
     ('9001', 'mesencephalon', 'subcortex'),
     ('4696', 'cerebellum', 'cerebellum'),
     ('9131', 'pons', 'brainstem'),
     ('9512', 'myelencephalon', 'brainstem'),
     ('9218', 'white matter', 'white matter'),
     ('9352', 'sulci & spaces', 'other'),
     ('4219', 'hippocampal formation', 'subcortex')),
    fields=('id', 'name', 'structure')
)

>>> path = '/4005/4006/4007/4008/4219/4249/12896/4251/'
>>> abagen.samples._get_struct(path)
'subcortex'

Note that the path object contains both IDs '4008' (corresponding to cerebral cortex) and '4219' (corresponding to the hippocampal formation), which are both present in ONTOLOGY; however, since '4219' occurs later in the path, we select that ID and grab the relevant structure (i.e., 'subcortex').

We should be able to accomplish this by modifying the _get_struct() function to sort the matching IDs with key=lambda x: path.index(x) and then use the last ID in the sorted list.
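
Here is a rough, self-contained sketch of that selection logic. It uses a plain dict (STRUCTURES) in place of the Recoder-based ONTOLOGY and a stand-in function named get_struct, so it is not the real abagen.samples._get_struct(). It also splits the path into IDs before indexing, since str.index() on the raw path string would match substrings of longer IDs; returning None when nothing matches is likewise an assumption.

```python
# Plain-dict stand-in for the ONTOLOGY object in the example above.
STRUCTURES = {
    '4008': 'cortex',
    '4275': 'subcortex',
    '4391': 'subcortex',
    '9001': 'subcortex',
    '4696': 'cerebellum',
    '9131': 'brainstem',
    '9512': 'brainstem',
    '9218': 'white matter',
    '9352': 'other',
    '4219': 'subcortex',  # hippocampal formation, forced to subcortex
}


def get_struct(path):
    """Return the structure of the deepest known ID in a structure path."""
    # Split into whole IDs so that '4008' can't match inside a longer ID.
    ids = [i for i in path.split('/') if i]
    # Sort the matching IDs by their position in the path.
    matches = sorted(set(ids) & set(STRUCTURES), key=ids.index)
    if not matches:
        return None  # assumption: no known structure in this path
    # The last match is the most specific (deepest) structure.
    return STRUCTURES[matches[-1]]


print(get_struct('/4005/4006/4007/4008/4219/4249/12896/4251/'))
```

A path that passes through '4008' but not '4219' would still resolve to 'cortex', so only hippocampal samples change class.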