Closed rmarkello closed 5 years ago
Alright, so here's what we're going to do to force hippocampus to be counted as part of subcortex:
1. Update the abagen.samples.ONTOLOGY object to include the hippocampal formation structure code and label it as subcortex.
2. Modify abagen.samples._get_struct() such that if a path contains multiple IDs present in the ONTOLOGY object, it selects the structure corresponding to the ID that occurs latest in the structure path.
For example:
>>> ONTOLOGY = Recoder(
(('4008', 'cerebral cortex', 'cortex'),
('4275', 'cerebral nuclei', 'subcortex'),
('4391', 'diencephalon', 'subcortex'),
('9001', 'mesencephalon', 'subcortex'),
('4696', 'cerebellum', 'cerebellum'),
('9131', 'pons', 'brainstem'),
('9512', 'myelencephalon', 'brainstem'),
('9218', 'white matter', 'white matter'),
('9352', 'sulci & spaces', 'other'),
('4219', 'hippocampal formation', 'subcortex')),
fields=('id', 'name', 'structure')
)
>>> path = '/4005/4006/4007/4008/4219/4249/12896/4251/'
>>> abagen.samples._get_struct(path)
'subcortex'
Note that the path object contains both IDs '4008' (corresponding to cerebral cortex) and '4219' (corresponding to the hippocampal formation), which are both present in ONTOLOGY; however, since '4219' occurs later in the path, we select that ID and grab the relevant structure (i.e., 'subcortex').
We should be able to accomplish this by modifying the _get_struct() function to sort the matching IDs by a key, where key=lambda x: path.index(x), and then use the last ID in the sorted list.
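The sorting idea above can be sketched roughly as follows. This is not the actual abagen implementation: the plain dict standing in for ONTOLOGY and the function name get_struct are hypothetical, chosen so the example runs without any dependencies. It also splits the path on '/' instead of calling path.index(x) on the raw string, so that an ID cannot match as a substring of a longer ID.

```python
# Hypothetical stand-in for abagen.samples.ONTOLOGY: a plain dict mapping
# structure IDs to broad structural classes (the real object is a Recoder).
ONTOLOGY = {
    '4008': 'cortex',
    '4275': 'subcortex',
    '4391': 'subcortex',
    '9001': 'subcortex',
    '4696': 'cerebellum',
    '9131': 'brainstem',
    '9512': 'brainstem',
    '9218': 'white matter',
    '9352': 'other',
    '4219': 'subcortex',  # hippocampal formation, relabelled as subcortex
}


def get_struct(structure_path, ontology=ONTOLOGY):
    """Return the structural class of the latest ontology ID in the path."""
    # Split on '/' so that IDs are compared whole, not as substrings
    ids = [i for i in structure_path.strip('/').split('/') if i]
    # Collect every ontology ID that appears somewhere in the path
    matches = [i for i in ontology if i in ids]
    if not matches:
        return None
    # Sort the matches by their position in the path and keep the last one
    latest = sorted(matches, key=ids.index)[-1]
    return ontology[latest]


path = '/4005/4006/4007/4008/4219/4249/12896/4251/'
print(get_struct(path))
```

Because '4219' sits after '4008' in the example path, the sketch returns the class attached to '4219'; a path containing only '4008' would still resolve to 'cortex'.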
The issue
tl;dr The Allen Institute ontology classifies hippocampus as part of cortex, not subcortex, which could cause problems for matching some microarray samples to ROIs.
When users provide a file or dataframe to the atlas_info parameter in abagen.get_expression_data(), they are required to specify a broad structural class for each region in their atlas (in a column labelled 'structure' in the file/dataframe). The current options for this structural class include:
We match these designations with the information from the Allen ontology so that samples that don't fall directly within a region in the atlas aren't incorrectly assigned to regions in atlas across hemispheric / structural boundaries.
That is, if one of the samples from the Allen Institute is labelled as having come from the left hemisphere subcortex, we make sure to only assign it to a region in the user-specified atlas labelled as belonging to the left hemisphere subcortex. This impacts only a minority of samples (i.e., we don't currently check whether this is the case for those samples having coordinates directly within a region in the atlas), but a significant minority, nonetheless.
While matching these designations seems like a reasonable approach in most cases, the one point of contention that a general user might have is that the Allen Institute ontology classifies the hippocampal formation (including the subiculum, dentate gyrus, and CA1-4) as part of "cortex" rather than "subcortex". Specifically, their ontology specifies:
Thus, if a researcher provides an atlas where they label all their hippocampal ROIs as "subcortex" they're liable to get vastly different results than if they label all their hippocampal ROIs as "cortex."
While I have it on good authority that the hippocampus is often considered part of "allocortex," I'm hesitant to add this as a permissible structural class to abagen since it seems quite a bit more specific than the current (rather broad) structural designations listed above (1-6).
Proposed solution
I genuinely don't know! It would be great to allow either specification for the hippocampus (i.e., "cortex" or "subcortex"), but the current framework for getting these structural classes from the Allen ontology doesn't allow for this hedging. I can think about how to modify it for this one instance in particular, but in the interim it would be great to come up with alternatives.
One option that might be worthwhile is to simply allow users to specify either (or both) of the expected 'hemisphere' and 'structure' information in atlas_info and just use whatever is available. Then, users who have hippocampal ROIs can refrain from specifying the 'structure' for their ROIs and we'll do our best to ensure samples simply don't cross hemispheric boundaries. This isn't necessarily ideal because there's the possibility that samples will get incorrectly assigned across, e.g., cortical/subcortical boundaries for regions that aren't the hippocampus (but we might still consider this option outside of the current problem!).
Alternatively (and perhaps most immediately appealing), we can add a warning to the documentation about this designation and instruct users to specify that their hippocampal ROIs are part of "cortex" (not "subcortex") when they provide atlas_info.
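To make the first option more concrete (using whichever of 'hemisphere' and 'structure' is available), here is a minimal sketch. It does not reflect how abagen matches samples internally: the function compatible_regions, the dict-based atlas_info, and the 'L'/'R' hemisphere labels are all assumptions made for illustration.

```python
def compatible_regions(sample, atlas_info):
    """Return IDs of regions whose supplied metadata agrees with the sample."""
    keep = []
    for region_id, info in atlas_info.items():
        # Compare only the fields the user actually supplied for this region
        supplied = [k for k in ('hemisphere', 'structure')
                    if info.get(k) is not None]
        if all(info[k] == sample[k] for k in supplied):
            keep.append(region_id)
    return keep


# Hypothetical sample labelled by the Allen ontology as left-hemisphere cortex
sample = {'hemisphere': 'L', 'structure': 'cortex'}

# Hypothetical atlas_info; regions 2 and 3 omit 'structure' (e.g., hippocampal ROIs)
atlas_info = {
    1: {'hemisphere': 'L', 'structure': 'cortex'},
    2: {'hemisphere': 'L'},
    3: {'hemisphere': 'R'},
    4: {'hemisphere': 'L', 'structure': 'subcortex'},
}

print(compatible_regions(sample, atlas_info))
```

In this sketch, regions that omit 'structure' are constrained by hemisphere only, which is exactly the trade-off described above: hippocampal ROIs are no longer excluded by the cortex/subcortex label, but nothing stops a sample from crossing a structural boundary for those regions.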