obophenotype / brain_data_standards_ontologies

A repository for co-ordinating work on ontologies for the Brain Data Standards Project
Apache License 2.0

Cross species naming issues #179

Closed shawntanzk closed 2 years ago

shawntanzk commented 2 years ago

Currently, names in the cross-species taxonomy are very misleading, e.g. human Sncg doesn't have any Sncg types under it.

Solution ideas (need to be refined):

Cross-species grouping node naming, based on what the node represents:

CGE/PoA Euarchontoglires cluster = cluster of cells across species that most resemble mouse CGE/PoA MOp => (Mouse CGE/PoA)-like GABAergic neuron of Euarchontoglires

We really need better documentation of how this was done and how names were chosen than is available in the papers.
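As a rough illustration of the proposed label pattern only (not an agreed convention), the sketch below builds the example label from its parts. The function name and parameter names are hypothetical and do not come from the ontology pipeline.

```python
def cross_species_label(reference_species, reference_group, cell_class, taxon):
    """Build a label of the form '(<species> <group>)-like <class> of <taxon>'.

    All parameter names are illustrative assumptions; only the output pattern
    is taken from the proposal in this comment.
    """
    return f"({reference_species} {reference_group})-like {cell_class} of {taxon}"


label = cross_species_label("Mouse", "CGE/PoA", "GABAergic neuron", "Euarchontoglires")
print(label)
```

Putting the reference species in the label makes it explicit that the grouping is defined by resemblance to the mouse cluster, which is the point of the proposed rename.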

TODO: Invite Trygve to a call ASAP to discuss. We need to understand fully how the data is derived to settle this problem.

shawntanzk commented 2 years ago

For mouse - from https://www.nature.com/articles/s41586-021-03500-8

"To facilitate the use of these cell types by investigators, we adopted a nomenclature that incorporates multiple anatomical and molecular identifiers. For example, we identified four clusters of excitatory neurons (expressing Slc17a7, which encodes the vesicular glutamate transporter VGLUT1) that express a deep layer marker, Fezf2, as well as Fam84b, which is a unique marker of the pyramidal tract3 or extratelencephalically-projecting neurons (ET) 16 (Fig. 1e). Thus, we labelled these neurons ‘L5 ET 1–4’. We divided GABAergic neurons into five major subclasses based on marker genes: Lamp5, Sncg and Vip, which label cells derived from the caudal ganglionic eminence, and Sst and Pvalb, which label cells derived from the medial ganglionic eminence. Finer distinctions among GABAergic types are identified by secondary markers (for example, Sst and Myh8). Tables of cluster accession IDs and differentially expressed genes between every pair of cell types help to track the cell types and their underlying molecular evidence17 (Supplementary Tables 3, 6)."

For human and marmoset, the naming approach isn't stated explicitly, but I think it's the same as for MTG. From https://www.nature.com/articles/s41586-021-03465-8:

"For each species, we defined a diverse set of neuronal and non-neuronal clusters of cell types on the basis of unsupervised clustering of snRNA-seq datasets (Extended Data Fig. 1n–r and Supplementary Tables 1, 2). We organized cell types into hierarchical taxonomies on the basis of transcriptomic similarities (Fig. 1a–c, Extended Data Fig. 2 and Supplementary Table 3). As previously described for temporal cortex (middle temporal gyrus, MTG)3, taxonomies were broadly conserved across species, and neuronal subclasses reflected developmental origins and targets of long-range neuronal projections. Cell-type labels include the dissected layer (if available), major class, subclass marker gene and most-specific marker gene (Supplementary Tables 4–6). GABAergic (γ-aminobutyric acid-producing) types were uniformly rare (fewer than 4.5% of neurons), whereas glutamatergic and non-neuronal types were more variable in number (0.01–18.4% of neurons and 0.15–56.2% of non-neuronal cells, respectively). Finally, independent clustering of epigenomic data resulted in diverse clusters that were associated one-to-one with RNA clusters or at a slightly higher level in the hierarchy on the basis of shared marker expression."

From Jeremy, re: MTG naming: "Cluster names were defined using an automated strategy which combined molecular information (marker genes) and anatomical information (layer of dissection). Clusters were assigned a broad class of interneuron, excitatory neuron, microglia, astrocyte, oligodendrocyte precursor, oligodendrocyte, or endothelial cell based on maximal median cluster CPM of GAD1, SLC17A7, TYROBP, AQP4, PDGFRA, OPALIN or NOSTRIN, respectively. Enriched layers were defined as the range of layers which contained at least 10% of the total cells from that cluster. Clusters were then assigned a broad marker, defined by maximal median CPM of PAX6, LAMP5, VIP, SST, PVALB, LINC00507, RORB, THEMIS, FEZF2, TYROBP, FGFR3, PDGFRA, OPALIN or NOSTRIN. Finally, clusters in all broad classes with more than one cluster (for example, interneuron, excitatory neuron, and astrocyte) were assigned a gene showing the most-specific expression in that cluster (see details below). We developed a principled nomenclature for clusters based on: (1) major cell class, (2) layer enrichment (including layers containing at least 10% of nuclei in that cluster), (3) a subclass marker gene (maximal expression of 14 manually-curated genes), and (4) a cluster-specific marker gene (maximal detection difference compared to all other clusters). For example, the inhibitory neuron type at the top of the plot in Fig. 1c, found in samples dissected from L1 and L2, and expressing the subclass marker PAX6 and the specific marker CDH12, is named Inh L1-2 PAX6 CDH12. A few cluster names were manually adjusted for clarity."
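To make the quoted naming rules easier to follow, here is a minimal sketch of how such a name could be assembled. It is not the authors' code. The input layout (a list of cells, each with an integer `layer` and a `cpm` dict), the function names, and every class abbreviation other than `Inh` are assumptions for illustration. The cluster-specific marker is passed in rather than computed, because it depends on comparisons with all other clusters.

```python
from collections import Counter
from statistics import median

# Marker gene -> broad class, in the order given in the quoted text.
# Only "Inh" appears in the quote; the other abbreviations are assumed.
CLASS_MARKERS = {
    "GAD1": "Inh", "SLC17A7": "Exc", "TYROBP": "Micro", "AQP4": "Astro",
    "PDGFRA": "OPC", "OPALIN": "Oligo", "NOSTRIN": "Endo",
}

# The 14 subclass (broad) marker genes listed in the quoted text.
SUBCLASS_MARKERS = [
    "PAX6", "LAMP5", "VIP", "SST", "PVALB", "LINC00507", "RORB",
    "THEMIS", "FEZF2", "TYROBP", "FGFR3", "PDGFRA", "OPALIN", "NOSTRIN",
]


def median_cpm(cells, gene):
    """Median CPM of one gene across the cells of a cluster (missing = 0)."""
    return median(cell["cpm"].get(gene, 0.0) for cell in cells)


def top_gene(cells, genes):
    """Gene with the maximal median CPM in this cluster."""
    return max(genes, key=lambda gene: median_cpm(cells, gene))


def enriched_layers(cells, threshold=0.10):
    """Range of layers that each hold at least `threshold` of the cluster."""
    counts = Counter(cell["layer"] for cell in cells)
    layers = sorted(l for l, n in counts.items() if n / len(cells) >= threshold)
    if layers[0] == layers[-1]:
        return f"L{layers[0]}"
    return f"L{layers[0]}-{layers[-1]}"


def name_cluster(cells, specific_marker):
    """Assemble '<class> <layers> <subclass marker> <specific marker>'.

    Simplification: the specific marker is always appended, whereas the quote
    says it is only assigned in broad classes with more than one cluster.
    """
    broad_class = CLASS_MARKERS[top_gene(cells, CLASS_MARKERS)]
    subclass = top_gene(cells, SUBCLASS_MARKERS)
    return f"{broad_class} {enriched_layers(cells)} {subclass} {specific_marker}"


# Toy cluster with made-up CPM values, dissected from L1 and L2.
toy_cluster = [
    {"layer": 1, "cpm": {"GAD1": 120.0, "PAX6": 40.0, "LAMP5": 5.0}},
    {"layer": 1, "cpm": {"GAD1": 100.0, "PAX6": 55.0}},
    {"layer": 2, "cpm": {"GAD1": 90.0, "PAX6": 30.0, "VIP": 2.0}},
]
print(name_cluster(toy_cluster, "CDH12"))
```

With these toy values the sketch reproduces the pattern of the example name in the quote. Real names may still differ, since the quote notes that a few cluster names were manually adjusted.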

shawntanzk commented 2 years ago

The cross-species naming is not covered in one concise paragraph; the explanation is quite extended. The section "Consensus M1 taxonomy across species" of https://www.nature.com/articles/s41586-021-03465-8 has a lot of information that explains the difficulties we are facing: much of the taxonomy isn't as robust or can't be aligned well across species.

shawntanzk commented 2 years ago

Changed names accordingly; fixed.