obophenotype / cell-ontology

An ontology of cell types
https://obophenotype.github.io/cell-ontology/
Creative Commons Attribution 4.0 International

Addition of marker genes to 52 cell types matched from the Human Lung Cell Atlas #2313

Open scheuerm opened 3 months ago

scheuerm commented 3 months ago

I would like to propose the addition of marker gene expression assertions, from analysis of the Human Lung Cell Atlas data, to the definitions of 52 existing CL terms matched with the predicate skos:exactMatch. See the attached spreadsheet: exactMatch2CL_definitionAdditions.xlsx

dosumis commented 3 months ago

We can add these, but I propose that we use the existing schema developed for BDSO (skos:exactMatch would be confusing here). This allows us to add confidence, provenance and context (species, tissue). It will also allow us to record multiple sources of markers for a cell type.

It is very important to have provenance information on this, as the annotations may be controversial.
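
For illustration only, here is a minimal sketch of a record that carries the pieces mentioned above (confidence, provenance, species, tissue) and allows several sources per cell type. It is not the BDSO schema: the class name, field names, the 0 to 1 confidence scale, and the placeholder gene symbols and DOIs are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerAssertion:
    """Hypothetical record for one marker set asserted for one CL term."""
    cell_type_id: str    # CL term ID, e.g. "CL_0002063"
    marker_genes: tuple  # gene symbols (placeholders in the example below)
    confidence: float    # assumed scale: 0.0 (low) to 1.0 (high)
    source: str          # provenance, e.g. a DOI
    species: str         # context
    tissue: str          # context

    def __post_init__(self):
        # Reject records without provenance or with an out-of-range score.
        if not self.source:
            raise ValueError("provenance (source) is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def group_by_cell_type(assertions):
    """Collect all assertions per CL term, so one term can have many sources."""
    grouped = {}
    for assertion in assertions:
        grouped.setdefault(assertion.cell_type_id, []).append(assertion)
    return grouped


# Placeholder data only; gene symbols and DOIs are not real.
first = MarkerAssertion("CL_0002063", ("GENE_A", "GENE_B"), 0.8,
                        "10.0000/placeholder-1", "Homo sapiens", "lung")
second = MarkerAssertion("CL_0002063", ("GENE_C",), 0.5,
                         "10.0000/placeholder-2", "Homo sapiens", "lung")
by_term = group_by_cell_type([first, second])
```

Keeping each source as its own record means two marker sets for the same term can disagree without one overwriting the other.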

dosumis commented 3 months ago

Documenting other sources of CL mappings on the Sikkema integrated lung (normal) dataset https://cellxgene.cziscience.com/e/066943a2-fdac-4b29-b348-40cede398e4e.cxg/ for discussion purposes:

Here are the current Mappings on CxG (Note - table incomplete - to be updated)

index  cluster mapped_CL_term mapped_CL_term_id superclusters subclusters
1 "Goblet (nasal)" "nasal mucosa goblet cell" "CL_0002480" ["Goblet", "Secretory", "Airway epithelium", "Epithelial"] []
2 "Peribronchial fibroblasts" "bronchus fibroblast of lung" "CL_2000093" ["Fibroblasts", "Fibroblast lineage", "Stroma", "None"] []
3 "SMG serous (nasal)" "serous secreting cell" "CL_0000313" ["SMG serous", "Submucosal Gland", "Epithelial"] []
4 "Adventitial fibroblasts" "alveolar type 2 fibroblast cell" "CL_4028006" ["Fibroblasts", "Fibroblast lineage", "Stroma", "None"] []
5 "Alveolar fibroblasts" "alveolar type 1 fibroblast cell" "CL_4028004" ["Fibroblasts", "Fibroblast lineage", "Stroma", "None"] []
6 "Goblet (subsegmental)" "tracheobronchial goblet cell" "CL_0019003" ["Goblet", "Secretory", "Airway epithelium", "Epithelial"] []
7 "SMG serous (bronchial)" "tracheobronchial serous cell" "CL_0019001" ["SMG serous", "Submucosal Gland", "Epithelial"] []
8 "Goblet (bronchial)" "bronchial goblet cell" "CL_1000312" ["Goblet", "Secretory", "Airway epithelium", "Epithelial"] []
9 "multi-ciliated epithelial cell" "multi-ciliated epithelial cell" "CL_0005012" ["Multiciliated lineage", "Airway epithelium", "Epithelial"] ["Multiciliated (nasal)", "Deuterosomal"]
10 "Alveolar macrophages" "alveolar macrophage" "CL_0000583" ["Macrophages", "Myeloid", "Immune"] ["Alveolar Mph CCL3+", "Alveolar macrophages", "Alveolar Mph proliferating", "Alveolar Mph MT-positive"]
11 "CD4 T cells" "CD4-positive, alpha-beta T cell" "CL_0000624" ["T cell lineage", "Lymphoid", "Immune", "None"] []
12 "SMG mucous" "mucus secreting cell" "CL_0000319" ["Submucosal Gland", "Epithelial", "None"] []
13 "AT2" "type II pneumocyte" "CL_0002063" ["Alveolar epithelium", "Epithelial", "None"] ["AT2", "AT2 proliferating"]
14 "Multiciliated (non-nasal)" "ciliated columnar cell of tracheobronchial tree" "CL_0002145" ["Multiciliated", "Multiciliated lineage", "Airway epithelium", "Epithelial"] []
15 "Migratory DCs" "dendritic cell" "CL_0000451" ["Dendritic cells", "None", "Myeloid", "Immune"] []
16 "Interstitial Mph perivascular" "lung macrophage" "CL_1001603" ["Interstitial macrophages", "Macrophages", "Myeloid", "Immune"] []
17 "Plasmacytoid DCs" "plasmacytoid dendritic cell" "CL_0000784" ["Dendritic cells", "None", "Myeloid", "Immune"] []
18 "Club" "club cell" "CL_0000158" ["Secretory", "Airway epithelium", "Epithelial"] ["Club (non-nasal)", "Club (nasal)"]
19 "SM activated stress response" "smooth muscle cell" "CL_0000192" ["Smooth muscle", "None", "Stroma"] []
20 "DC2" "CD1c-positive myeloid dendritic cell" "CL_0002399" ["Dendritic cells", "None", "Myeloid", "Immune"] []
21 "AT0" "epithelial cell of alveolus of lung" "CL_0010003" ["Transitional Club-AT2", "Secretory", "Airway epithelium", "Epithelial"] []
22 "Subpleural fibroblasts" "fibroblast" "CL_0000057" ["Fibroblasts", "Fibroblast lineage", "Stroma", "None"] []
23 "Smooth muscle" "tracheobronchial smooth muscle cell" "CL_0019019" ["Smooth muscle", "None", "Stroma"] []
24 "pre-TB secretory" "epithelial cell of lower respiratory tract" "CL_0002632" ["Transitional Club-AT2", "Secretory", "Airway epithelium", "Epithelial"] []
25 "CD8 T cells" "CD8-positive, alpha-beta T cell" "CL_0000625" ["T cell lineage", "Lymphoid", "Immune", "None"] []
26 "AT1" "type I pneumocyte" "CL_0002062" ["None", "Alveolar epithelium", "Epithelial"] []
27 "T cells proliferating" "T cell" "CL_0000084" ["T cell lineage", "Lymphoid", "Immune", "None"] []
28 "DC1" "conventional dendritic cell" "CL_0000990" ["Dendritic cells", "None", "Myeloid", "Immune"] []
29 "Tuft" "brush cell of trachebronchial tree" "CL_0002075" ["Rare", "Airway epithelium", "Epithelial", "None"]
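
For anyone who wants to work with the table above programmatically, here is a minimal parsing sketch. It assumes each row is an integer index followed by JSON-style values (quoted strings and bracketed lists) separated by spaces, which is how the rows appear here. The function name and the choice to fill missing trailing columns with None are assumptions.

```python
import json

COLUMNS = ["index", "cluster", "mapped_CL_term", "mapped_CL_term_id",
           "superclusters", "subclusters"]


def parse_mapping_row(line):
    """Parse one table row into a dict keyed by the header names."""
    decoder = json.JSONDecoder()
    index_text, _, rest = line.strip().partition(" ")
    values = [int(index_text)]
    pos = 0
    while pos < len(rest):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(rest) and rest[pos].isspace():
            pos += 1
        if pos >= len(rest):
            break
        value, pos = decoder.raw_decode(rest, pos)
        values.append(value)
    # Rows with a missing trailing column (such as row 29) get None there.
    values += [None] * (len(COLUMNS) - len(values))
    return dict(zip(COLUMNS, values))


row = parse_mapping_row(
    '13 "AT2" "type II pneumocyte" "CL_0002063" '
    '["Alveolar epithelium", "Epithelial", "None"] ["AT2", "AT2 proliferating"]'
)
```

Reading each value with the JSON decoder avoids splitting on spaces or commas inside quoted labels such as "CD4-positive, alpha-beta T cell".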

@aleixpuigb can you link to your latest mappings for Azimuth?

aleixpuigb commented 3 months ago

@aleixpuigb can you link to your latest mappings for Azimuth?

Here is the link.

scheuerm commented 3 months ago

@aleixpuigb I can do that as well

emquardokus commented 3 months ago

@scheuerm I’ll upload the new and domain expert approved mappings by this afternoon.

scheuerm commented 3 months ago

@aleixpuigb I can do that as well

Done

scheuerm commented 3 months ago

Documenting other sources of CL mappings on the Sikkema integrated lung (normal) dataset https://cellxgene.cziscience.com/e/066943a2-fdac-4b29-b348-40cede398e4e.cxg/ for discussion purposes:


These mappings don't explicitly distinguish between exact matches and inexact matches

dosumis commented 3 months ago

True. CxG doesn't support that. I'd like to better understand what you mean by exactMatch. For example, these rows are in exactMatch2CL_definitionAdditions.xlsx:

clusterName (subject) | (predicate) | My CL manual match (object) | PURL
Secretory | skos:exactMatch | secretory cell | http://purl.obolibrary.org/obo/CL_0000151
Multiciliated | skos:exactMatch | multi-ciliated epithelial cell | http://purl.obolibrary.org/obo/CL_0005012

'secretory cell' is a super abstract (and rather suspect) grouping class in CL. The cell set being annotated represents (something like) secretory cells of the lung airway epithelium. Here's the context in the annotation hierarchy:

[image: the cell set's position in the annotation hierarchy]

An exact match for this would require a new CL term.

"multi-ciliated epithelial cell" is a general cell class covering multiciliated cells in many locations. Given the context, a better exactMatch might be "ciliated columnar cell of tracheobronchial tree".

scheuerm commented 3 months ago

@dosumis the exactMatch was simply a match of the label strings and consistency with the CL definitions. But I agree that the "Secretory" label does not match the cluster hierarchy very well. There is a grouping of clusters that does include the Goblets and Clubs (and SMG duct, which may be a problematic label), but excludes the AT0 and pre-TB secretory clusters.

dendrogram_full expression matrix_Renee.pdf

In terms of multi-ciliated, I did see that more specific term, but since one of the subtypes was "Multiciliated (nasal)", I thought that tracheobronchial was not correct.
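
To make the distinction under discussion concrete, here is a rough sketch that separates "the label strings agree" from "the term has the same scope as the cluster". The function names, the scope values, and the two non-SKOS status strings are assumptions for illustration. skos:exactMatch and skos:broadMatch are standard SKOS mapping properties, with broadMatch meaning that the object (here the CL term) is the broader concept.

```python
def label_tokens(label):
    """Lower-case a label, drop hyphens, and split it into a set of words."""
    return set(label.lower().replace("-", "").split())


def labels_match(cluster_label, cl_label):
    """Naive string check: every word of the cluster label is in the CL label."""
    return label_tokens(cluster_label) <= label_tokens(cl_label)


def propose_predicate(cluster_label, cl_label, scope=None):
    """Suggest a mapping predicate.

    scope is a curator's judgement (an assumed input, not computed here):
    "same" if the CL term covers exactly the cluster's cells in context,
    "cl_term_broader" if the CL term is more general, or None if unreviewed.
    """
    if scope == "same":
        return "skos:exactMatch"
    if scope == "cl_term_broader":
        return "skos:broadMatch"
    if labels_match(cluster_label, cl_label):
        return "unreviewed label match"
    return "no match"


unreviewed = propose_predicate("Secretory", "secretory cell")
reviewed = propose_predicate("Multiciliated", "multi-ciliated epithelial cell",
                             scope="cl_term_broader")
```

With this split, a label-only match such as "Secretory" to "secretory cell" stays flagged as unreviewed until someone has checked the cluster's context.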

emquardokus commented 3 months ago

@dosumis @scheuerm FYI, new terms were added via HuBMAP for this same study, with Gloria Pryhuber discussing directly with another lung researcher, Martijn Nawijn, and with James (Jim) Hagood for the nasal passage work. See Validation Report Changes for discussions: https://hubmapconsortium.github.io/ccf-validation-tools/Lung/comments/
These ciliated and secretory cell types have been under discussion with the lung community. The nasal cavity cells are not yet in CL for any of the types; there is a submucosal gland in the nasal cavity just like in the lung. These were all on my new term request list, and I've been working with the lung experts to get proper descriptions and the nomenclature the community prefers. Shorthand is always used in these papers, unfortunately.
I have not yet uploaded the final set we are using because we were discussing a few additional cell types. I should have this in another day.

scheuerm commented 3 months ago

@emquardokus That sounds good. My main focus is on definitional marker gene combinations. I would defer to the SMEs for cell type names/labels. But one additional area that we are starting to explore is how well the CL hierarchy matches the data-driven cluster dendrogram (see attachment to my previous comment). I think we will learn some better biology through that comparison.
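
One simple way to start such a comparison, sketched here under strong assumptions: for each clade of the cluster dendrogram, look up the lowest common ancestor of its mapped terms in the ontology hierarchy. The toy hierarchy below uses placeholder names, not real CL terms, and assumes a single parent per term, whereas CL is a graph in which terms can have several parents.

```python
def ancestors(term, parent_of):
    """Return the term followed by its chain of ancestors up to the root."""
    chain = [term]
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def lowest_common_ancestor(terms, parent_of):
    """Most specific term that is an ancestor of (or equal to) every input."""
    terms = list(terms)
    common = ancestors(terms[0], parent_of)
    for term in terms[1:]:
        others = set(ancestors(term, parent_of))
        # Keep order from most specific to most general.
        common = [candidate for candidate in common if candidate in others]
    return common[0] if common else None


# Placeholder hierarchy (child -> parent); not real CL structure.
toy_parent_of = {
    "leaf_a": "group_1",
    "leaf_b": "group_1",
    "leaf_c": "group_2",
    "group_1": "root",
    "group_2": "root",
}

agrees = lowest_common_ancestor(["leaf_a", "leaf_b"], toy_parent_of)
differs = lowest_common_ancestor(["leaf_a", "leaf_c"], toy_parent_of)
```

A clade whose mapped terms share a specific ancestor suggests the two groupings agree there, while a clade that only meets at a very general term marks a place worth a closer look.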

dosumis commented 3 months ago

Proposal:

  1. We will add markers for specific cell types.
  2. Details and discussion of the design pattern addition are here: https://github.com/obophenotype/cell-ontology/issues/2136#issuecomment-2043018555. We will aim to have this added quickly, and with it we can add markers via a spreadsheet.
  3. In the meantime, we would be happy to add markers as text, as long as we can include confidence and a DOI with information about how the marker set was determined. @scheuerm, can you work with Renee on a brief description of how these were calculated and with what source data, to go on Zenodo? This will give us a DOI for referencing. Could you also add confidence scores to your spreadsheet? Thanks!
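
As a rough illustration of the check implied by point 3, the sketch below flags spreadsheet rows that lack a confidence score or a DOI. The column names (cl_id, marker_genes, confidence, doi), the CSV export, the 0 to 1 confidence scale, and the DOI pattern are all assumptions, not an agreed format. The example rows use placeholder gene symbols and a placeholder DOI.

```python
import csv
import io
import re

# Loose DOI shape check (assumed pattern): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
REQUIRED = ("cl_id", "marker_genes", "confidence", "doi")


def validate_rows(csv_text):
    """Return a list of (row_number, problem) tuples; empty means all rows pass."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # header is row 1
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append((row_number, f"missing {column}"))
        doi = (row.get("doi") or "").strip()
        if doi and not DOI_PATTERN.match(doi):
            problems.append((row_number, "malformed DOI"))
        confidence = (row.get("confidence") or "").strip()
        if confidence:
            try:
                value = float(confidence)
            except ValueError:
                problems.append((row_number, "confidence is not a number"))
                continue
            if not 0.0 <= value <= 1.0:
                problems.append((row_number, "confidence outside 0 to 1"))
    return problems


example = (
    "cl_id,marker_genes,confidence,doi\n"
    "CL_0002063,GENE_A;GENE_B,0.9,10.0000/placeholder\n"
    "CL_0002062,GENE_C,,\n"
)
issues = validate_rows(example)
```

Running a check like this before import would catch rows without provenance early, rather than during ontology review.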