lhqing / whole_mouse_brain

misc code for whole mouse brain analysis
MIT License

Figure 2. The DNA methylome contains extensive brain cellular diversity that is integrable with other modalities. #26

Open lhqing opened 2 years ago

lhqing commented 2 years ago

Figure files

https://drive.google.com/drive/u/1/folders/17ukhBF1y_acxglZtf8wsH4Hfw329q8-9

Panel legends

Supplementary files

Methods

Reproducibility

Code to generate the figure

Processed data

Internal API

Input dataset and API

Output dataset and API

lhqing commented 2 years ago

Some observations about integration based on the current algorithm (ALLCools v1.0.14)

  1. Include more genes at the beginning, e.g., use all CEF from the last round, and use CEF from both the ref and query data.
  2. Before scaling, filter features by a minimum std (e.g., > 0.005) in both datasets; don't include features with little variance.
  3. Features need to be scaled and mean-centered before decomposition.
  4. n_pc and n_cc matter, especially when the cell cluster diversity in the dataset is low: including too many PCs and CCs hurts the integration results. (There is a scale-by-singular-value step, so variance from small components gets overestimated.) How should n_pc and n_cc be determined?
  5. Which dataset is the reference and which is the query? Should this be decided by the analysis goal or by which dataset is more diverse?
  6. Running Harmony to finalize the integration helps improve the results in some low-diversity cases.
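
A minimal sketch of points 2 and 3 above (std filter in both datasets, then mean-centering and scaling). This is not the ALLCools implementation: the helper names `select_features` and `standardize`, the toy matrices, and the choice to scale each dataset separately are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def column(matrix, j):
    """Return feature j across all cells (rows)."""
    return [row[j] for row in matrix]


def select_features(ref, query, min_std=0.005):
    """Indices of features whose std exceeds min_std in BOTH datasets."""
    n_features = len(ref[0])
    return [
        j
        for j in range(n_features)
        if pstdev(column(ref, j)) > min_std and pstdev(column(query, j)) > min_std
    ]


def standardize(matrix, keep):
    """Mean-center and scale the kept features to unit std.

    Assumption: scaling is done per dataset, not on the concatenated data.
    """
    stats = [(mean(column(matrix, j)), pstdev(column(matrix, j))) for j in keep]
    return [
        [(row[j] - mu) / sd for j, (mu, sd) in zip(keep, stats)]
        for row in matrix
    ]


# Toy cell-by-feature matrices (rows = cells, columns = features).
ref = [[0.10, 0.500, 0.30], [0.20, 0.501, 0.10], [0.30, 0.502, 0.20]]
query = [[0.40, 0.10, 0.700], [0.60, 0.30, 0.701], [0.50, 0.20, 0.702]]

keep = select_features(ref, query)       # feature 1 is flat in ref, feature 2 in query
ref_scaled = standardize(ref, keep)      # input to the decomposition step
query_scaled = standardize(query, keep)
```

Filtering before scaling matters because dividing a near-constant feature by its tiny std would blow up noise to unit variance.
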
lhqing commented 2 years ago

mC - AIBS 10X Integration

gs://ecker-hanqing-analysis/221015-cemba-mc-aibs-tenx-integration

Integration strategy

  1. Methylation-only clustering: We did iterative clustering and defined 4,673 cell clusters in our 301,626 methylome cells.
  2. Cell-type annotation: Starting from the entire dataset, we did iterative integration and manual evaluation (dissection region, gene, embedding, etc.) to annotate our 4,673 mC clusters into 261 cell types, using the same names as the AIBS-10X dataset.
  3. Cell-cluster annotation: Within each cell type and its potential overlaps from (2), we redid the iterative integration to match 5,208 RNA clusters with 4,673 snmC clusters.

Cell Type Match

261 / 306 RNA cell types, covering 3988980 / 4065284 (98.1%) RNA cells, have corresponding mC cell types. See the "Cell Type Label" column in the Google Sheet. Cell type annotation is one-to-one, with potential overlaps noted in the next column.

Cell Cluster Match

4669 / 5208 RNA clusters, covering 3958398 / 4065284 (97.4%) RNA cells, have corresponding mC cell clusters. There are 2240 unique matching patterns in total. See the "Matched AIBS 10X RNA Clusters" column in the Google Sheet.
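
A quick arithmetic check of the match rates quoted in this comment, using only the counts stated here. It also shows that the share of matched labels (types or clusters) is lower than the share of matched cells, so the unmatched labels hold relatively few cells. The helper `pct` is just for illustration.

```python
def pct(matched, total):
    """Percentage rounded to one decimal place."""
    return round(100 * matched / total, 1)


rna_cells = 4065284

cell_type_label_rate = pct(261, 306)          # matched RNA cell types
cell_type_cell_rate = pct(3988980, rna_cells)  # RNA cells in matched cell types
cluster_label_rate = pct(4669, 5208)          # matched RNA clusters
cluster_cell_rate = pct(3958398, rna_cells)    # RNA cells in matched clusters

print(cell_type_label_rate, cell_type_cell_rate)  # 85.3 98.1
print(cluster_label_rate, cluster_cell_rate)      # 89.7 97.4
```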

Example

lhqing commented 2 years ago

mC - ATAC Integration

Integration strategy

  1. Methylation-only clustering: same as RNA integration
  2. Iterative integration within each major region: After separating out the non-neuronal (NN) cells, which are integrated separately, we group all the neuronal cells into ten major regions and perform iterative integration within each major region.
  3. Within each round of integration, ATAC cells are assigned to mC clusters via integration co-clusters. We found that this approach gives the highest correlation between the mC and ATAC dissection-region distributions in the next round of integration.
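
One way to picture the co-cluster assignment in step 3, as a rough sketch only. The rule used here (an ATAC cell inherits every mC cluster that makes up at least `min_fraction` of the mC cells in its co-cluster) is an assumption, as are the function name, the threshold, and the toy labels; the actual rule may differ.

```python
from collections import Counter, defaultdict


def assign_atac_to_mc(mc_cells, atac_cells, min_fraction=0.2):
    """Assign ATAC cells to mC clusters through shared co-clusters.

    mc_cells:   {cell_id: (co_cluster, mc_cluster)}
    atac_cells: {cell_id: co_cluster}
    Returns {atac_cell_id: sorted list of mC clusters}; the list is empty
    when the co-cluster contains no mC cells.
    """
    # Count mC cluster membership inside each co-cluster.
    counts = defaultdict(Counter)
    for co_cluster, mc_cluster in mc_cells.values():
        counts[co_cluster][mc_cluster] += 1

    # Keep mC clusters that reach the (hypothetical) minimum fraction.
    targets = {}
    for co_cluster, counter in counts.items():
        total = sum(counter.values())
        targets[co_cluster] = sorted(
            mc for mc, n in counter.items() if n / total >= min_fraction
        )

    return {cell: targets.get(co, []) for cell, co in atac_cells.items()}


mc_cells = {
    "mc1": (0, "mC_A"), "mc2": (0, "mC_A"), "mc3": (0, "mC_B"),
    "mc4": (1, "mC_C"),
}
atac_cells = {"atac1": 0, "atac2": 1, "atac3": 2}

assignment = assign_atac_to_mc(mc_cells, atac_cells)
n_assigned = sum(1 for clusters in assignment.values() if clusters)
```

In this toy case `atac3` falls in a co-cluster without mC cells and stays unassigned, which is the kind of cell that would not count as "assigned to at least one mC cluster".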

ATAC Cell to mC Cluster Match

2065820 / 2312406 (89.3%) ATAC cells are assigned to at least one mC cluster.

ATAC Cluster to mC Cluster relationship

For each mC cluster, we also calculated the ATAC 600-cluster proportions of the ATAC cells assigned to it. See the "Matched CEMBA ATAC Clusters" column in the Google Sheet.
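
A small sketch of how such per-mC-cluster proportions could be tabulated. The input layout (`assignment` mapping each ATAC cell to its mC clusters, `atac_cluster_of` mapping each ATAC cell to its ATAC cluster) and all labels are hypothetical.

```python
from collections import Counter, defaultdict


def atac_cluster_proportions(assignment, atac_cluster_of):
    """For each mC cluster, the fraction of its assigned ATAC cells
    that come from each ATAC cluster."""
    counts = defaultdict(Counter)
    for cell, mc_clusters in assignment.items():
        for mc in mc_clusters:
            counts[mc][atac_cluster_of[cell]] += 1
    return {
        mc: {atac: n / sum(counter.values()) for atac, n in counter.items()}
        for mc, counter in counts.items()
    }


# Toy input: a2 is assigned to two mC clusters, a4 to none.
assignment = {"a1": ["mC_A"], "a2": ["mC_A", "mC_B"], "a3": ["mC_A"], "a4": []}
atac_cluster_of = {"a1": "ATAC_1", "a2": "ATAC_1", "a3": "ATAC_2", "a4": "ATAC_2"}

props = atac_cluster_proportions(assignment, atac_cluster_of)
```

Here `props["mC_A"]` is two thirds `ATAC_1` and one third `ATAC_2`, while `props["mC_B"]` is entirely `ATAC_1`.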

Example

lhqing commented 2 years ago

mC - m3C Integration

gs://ecker-hanqing-analysis/221022-cemba-mc-cemba-m3c-integration

  1. Methylation-only clustering: same as RNA integration. For m3C, we did the same iterative clustering separately.
  2. m3C cell-type annotation: Starting from the entire dataset, we performed iterative integration to annotate m3C cell types via the mC cell types and to assign m3C cell clusters to mC cell clusters.

All m3C cell clusters are assigned to at least one mC cell cluster. All mC cell types except CB Purkinje Cell (c21 in mC L1 Clustering) have matched m3C clusters.

We found that skipping Harmony for the mC-m3C integration gives better results.

lhqing commented 1 year ago

snmC - AIBS TENX Integration

Notes

AIBS v2 annot and v3 annot

v3 L2annot: 306 labels, not including "LQ"

![2871665818045 pic](https://user-images.githubusercontent.com/29302823/195974422-8dc17d1e-819c-4cc4-863f-7286c2f37e06.jpg)