Open stemangiola opened 2 years ago
I'll start by proposing a small number of transcriptomic markers; please extend this list if you can.
@ConnieLWS could you please add your gene list here?
This is the current gene list but it's still being refined:
Tcell.sig <- c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E")
Bcell.sig <- c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1")
Monocyte.sig <- c("CD68", "CD14", "S100A9", "NKG7")
DC.sig <- c("FCER1A", "CLEC4C", "CIITA", "BCL11A")
NK.sig <- c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")
FYI @goknurginer
Do you want tissue-specific marker genes for immune cells? If so, which tissue types would you like to focus on first?
No, just a very small list of generic markers that would cluster the integrated 11M cells from all tissues. After we divide the cells into major macro clusters, we will integrate them separately using all genes.
We should "validate" our small gene signature on the high-confidence cell types, for example using boxplots of the scaled gene-transcript abundance.
For obtaining the high-confidence cells, you can do
metadata |> filter(confidence_class==1)
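A minimal base-R sketch of that validation on simulated data. Everything except the confidence_class == 1 filter is an assumption made for illustration: the toy metadata table, its cell_type column, the counts matrix and the simulated values are not from the real atlas.

```r
set.seed(1)

# Hypothetical toy data: 200 cells with two marker genes (assumed, not real atlas data)
n_cells <- 200
metadata <- data.frame(
  cell_id = paste0("cell", seq_len(n_cells)),
  cell_type = rep(c("T cell", "B cell"), each = n_cells / 2),
  confidence_class = sample(1:3, n_cells, replace = TRUE)
)
is_t <- metadata$cell_type == "T cell"
counts <- rbind(
  CD3E  = rpois(n_cells, ifelse(is_t, 8, 1)),
  CD79A = rpois(n_cells, ifelse(is_t, 1, 8))
)
colnames(counts) <- metadata$cell_id

# Keep only the high-confidence cells (base-R equivalent of the filter above)
keep <- metadata$confidence_class == 1
high_conf <- metadata[keep, ]
expr <- counts[, keep, drop = FALSE]

# Log-transform, then scale each gene to mean 0 and sd 1 across the kept cells
scaled <- t(scale(t(log1p(expr))))

# Boxplot statistics of scaled CD3E abundance per cell type;
# set plot = TRUE to draw the boxplot instead of only computing it
bp <- boxplot(scaled["CD3E", ] ~ high_conf$cell_type, plot = FALSE)
median_by_type <- setNames(bp$stats[3, ], bp$names)
print(median_by_type)
```

If a marker behaves as expected, its median scaled abundance should be clearly higher in the cell type it is meant to mark than in the others.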
In the meantime, while @multimeric adds a couple of features we need, let's start with the MNN (scater) integration method using 10-50 genes, and start with 100K cells (we have 11M immune cells in total).
@ConnieLWS @multimeric FYI
"A unified analysis of atlas single cell data"
https://www.biorxiv.org/content/10.1101/2022.08.06.503038v1.full
Here are some I think I'll try to benchmark, based on Connie's literature review:
Great,
You don't think we have scope for 2 Python tools?
Potentially, but the goal at this stage is to get the "minimum viable product", so we have to use our time sparingly. If you find yourself waiting for computation (we should avoid this by testing on small chunks of data), you can work on your figure for the paper (in the todo list).
Currently I have no data set to test these tools on anyway.
You can first implement the tool with dummy data (the dataset queries in the README file). This initial dataset selection should not be a bottleneck.
Tested initial classification using 27 marker genes. The gene signature is still being refined.
Tcell.sig <- c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E")
Bcell.sig <- c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1")
Monocyte.sig <- c("CD68", "CD14", "S100A9", "NKG7")
DC.sig <- c("FCER1A", "CLEC4C", "CIITA", "BCL11A")
NK.sig <- c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")
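As a quick sanity check on the list above, the signatures can be collected into one named list to count the markers and spot genes shared between signatures. This is only a bookkeeping sketch; the names signatures, all_genes, shared and shared_in are made up for illustration.

```r
# The five signatures from this thread, gathered into one named list
signatures <- list(
  Tcell    = c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E"),
  Bcell    = c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1"),
  Monocyte = c("CD68", "CD14", "S100A9", "NKG7"),
  DC       = c("FCER1A", "CLEC4C", "CIITA", "BCL11A"),
  NK       = c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")
)

# Total number of listed markers, counting repeats
all_genes <- unlist(signatures, use.names = FALSE)
length(all_genes)  # 27

# Genes that appear in more than one signature
shared <- unique(all_genes[duplicated(all_genes)])
shared  # "NKG7"

# Which signatures contain each shared gene
shared_in <- lapply(setNames(shared, shared), function(g) {
  names(signatures)[vapply(signatures, function(s) g %in% s, logical(1))]
})
shared_in  # NKG7 is listed under both Monocyte and NK
```

The count matches the 27 marker genes mentioned above. NKG7 appears in both the monocyte and NK lists, which may be worth double-checking while the signature is being refined.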
Initial testing was performed on 2 samples (~10k cells each) from one dataset:
1) divide cells based on macroclusters (e.g. B cells, CD8 T, monocytes). This is not always trivial: we have some high-confidence annotation, but some cells cannot be easily classified as T cells, B cells or monocytes.
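One possible way to sketch this split is to score each cell against each signature and leave ambiguous cells unlabelled. The scoring rule (mean scaled log abundance), the min_margin cutoff, the reduced gene lists and the simulated data are all assumptions for illustration, not the method used in this project.

```r
set.seed(42)

# Subset of the thread's signatures, to keep the toy example small
signatures <- list(
  Tcell = c("CD3D", "CD3E", "CD2"),
  Bcell = c("CD79A", "CD79B", "CD19")
)
genes <- unlist(signatures, use.names = FALSE)

# Hypothetical simulated cells: 20 T-like, 20 B-like, 20 expressing neither signature
truth <- rep(c("Tcell", "Bcell", "other"), each = 20)
expr <- sapply(truth, function(ct) {
  lambda <- ifelse(genes %in% signatures[[ct]], 10, 0.5)
  rpois(length(genes), lambda)
})
rownames(expr) <- genes

# Scale each gene across cells, then score each cell per signature
scaled <- t(scale(t(log1p(expr))))
scores <- sapply(signatures, function(s) colMeans(scaled[s, , drop = FALSE]))

# Best-scoring signature and its margin over the runner-up
best <- colnames(scores)[max.col(scores, ties.method = "first")]
margin <- unname(apply(scores, 1, function(x) {
  s <- sort(x, decreasing = TRUE)
  s[1] - s[2]
}))

# Cells without a clear winner stay unclassified (min_margin is an arbitrary cutoff)
min_margin <- 1
label <- ifelse(margin >= min_margin, best, "unclassified")
print(table(truth, label))
```

Keeping an explicit "unclassified" label avoids forcing cells that match no signature well into one of the macroclusters; the cutoff would need tuning on the high-confidence cells.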
(… file_id and .sample, omitting the legends to save space in the plot).
(… file_id and .sample, omitting the legends to save space in the plot), without integration.