Closed selifeski closed 4 years ago
The clusters are simply annotations added in the adata.obs pandas dataframe. Thus, to merge the clusters you can create a new column containing your merged clusters. For example:
old_to_new = dict(
old_cluster1='new_cluster1',
old_cluster2='new_cluster1',
old_cluster3='new_cluster2',
)
adata.obs['new_clusters'] = (
adata.obs['old_clusters']
.map(old_to_new)
.astype('category')
)
For general help like this, please go to https://scanpy.discourse.group/
This is also what the issue template says. How could we have made the text more clear so that you’d have found your way there?
@fidelram, I like that! I had been struggling to come up with a concise way of doing this. I wonder if we can make that more concise. Here's one where the mapping can be defined inline, and you don't have to define relationships for the ones that stay the same:
adata.obs['new_clusters'] = (
adata.obs["old_clusters"]
.map(lambda x: {"a": "b"}.get(x, x))
.astype("category")
)
@ivirshup I like that.
Here's a related question, if I want to make a labelling which includes a subset of clusters from a few different solutions, is there a concise way to write that? I.e. I want clusters 1,2, and 3 from clustering A, and clusters 4 and 5 from clustering B.
I think you will need two steps: one to get clusters 1, 2, and 3 from clustering A, and another for the rest.
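A minimal sketch of those two steps, using a plain pandas DataFrame as a stand-in for adata.obs. The column names clustering_A and clustering_B and the "A_"/"B_" prefixes are assumptions made for illustration; the prefixes are only there so that labels from the two solutions cannot collide.

```python
import pandas as pd

# Stand-in for adata.obs with two clustering solutions (hypothetical column names).
obs = pd.DataFrame({
    "clustering_A": pd.Categorical(["1", "2", "3", "6", "6"]),
    "clustering_B": pd.Categorical(["9", "9", "9", "4", "5"]),
})

keep_from_A = ["1", "2", "3"]

# Work on plain strings so the labels can be prefixed and combined freely.
a = obs["clustering_A"].astype(str)
b = obs["clustering_B"].astype(str)

# Step 1: find the cells whose label in clustering A should be kept.
from_A = a.isin(keep_from_A)

# Step 2: keep A's label for those cells and take B's label for the rest.
obs["combined"] = ("A_" + a).where(from_A, "B_" + b).astype("category")
```

With a real AnnData object the same lines should apply to adata.obs directly, since it is a pandas DataFrame.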
Can this method apply to Leiden clustering as well? I recapitulated the above code in my program, and my new cluster column returned only NaNs. What should the "old_cluster1" side of the structure look like when I am trying to make that dictionary?
Thanks
I have the same issue...
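A likely cause of the all-NaN column is a key type mismatch: scanpy stores leiden and louvain labels as strings ('0', '1', ...), so a dictionary keyed by integers never matches. The sketch below shows this on a plain pandas DataFrame standing in for adata.obs; the cell type names are made up for illustration.

```python
import pandas as pd

# Stand-in for adata.obs; the cluster labels are string categories.
obs = pd.DataFrame({"leiden": pd.Categorical(["0", "1", "2", "1", "0"])})

# Inspect the actual labels before building the dictionary.
labels = list(obs["leiden"].cat.categories)

# Integer keys do not match the string labels, so every lookup misses.
wrong = obs["leiden"].map({0: "T cells", 1: "T cells", 2: "B cells"})

# String keys match the labels, so the mapping works.
right = (
    obs["leiden"]
    .map({"0": "T cells", "1": "T cells", "2": "B cells"})
    .astype("category")
)
```

Checking adata.obs['leiden'].cat.categories first tells you exactly what the "old_cluster1" side of the dictionary has to look like.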
Just to answer those that, like me, are beginners in Python, the solution provided by @ivirshup works perfectly (of course for louvain and leiden, and any other adata.obs column that you want to remap):
adata.obs['new_clusters'] = (
adata.obs["old_clusters"]
.map(lambda x: {"a": "b"}.get(x, x))
.astype("category")
)
Where "a" is the name of the category you want to change, and "b" is its new name. If you have more categories you want to change, simply add more entries to the dictionary, like:
adata.obs['new_clusters'] = (
adata.obs["old_clusters"]
.map(lambda x: {"a": "b", "c": "d"}.get(x, x))
.astype("category")
)
@fidelram's answer does not work as written in this specific case because the categories produced by the louvain (or leiden) algorithm are named '0', '1', '2', '3', '4', and such names cannot be passed as keyword arguments to dict(): writing dict('0'='X') fails with SyntaxError: keyword can't be an expression. A dict literal such as {'0': 'X'} does not have this problem.
Hope this helps,
Best,
A
Hi guys,
Thank you for sharing your code and explanation. What if I want to rename multiple clusters ["a","c","d"] to "b" ? I have tried a list of elements to change as a key, but it does not work for me.
Thanks in advance for your reply
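A list cannot be used as a dictionary key because it is not hashable, but the same mapping can be built from the list with a dict comprehension. A minimal sketch, again with a plain pandas DataFrame standing in for adata.obs and the column names from the examples above:

```python
import pandas as pd

# Stand-in for adata.obs.
obs = pd.DataFrame({"old_clusters": pd.Categorical(["a", "b", "c", "d", "e"])})

# Every label in this list is renamed to "b".
to_merge = ["a", "c", "d"]
mapping = {old: "b" for old in to_merge}

# Labels that are not in the mapping keep their original name.
obs["new_clusters"] = (
    obs["old_clusters"]
    .map(lambda x: mapping.get(x, x))
    .astype("category")
)
```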
The below worked for me; I think the Python dict formatting has changed. Notice I am also merging clusters by assigning them the same name:
old_to_new = {
0: 'Astrocytes 1',
1: 'Glutamatergic neurons 1',
2: 'Astrocytes 2',
3: 'Oligodendrocytes 1',
4: 'Inhibitory neurons 1',
5: 'Glutamatergic neurons 2',
6: 'Oligodendrocytes 1',
7: 'Unknown',
8: 'OPCs',
9: 'Glutamatergic neurons 3',
10: 'Microglia',
11: 'Inhibitory neurons 1',
12: 'Tanycytes',
13: 'Endothelial',
14: 'Astrocytes 3',
15: 'Oligodendrocytes 1',
16: 'Inhibitory neurons 2',
17: 'T cells',
18: 'Oligodendrocytes 2',
}
adata.obs['annotation'] = (
adata.obs['seurat_clusters']
.map(old_to_new)
.astype('category')
)
I think anndata’s rename_categories should accept non-unique values as argument. Then one could simply do things like
cluster_markers = {
'CD4 T': {'IL7R'},
'CD14+\nMonocytes': {'CD14', 'LYZ'},
'B': {'MS4A1'},
'CD8 T': {'CD8A'},
'NK': {'GNLY', 'NKG7'},
'FCGR3A+\nMonocytes': {'FCGR3A', 'MS4A7'},
'Dendritic': {'FCER1A', 'CST3'},
'Mega-\nkaryocytes': {'PPBP'},
}
marker_matches = sc.tl.marker_gene_overlap(adata, cluster_markers)
adata.rename_categories('leiden', marker_matches.idxmax())
As it stands, things like the pbmc3k tutorial are super flaky because they hardcode things like this.
Cool use of .idxmax() here, @flying-sheep! I would still inspect manually though ;).
Hi guys,
Thank you for sharing your code and explanation. What if I want to rename multiple clusters ["a","c","d"] to "b" ? I have tried a list of elements to change as a key, but it does not work for me.
Thanks in advance for your reply
I have the same question, anyone have the solution? Please let me know. Thank you.
For me, adding quotation marks to the cluster ID did the trick. I then just did
adata.obs["celltype"] = adata.obs.leiden.map(old_to_new)
like shown here.
Hi,
To get a more in-depth understanding, I set the resolution high for louvain clustering, but now I cannot merge subclusters. When I try to rename the categories so that several share the same cluster name, I get an error about the names not being unique. I also could not find a working merge_clusters function. Is anyone else having the same issue? I would appreciate any help. Thanks!
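The uniqueness error comes from pandas, which does not allow two categories to share a name when renaming. Writing the merged labels to a new column with .map, as earlier in this thread, avoids that. Below is a rough sketch; merge_clusters is a hypothetical helper written for this example, not a scanpy or anndata function, and the DataFrame stands in for adata.obs.

```python
import pandas as pd

def merge_clusters(labels: pd.Series, groups: dict) -> pd.Series:
    """Hypothetical helper: `groups` maps each new name to the old labels it absorbs."""
    old_to_new = {old: new for new, olds in groups.items() for old in olds}
    # Labels that are not listed in any group keep their original name.
    return labels.map(lambda x: old_to_new.get(x, x)).astype("category")

# Stand-in for adata.obs with high-resolution louvain labels.
obs = pd.DataFrame({"louvain": pd.Categorical(["0", "1", "2", "3", "2"])})

# Renaming with duplicate names fails because categories must stay unique.
try:
    obs["louvain"].cat.rename_categories(["T", "T", "B", "NK"])
    rename_error = None
except ValueError as err:
    rename_error = str(err)

# Mapping into a new column merges subclusters 0 and 1 without that restriction.
obs["merged"] = merge_clusters(obs["louvain"], {"T": ["0", "1"], "B": ["2"]})
```

Keeping the original louvain column untouched also makes it easy to revisit the merge later.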