Open LLehner opened 4 months ago
Attention: Patch coverage is 33.33333% with 24 lines in your changes missing coverage. Please review.
Project coverage is 69.75%. Comparing base (df8e042) to head (8ee07ba).
Hi @LLehner, thank you for this. Would you mind elaborating a bit on when this would be used? Also, what if the embeddings are pre-calculated, or the user would like to use something other than UMAP: should that be an option? Finally, I think a test would be required before we get this in. Thanks!
Hey @giovp, this feature came out of a discussion with @maiiashulman. We ran into a situation in which the "literature-curated" signature for hypoxia was either 20 or 4000 genes, the latter obviously being useless. So we wondered which other genes might show the same spatially variable pattern as a function of distance to a certain cell type (e.g. epithelial). This is essentially a graphical method to see whether a given set of genes (e.g. the 20-gene signature) even varies in a similar pattern.
But I agree with your points; if we see that it's actually doing something useful, we should make it a bit more flexible.
Description
Adds a method in `tools` to calculate embeddings of variables by their counts aggregated by distance.

Example usage

    import squidpy as sq

    # load example data set
    adata = sq.datasets.seqfish()

Calculate distances of each observation to a specified anchor point (e.g. cell type or tissue location). Here we use cell type "Endothelium" in the annotation column "celltype_mapped_refined":

    sq.tl.var_by_distance(adata, groups="Endothelium", cluster_key="celltype_mapped_refined")

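To make the "distance to an anchor" idea concrete, here is a minimal NumPy-only sketch. It is not the implementation of `sq.tl.var_by_distance` (which may normalize or store distances differently); the helper name `distance_to_anchor` and the toy coordinates are hypothetical and only illustrate taking, for each observation, the Euclidean distance to its nearest anchor observation.

```python
import numpy as np


def distance_to_anchor(coords, labels, anchor):
    """Distance from every observation to its nearest anchor observation.

    Hypothetical helper for illustration; not part of squidpy.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    anchor_coords = coords[labels == anchor]
    if anchor_coords.shape[0] == 0:
        raise ValueError(f"no observations labelled {anchor!r}")
    # Pairwise differences with shape (n_obs, n_anchor, n_dims).
    diff = coords[:, None, :] - anchor_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only the distance to the closest anchor observation.
    return dist.min(axis=1)


# Toy data: one anchor cell at the origin and three other cells.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
labels = np.array(["Endothelium", "Other", "Other", "Other"])
dist = distance_to_anchor(coords, labels, "Endothelium")
```

Anchor observations get distance 0 in this sketch, which is why a "distance 0" bin appears later on. The dense pairwise array is fine for a toy example; for real data a KD-tree lookup would likely be preferable.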
The resulting distances are stored in `adata.obsm["design_matrix"]`. Now we can calculate the embeddings:

    sq.tl.var_embeddings(adata, group="Endothelium", design_matrix_key="design_matrix")

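As a rough illustration of what "counts aggregated by distance" could look like, the sketch below bins observations by their distance to the anchor and averages each variable's counts per bin, producing one distance profile per variable. This is an assumption about the general approach, not the code in this PR: the helper name `aggregate_by_distance`, the equal-width bins, and the use of the mean (rather than, say, the sum) are all illustrative choices.

```python
import numpy as np


def aggregate_by_distance(counts, dist, n_bins=100, include_anchor=False):
    """Average counts per distance bin; returns (n_vars, n_bins).

    Hypothetical helper for illustration; not part of squidpy.
    """
    counts = np.asarray(counts, dtype=float)
    dist = np.asarray(dist, dtype=float)
    if not include_anchor:
        # Drop observations at distance 0, i.e. the anchor itself.
        keep = dist > 0
        counts, dist = counts[keep], dist[keep]
    # Equal-width bin edges over the observed distance range.
    edges = np.linspace(dist.min(), dist.max(), n_bins + 1)
    # Digitizing against the interior edges gives indices 0..n_bins-1.
    bin_idx = np.digitize(dist, edges[1:-1])
    profiles = np.zeros((counts.shape[1], n_bins))
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():  # empty bins are left at zero
            profiles[:, b] = counts[in_bin].mean(axis=0)
    return profiles


# Toy data: 4 observations x 2 variables; the first observation is the anchor.
counts = np.array([[10, 0], [4, 1], [2, 3], [0, 5]])
dist = np.array([0.0, 1.0, 2.0, 3.0])
profiles = aggregate_by_distance(counts, dist, n_bins=2)
profiles_with_anchor = aggregate_by_distance(counts, dist, n_bins=2, include_anchor=True)
```

Each row of `profiles` is the kind of per-variable vector that could then be passed to a dimensionality reduction such as UMAP.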
Note that by default the bin of distance 0, meaning the counts that belong to the anchor point, is excluded. This can be changed by setting `include_anchor=True` in `sq.tl.var_embeddings()`. By default 100 bins are used. The resulting embeddings are stored in `adata.uns["100_bins_distance_embeddings"]`.

We can plot the embedding (UMAP) as follows:

    import matplotlib.pyplot as plt

    embedding = adata.uns["100_bins_distance_embeddings"]
    plt.scatter(embedding[0], embedding[1], c="grey")
    plt.gca().set_aspect('equal', 'datalim')
    plt.title('UMAP', fontsize=20)

which results in:

![image](https://github.com/scverse/squidpy/assets/64135338/b0d4be6c-5640-4344-8c18-7d298610939f)
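Regarding the question above about using something other than UMAP: one possible alternative is a plain PCA of the variable-by-bin profiles. The sketch below uses only `numpy.linalg.svd`; the function name `pca_embedding` and the random stand-in matrix are hypothetical, and nothing here is part of the PR as submitted.

```python
import numpy as np


def pca_embedding(profiles, n_components=2):
    """Project variable-by-bin profiles onto their top principal components.

    Hypothetical helper for illustration; not part of squidpy.
    """
    X = np.asarray(profiles, dtype=float)
    # Center each bin (column) so the SVD corresponds to PCA.
    X = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    # Principal component scores: left singular vectors scaled by singular values.
    return U[:, :n_components] * S[:n_components]


# Stand-in for a (n_vars, n_bins) matrix of distance profiles.
rng = np.random.default_rng(0)
profiles = rng.random((50, 100))
emb = pca_embedding(profiles)
```

The two columns of `emb` can be passed to `plt.scatter` in the same way as the UMAP coordinates above. PCA is deterministic and has no extra dependency, which might also make it easier to cover with a test.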
TODO