scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.tl.dendrogram doesn't use var_names #1549

Open Fougere87 opened 3 years ago

Fougere87 commented 3 years ago

I'm running sc.tl.dendrogram several times on my dataset with different lists of genes (basically an increasing number of highly variable genes). The resulting dendrogram is always the same. I guess it takes all the genes into account, because it uses something like 32 GB of RAM.

Minimal code sample (that we can copy&paste without having any data)

hvegene_sets = [sc.pp.highly_variable_genes(adata, inplace=False, subset=False, n_top_genes=nhvg)["highly_variable"] for nhvg in [500,1000,2000, 3000,4000, 5000]]

then

[sum(hvgene) for hvgene in hvegene_sets]

outputs: [499, 1000, 1999, 2999, 4000, 4999] (so I do have my different gene sets)

then

dendro1 = sc.tl.dendrogram(adata,                   
                   var_names=adata.var_names[hvegene_sets[1]].values, 
                   optimal_ordering=True,
                   cor_method="spearman", linkage_method="complete", inplace=False,
                   groupby="Annotation")
dendro2 = sc.tl.dendrogram(adata,                   
                   var_names=adata.var_names[hvegene_sets[5]].values, 
                   optimal_ordering=True,
                   cor_method="spearman", linkage_method="complete", inplace=False,
                   groupby="Annotation")
[dendro1[key] ==dendro2[key] for key in dendro1.keys()] 

outputs:

[array([[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]]),
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True]])]
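The element-wise output above can be condensed to one boolean per key. The following is a minimal sketch, assuming both results are plain dicts whose values are NumPy arrays, lists, dicts, or strings; `compare_results` and the toy dicts are hypothetical names used only for illustration and are not part of scanpy.

```python
import numpy as np


def compare_results(first, second):
    """Return {key: bool} saying whether each entry of two result dicts matches."""
    summary = {}
    for key, left in first.items():
        right = second.get(key)
        if isinstance(left, np.ndarray) or isinstance(right, np.ndarray):
            # Arrays: a single True/False instead of an element-wise array
            summary[key] = bool(np.array_equal(left, right))
        else:
            # Strings, lists and dicts compare directly
            summary[key] = bool(left == right)
    return summary


# Toy stand-ins for two dendrogram results (hypothetical values)
toy_a = {
    "linkage": np.array([[0.0, 1.0, 0.5, 2.0]]),
    "cor_method": "spearman",
    "categories_ordered": ["A", "B"],
}
toy_b = {
    "linkage": np.array([[0.0, 1.0, 0.9, 2.0]]),
    "cor_method": "spearman",
    "categories_ordered": ["A", "B"],
}

print(compare_results(toy_a, toy_b))
```

With real results, `all(compare_results(dendro1, dendro2).values())` would be True only if every entry is identical, which is the situation reported here.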

At first I created all the dendrograms in a list comprehension, with the same result. I also passed a gene list of my own directly and again obtained the same result. I guess dendrogram doesn't take the genes into account.

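When I build the dendrogram manually with functions such as the following, everything works fine:

## Testing by creating the dendrogram manually

from scipy.cluster.hierarchy import dendrogram, linkage
# Private scanpy helper; this import path is an assumption based on scanpy 1.6.
from scanpy.plotting._anndata import _prepare_dataframe

def do_corr_mat(adata, var_names, groupby, method="spearman"):
    # Per-group mean expression, restricted to the requested genes
    categories, obs_tidy = _prepare_dataframe(adata, var_names=var_names, groupby=groupby)
    mean_df = obs_tidy.groupby(level=0).mean()

    # Group-by-group correlation matrix
    return mean_df.T.corr(method=method)

def do_dendro(corr_matrix, method="ward"):
    # Hierarchical clustering of the correlation matrix rows
    z_var = linkage(corr_matrix, method=method)
    return dendrogram(z_var, labels=corr_matrix.index)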
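The manual approach can also be sketched without any scanpy internals, which makes it easy to check that the gene list really changes the result. This is only an illustration under stated assumptions: `expr` is a cells-by-genes DataFrame, `groups` is a Series of group labels aligned to its index, and the function names and toy data are hypothetical. It is not how sc.tl.dendrogram is implemented.

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage


def group_correlation(expr, groups, genes, method="spearman"):
    """Correlate groups using the mean expression of the selected genes only."""
    mean_df = expr.loc[:, genes].groupby(groups).mean()
    return mean_df.T.corr(method=method)


def dendrogram_from_corr(corr, linkage_method="complete"):
    """Cluster the rows of the correlation matrix; no_plot avoids drawing."""
    z = linkage(corr.to_numpy(), method=linkage_method)
    return dendrogram(z, labels=corr.index.to_list(), no_plot=True)


# Toy data: three groups whose similarity depends on which genes are used
profiles = {
    "A": [1, 2, 3, 1, 2, 3],
    "B": [1, 2, 3, 3, 2, 1],
    "C": [3, 2, 1, 1, 2, 3],
}
genes = [f"g{i}" for i in range(6)]
rows, labels = [], []
for name, profile in profiles.items():
    for _ in range(2):  # two identical cells per group
        rows.append(profile)
        labels.append(name)
expr = pd.DataFrame(rows, columns=genes, dtype=float)
groups = pd.Series(labels, index=expr.index, name="Annotation")

corr_first = group_correlation(expr, groups, genes[:3])
corr_last = group_correlation(expr, groups, genes[3:])
tree_first = dendrogram_from_corr(corr_first)

# A and B look identical on the first three genes and opposite on the last three
print(corr_first.loc["A", "B"], corr_last.loc["A", "B"])
```

If the gene selection is honoured, the two correlation matrices differ, as they do in this toy case. Identical matrices for different gene lists would point to the selection being ignored.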

Thanks in advance, C

Versions

1.6.0

SunYong0821 commented 2 years ago

I found the same problem in sc.pl.dotplot, and traced it to line 2236 of \scanpy\plotting\_anndata.py:

    if dendrogram_key not in adata.uns:
        from ..tools._dendrogram import dendrogram

        logg.warning(
            f"dendrogram data not found (using key={dendrogram_key}). "
            "Running `sc.tl.dendrogram` with default parameters. For fine "
            "tuning it is recommended to run `sc.tl.dendrogram` independently."
        )
        dendrogram(adata, groupby, key_added=dendrogram_key)

This call to dendrogram does not pass var_names. I fixed it in my local copy of the source code.


anndata 0.7.8, scanpy 1.9.1

tanliwei-coder commented 1 year ago

I found that scanpy always uses all var_names, even when the var_names parameter is set to something other than None.


TheBorgy commented 1 year ago

Any update on this? I encountered the same issue