ismms-himc / clustergrammer2

"Dimensionality-increasing" data visualization tool and interactive WebGL Jupyter widget built for single-cell data.
https://clustergrammer.readthedocs.io/case_studies.html
MIT License
116 stars 20 forks source link

Bug: Use of distance metrics when clustering #111

Open ltsypin opened 2 years ago

ltsypin commented 2 years ago

I was digging a bit in the code to understand how the distance metric was implemented during clustering, and I think I found a bug:

In clustergrammer2/clustergrammer_fun/calc_clust.py, lines 80-93, it looks like neither the scipy nor fastclust libraries make use of the dist_type parameter. Right now, the code is:

def clust_and_group(net, inst_dm, axis, mat, dist_type='cosine', linkage_type='average',
                    clust_library='scipy', min_samples=1, min_cluster_size=2):

  # print(clust_library)

  import scipy.cluster.hierarchy as hier
  import pandas as pd

  if clust_library == 'scipy':
    Y = hier.linkage(inst_dm, method=linkage_type)

  elif clust_library == 'fastcluster':
    import fastcluster
    Y = fastcluster.linkage(inst_dm, method=linkage_type)

Shouldn't it be Y = hier.linkage(inst_dm, method=linkage_type, metric=dist_type) for the scipy case, for example?

Thanks!

cornhundred commented 2 years ago

Hi @ltsypin, thanks for pointing this out. We are actually not using this dist_type variable in the her.linkage method since we are passing in a pre-calculated distance matrix, but we should not be passing in this variable. We can update this in the next release. Feel free to make a pull request or let me know if you have any other concerns.